Part II: Architectures
Central question: How do architectural choices made before training begins determine what a model can learn about biological sequences?
Prerequisites: Part I (genomic data context). For deep learning background, see Appendix A.
| Chapter | Topic | Key Concepts |
|---|---|---|
| 5: Tokens and Embeddings | Sequence Representations | One-hot, k-mers, BPE, learned embeddings, position encodings |
| 6: Convolutional Networks | Convolutional Networks | Motif detection, regulatory prediction, receptive field limitations |
| 7: Transformers and Attention | Attention & Transformers | Self-attention, position encodings, long-range dependencies |
After completing Part II, you will understand:
- How tokenization and representation choices shape what models can learn
- Why CNNs revolutionized genomic deep learning and where they hit limits
- How attention mechanisms enable long-range dependency modeling
Every neural network architecture encodes assumptions about biology. Convolutional networks assume that local patterns matter and that the same motifs are meaningful regardless of genomic position. Attention mechanisms assume that distant positions can interact directly without passing information through intermediate representations. These assumptions, embedded in architectural choices made before any training begins, determine which biological phenomena the model can capture and which remain invisible to it.
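To make these two inductive biases concrete, the sketch below (PyTorch) passes a stand-in for one-hot encoded DNA through a 1D convolution and a self-attention layer. The layer widths, the 15-bp kernel, and the 1,000-bp sequence length are illustrative assumptions for this sketch, not values taken from any particular genomic model.

```python
import torch
import torch.nn as nn

batch, alphabet, seq_len = 2, 4, 1000            # A, C, G, T; 1 kb window
x = torch.randn(batch, alphabet, seq_len)        # stand-in for one-hot encoded DNA

# Convolution: the same 15-bp motif scanner slides over every position, so a
# pattern is recognized identically wherever it occurs (position invariance),
# but each output only "sees" 15 bp until layers are stacked.
conv = nn.Conv1d(in_channels=4, out_channels=64, kernel_size=15, padding=7)
local_features = conv(x)                         # (2, 64, 1000)

# Self-attention: every position computes a weight for every other position,
# so positions 10 and 990 can interact directly within a single layer.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
tokens = local_features.transpose(1, 2)          # (2, 1000, 64)
global_features, attn_weights = attn(tokens, tokens, tokens)
```

The contrast is visible in the shapes alone: the convolution's output at a position depends only on a short local window, while `attn_weights` is a full position-by-position matrix, reflecting the assumption that any pair of positions may interact.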
Tokenization choices (Chapter 5: Tokens and Embeddings) propagate through model design, from one-hot encoding through byte-pair encoding to biologically informed vocabularies. Convolutional neural networks (Chapter 6: Convolutional Networks) first demonstrated that deep learning could outperform handcrafted features for regulatory genomics by learning sequence-to-function mappings directly from data. Self-attention mechanisms and the transformer architecture (Chapter 7: Transformers and Attention) enable both local pattern recognition and long-range dependency modeling across genomic sequences.
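As a minimal preview of the representations Chapter 5 covers, the sketch below shows one-hot encoding and overlapping k-mer tokenization of a short DNA string. The example sequence, the A/C/G/T vocabulary ordering, and k = 3 are illustrative choices, not tied to any specific model.

```python
import numpy as np

SEQ = "ATGCGTACGT"
BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode each base as a 4-dimensional indicator vector, giving an (L, 4) array."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq):
        out[pos, idx[base]] = 1.0
    return out

def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers, the units a k-mer vocabulary maps to token IDs."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(one_hot(SEQ).shape)   # (10, 4)
print(kmer_tokens(SEQ))     # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
```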
Part II connects to the rest of the book as follows:
- Part I provides the data context that makes architectural choices meaningful
- Part III builds on these architectures with pretraining, transfer learning, and evaluation
- Part IV applies these principles to specific foundation model families
- Part V extends architectures to cellular context and systems-scale modeling