Part II: Architectures
Central question: How do architectural choices made before training begins determine what a model can learn about biological sequences?
Prerequisites: Part I (genomic data context). For deep learning background, see Appendix A.
| Chapter | Topic | Key Concepts |
|---|---|---|
| 5: Tokens and Embeddings | Sequence Representations | One-hot, k-mers, BPE, learned embeddings, position encodings |
| 6: Convolutional Networks | Convolutional Networks | Motif detection, regulatory prediction, receptive field limitations |
| 7: Transformers and Attention | Attention & Transformers | Self-attention, position encodings, long-range dependencies |
After completing Part II, you will understand:
- How tokenization and representation choices shape what models can learn
- Why CNNs revolutionized genomic deep learning and where they hit limits
- How attention mechanisms enable long-range dependency modeling
Every neural network architecture encodes assumptions about biology. Convolutional networks assume that local patterns matter and that the same motifs are meaningful regardless of genomic position. Attention mechanisms assume that distant positions can interact directly without passing information through intermediate representations. These assumptions, embedded in architectural choices made before any training begins, determine which biological phenomena the model can capture and which remain invisible to it.
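To make these two inductive biases concrete, the sketch below (PyTorch) passes a stand-in for one-hot encoded DNA through a 1D convolution and a self-attention layer. The layer widths, the 15-bp kernel, and the 1,000-bp sequence length are illustrative assumptions for this sketch, not values taken from any particular genomic model.

```python
import torch
import torch.nn as nn

batch, alphabet, seq_len = 2, 4, 1000            # A, C, G, T; 1 kb window
x = torch.randn(batch, alphabet, seq_len)        # stand-in for one-hot encoded DNA

# Convolution: the same 15-bp motif scanner slides over every position, so a
# pattern is recognized identically wherever it occurs (position invariance),
# but each output only "sees" 15 bp until layers are stacked.
conv = nn.Conv1d(in_channels=4, out_channels=64, kernel_size=15, padding=7)
local_features = conv(x)                         # (2, 64, 1000)

# Self-attention: every position computes a weight for every other position,
# so positions 10 and 990 can interact directly within a single layer.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
tokens = local_features.transpose(1, 2)          # (2, 1000, 64)
global_features, attn_weights = attn(tokens, tokens, tokens)
```

The contrast is visible in the shapes alone: the convolution's output at a position depends only on a short local window, while `attn_weights` is a full position-by-position matrix, reflecting the assumption that any pair of positions may interact.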
Tokenization choices (Chapter 5: Tokens and Embeddings) propagate through model design, from one-hot encoding through byte-pair encoding to biologically informed vocabularies. Convolutional neural networks (Chapter 6: Convolutional Networks) first demonstrated that deep learning could outperform handcrafted features for regulatory genomics by learning sequence-to-function mappings directly from data. Self-attention mechanisms and the transformer architecture (Chapter 7: Transformers and Attention) enable both local pattern recognition and long-range dependency modeling across genomic sequences.
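As a minimal preview of the representations Chapter 5 covers, the sketch below shows one-hot encoding and overlapping k-mer tokenization of a short DNA string. The example sequence, the A/C/G/T vocabulary ordering, and k = 3 are illustrative choices, not tied to any specific model.

```python
import numpy as np

SEQ = "ATGCGTACGT"
BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode each base as a 4-dimensional indicator vector, giving an (L, 4) array."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq):
        out[pos, idx[base]] = 1.0
    return out

def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers, the units a k-mer vocabulary maps to token IDs."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(one_hot(SEQ).shape)   # (10, 4)
print(kmer_tokens(SEQ))     # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
```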
Part II connects to the rest of the book as follows:
- Part I provides the data context that makes architectural choices meaningful
- Part III builds on these architectures with pretraining, transfer learning, and evaluation
- Part IV applies these principles to specific foundation model families
- Part V extends architectures to cellular context and systems-scale modeling