Part IV: Foundation Model Families
Central question: What are the major foundation model families for genomics, and how do their different assumptions determine where each excels?
Prerequisites: Parts II-III (sequence architectures, pretraining, transfer learning, evaluation)
| Chapter | Topic | Key Models & Concepts |
|---|---|---|
| 14 Foundation Model Paradigm | Foundation model principles and taxonomy | Scaling laws, emergence, foundation model definition |
| 15 DNA Language Models | Self-supervised pretraining on genomic sequence | DNABERT, Nucleotide Transformer, HyenaDNA, Evo |
| 16 Protein Language Models | Representation learning for protein sequence and structure | ESM, ProtTrans, ESMFold, AlphaFold2 |
| 17 Regulatory Models | Long-context hybrid models of gene regulation | Enformer, Borzoi, AlphaGenome |
| 18 Variant Effect Prediction | Translating representations into pathogenicity scores | AlphaMissense, SpliceAI, integrated VEP |
After completing Part IV, you will understand:
- What distinguishes foundation models from earlier supervised approaches
- How DNA language models learn regulatory grammar from sequence
- Why protein language models achieved such dramatic success
- How hybrid architectures enable 200kb+ context windows
- How different approaches combine for comprehensive variant effect prediction
Each architecture embodies a different set of assumptions about biological sequence. Convolutional models assume that local motifs and their short-range combinations are the primary carriers of regulatory information; they learn to recognize transcription factor binding sites, splice signals, and chromatin accessibility patterns from the sequence grammar immediately surrounding each position. Protein language models treat amino acid sequences as structured compositions whose meaning emerges from evolutionary context; they learn what substitutions are tolerated by observing which sequences survived natural selection. DNA language models extend this paradigm to nucleotides, learning regulatory grammar through self-supervised objectives that predict masked or next tokens. Hybrid architectures attempt to reconcile local and global perspectives, using convolutions to extract features efficiently while deploying attention to model interactions spanning tens or hundreds of kilobases. Understanding these assumptions clarifies what each model family can capture and where each will fail.
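To make the self-supervised objective concrete, the sketch below shows one masked-nucleotide language modeling step in PyTorch. The single-base vocabulary, toy encoder, and 15% masking rate are illustrative assumptions for exposition, not the configuration of DNABERT, Nucleotide Transformer, or any other published model.

```python
# Minimal sketch of masked-token pretraining on DNA, under assumed toy settings.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

class TinyDNAEncoder(nn.Module):
    """Toy BERT-style encoder over single-nucleotide tokens (positional encoding omitted for brevity)."""
    def __init__(self, vocab_size=5, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                       # (batch, seq_len) integer tokens
        return self.lm_head(self.encoder(self.embed(tokens)))

def masked_lm_step(model, seq, mask_rate=0.15):
    """One masked-LM step: hide random bases, score the model's reconstruction."""
    tokens = torch.tensor([[VOCAB[base] for base in seq]])
    labels = tokens.clone()
    mask = torch.rand(tokens.shape) < mask_rate
    if not mask.any():                               # guarantee at least one masked position
        mask[0, 0] = True
    tokens[mask] = VOCAB["[MASK]"]
    logits = model(tokens)
    labels[~mask] = -100                             # loss is computed only at masked positions
    return nn.functional.cross_entropy(
        logits.view(-1, len(VOCAB)), labels.view(-1), ignore_index=-100
    )

model = TinyDNAEncoder()
loss = masked_lm_step(model, "ACGTGACCTGATTACAGGCTACGT")
loss.backward()   # gradients flow to the encoder and LM head as in ordinary pretraining
```

Causal (next-token) variants such as HyenaDNA and Evo replace the masking step with a shifted-target cross-entropy over every position, but the underlying idea, learning sequence grammar by predicting held-out nucleotides, is the same.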
Foundation models in genomic deep learning span distinct architectural families, each with characteristic strengths. Foundational principles and taxonomy (14 Foundation Model Paradigm) establish what defines a foundation model and provide a framework for navigating the rapidly expanding ecosystem. DNA language models (15 DNA Language Models), including DNABERT, Nucleotide Transformer, and HyenaDNA, apply self-supervised pretraining to genomic sequence, learning representations that transfer across diverse downstream tasks. Protein language models (16 Protein Language Models) achieved the earliest and most dramatic foundation model successes; ESM, ProtTrans, and their descendants emerged alongside AlphaFold2 in 2020, collectively demonstrating that deep learning could capture protein structure and function from sequence alone. AlphaFold2 revolutionized structure prediction through its Evoformer architecture, and AlphaMissense subsequently adapted that architecture for proteome-wide variant effect prediction. Hybrid architectures (17 Regulatory Models), including Enformer, Borzoi, and AlphaGenome, combine convolutional processing with transformer blocks to achieve context windows spanning hundreds of kilobases, enabling direct prediction of gene expression from sequence. Variant effect prediction (18 Variant Effect Prediction) synthesizes these approaches, translating foundation model representations into pathogenicity scores across variant types and genomic contexts.
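The long-context claim behind 17 Regulatory Models follows a recognizable pattern: a convolutional tower with pooling compresses base-pair-resolution input into coarser bins, and attention then operates over the shortened axis. The sketch below illustrates that pattern under assumed, toy dimensions; it is not the Enformer, Borzoi, or AlphaGenome architecture.

```python
# Rough sketch of a hybrid conv + transformer regulatory model; all sizes are illustrative.
import torch
import torch.nn as nn

class HybridRegulatoryModel(nn.Module):
    def __init__(self, d_model=128, n_pool_stages=7, n_tracks=10):
        super().__init__()
        blocks, channels = [], 4                     # one-hot A/C/G/T input channels
        for _ in range(n_pool_stages):
            blocks += [nn.Conv1d(channels, d_model, kernel_size=5, padding=2),
                       nn.GELU(),
                       nn.MaxPool1d(2)]              # each stage halves the sequence length
            channels = d_model
        self.conv_tower = nn.Sequential(*blocks)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.attention = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_tracks)     # per-bin regulatory track predictions

    def forward(self, one_hot_dna):                  # (batch, 4, seq_len)
        x = self.conv_tower(one_hot_dna)             # (batch, d_model, seq_len / 2**n_pool_stages)
        x = x.transpose(1, 2)                        # attention runs over coarse sequence bins
        return self.head(self.attention(x))          # (batch, n_bins, n_tracks)

model = HybridRegulatoryModel()
dna = torch.randn(1, 4, 131_072)    # ~131 kb of one-hot sequence (random stand-in)
tracks = model(dna)                 # (1, 1024, 10): 1,024 bins of 128 bp each
```

Halving the length seven times turns the ~131 kb input into 1,024 bins of 128 bp, which is why attention across hundreds of kilobases becomes computationally tractable in this family of models.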
- Parts II-III provide the architectural foundations (CNNs, attention, pretraining) and evaluation methodology these models build upon
- Part V extends foundation model principles to RNA, single-cell, 3D genome, and multi-omics
- Part VI provides tools to evaluate what these models actually learn versus what they claim
- Part VII deploys these models in clinical and translational contexts