Part IV: Foundation Model Families
Central question: What are the major foundation model families for genomics, and how do their different assumptions determine where each excels?
Prerequisites: Parts II-III (sequence architectures, pretraining, transfer learning, evaluation)
| Chapter | Topic | Key Models & Concepts |
|---|---|---|
| 14 Foundation Model Paradigm | Foundation model principles and taxonomy | Scaling laws, emergence, foundation model definition |
| 15 DNA Language Models | Self-supervised pretraining on genomic sequence | DNABERT, Nucleotide Transformer, HyenaDNA, Evo |
| 16 Protein Language Models | Representation learning for protein sequence and structure | ESM, ProtTrans, ESMFold, AlphaFold2 |
| 17 Regulatory Models | Long-context hybrid models of gene regulation | Enformer, Borzoi, AlphaGenome |
| 18 Variant Effect Prediction | Translating representations into pathogenicity scores | AlphaMissense, SpliceAI, integrated VEP |
After completing Part IV, you will understand:
- What distinguishes foundation models from earlier supervised approaches
- How DNA language models learn regulatory grammar from sequence
- Why protein language models achieved such dramatic success
- How hybrid architectures enable 200kb+ context windows
- How different approaches combine for comprehensive variant effect prediction
Each architecture embodies a different set of assumptions about biological sequence. Convolutional models assume that local motifs and their short-range combinations are the primary carriers of regulatory information; they learn to recognize transcription factor binding sites, splice signals, and chromatin accessibility patterns from the sequence grammar immediately surrounding each position. Protein language models treat amino acid sequences as structured compositions whose meaning emerges from evolutionary context; they learn what substitutions are tolerated by observing which sequences survived natural selection. DNA language models extend this paradigm to nucleotides, learning regulatory grammar through self-supervised objectives that predict masked or next tokens. Hybrid architectures attempt to reconcile local and global perspectives, using convolutions to extract features efficiently while deploying attention to model interactions spanning tens or hundreds of kilobases. Understanding these assumptions clarifies what each model family can capture and where each will fail.
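To make the self-supervised objective concrete, the sketch below shows one masked-nucleotide language modeling step in PyTorch. The single-base vocabulary, toy encoder, and 15% masking rate are illustrative assumptions for exposition, not the configuration of DNABERT, Nucleotide Transformer, or any other published model.

```python
# Minimal sketch of masked-token pretraining on DNA, under assumed toy settings.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

class TinyDNAEncoder(nn.Module):
    """Toy BERT-style encoder over single-nucleotide tokens (positional encoding omitted for brevity)."""
    def __init__(self, vocab_size=5, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                       # (batch, seq_len) integer tokens
        return self.lm_head(self.encoder(self.embed(tokens)))

def masked_lm_step(model, seq, mask_rate=0.15):
    """One masked-LM step: hide random bases, score the model's reconstruction."""
    tokens = torch.tensor([[VOCAB[base] for base in seq]])
    labels = tokens.clone()
    mask = torch.rand(tokens.shape) < mask_rate
    if not mask.any():                               # guarantee at least one masked position
        mask[0, 0] = True
    tokens[mask] = VOCAB["[MASK]"]
    logits = model(tokens)
    labels[~mask] = -100                             # loss is computed only at masked positions
    return nn.functional.cross_entropy(
        logits.view(-1, len(VOCAB)), labels.view(-1), ignore_index=-100
    )

model = TinyDNAEncoder()
loss = masked_lm_step(model, "ACGTGACCTGATTACAGGCTACGT")
loss.backward()   # gradients flow to the encoder and LM head as in ordinary pretraining
```

Causal (next-token) variants such as HyenaDNA and Evo replace the masking step with a shifted-target cross-entropy over every position, but the underlying idea, learning sequence grammar by predicting held-out nucleotides, is the same.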
Foundation models in genomic deep learning span distinct architectural families, each with characteristic strengths. Foundational principles and taxonomy (14 Foundation Model Paradigm) establish what defines a foundation model and provide a framework for navigating the rapidly expanding ecosystem. DNA language models (15 DNA Language Models), including DNABERT, Nucleotide Transformer, and HyenaDNA, apply self-supervised pretraining to genomic sequence, learning representations that transfer across diverse downstream tasks. Protein language models (16 Protein Language Models) achieved the earliest and most dramatic foundation model successes; ESM, ProtTrans, and their descendants emerged alongside AlphaFold2 in 2020, collectively demonstrating that deep learning could capture protein structure and function from sequence alone. AlphaFold2 revolutionized structure prediction through its Evoformer architecture, and AlphaMissense subsequently adapted that architecture for proteome-wide variant effect prediction. Hybrid architectures (17 Regulatory Models), including Enformer, Borzoi, and AlphaGenome, combine convolutional processing with transformer blocks to achieve context windows spanning hundreds of kilobases, enabling direct prediction of gene expression from sequence. Variant effect prediction (18 Variant Effect Prediction) synthesizes these approaches, translating foundation model representations into pathogenicity scores across variant types and genomic contexts.
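The long-context claim behind 17 Regulatory Models follows a recognizable pattern: a convolutional tower with pooling compresses base-pair-resolution input into coarser bins, and attention then operates over the shortened axis. The sketch below illustrates that pattern under assumed, toy dimensions; it is not the Enformer, Borzoi, or AlphaGenome architecture.

```python
# Rough sketch of a hybrid conv + transformer regulatory model; all sizes are illustrative.
import torch
import torch.nn as nn

class HybridRegulatoryModel(nn.Module):
    def __init__(self, d_model=128, n_pool_stages=7, n_tracks=10):
        super().__init__()
        blocks, channels = [], 4                     # one-hot A/C/G/T input channels
        for _ in range(n_pool_stages):
            blocks += [nn.Conv1d(channels, d_model, kernel_size=5, padding=2),
                       nn.GELU(),
                       nn.MaxPool1d(2)]              # each stage halves the sequence length
            channels = d_model
        self.conv_tower = nn.Sequential(*blocks)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.attention = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_tracks)     # per-bin regulatory track predictions

    def forward(self, one_hot_dna):                  # (batch, 4, seq_len)
        x = self.conv_tower(one_hot_dna)             # (batch, d_model, seq_len / 2**n_pool_stages)
        x = x.transpose(1, 2)                        # attention runs over coarse sequence bins
        return self.head(self.attention(x))          # (batch, n_bins, n_tracks)

model = HybridRegulatoryModel()
dna = torch.randn(1, 4, 131_072)    # ~131 kb of one-hot sequence (random stand-in)
tracks = model(dna)                 # (1, 1024, 10): 1,024 bins of 128 bp each
```

Halving the length seven times turns the ~131 kb input into 1,024 bins of 128 bp, which is why attention across hundreds of kilobases becomes computationally tractable in this family of models.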
- Parts II-III provide the architectural foundations (CNNs, attention, pretraining) and evaluation methodology these models build upon
- Part V extends foundation model principles to RNA, single-cell, 3D genome, and multi-omics
- Part VI provides tools to evaluate what these models actually learn versus what they claim
- Part VII deploys these models in clinical and translational contexts