Part III: Learning & Evaluation
Central question: How do pretraining objectives, adaptation strategies, and evaluation practices determine whether models learn genuine biology or exploit shortcuts?
Prerequisites: Part II (architectural foundations). Understanding of train/test splits and basic ML metrics.
| Chapter | Topic | Key Concepts |
|---|---|---|
| 8 Pretraining Strategies | Pretraining Objectives | MLM, next-token prediction, contrastive learning, multi-task |
| 9 Transfer Learning Foundations | Transfer Learning | Fine-tuning, domain adaptation, few-shot learning |
| 10 Adaptation Strategies | Model Adaptation | Parameter-efficient fine-tuning, LoRA, prompt tuning |
| 11 Benchmark Landscape | Benchmark Landscape | Benchmark suites, task taxonomy, saturation, staleness |
| 12 Evaluation Methods | Evaluation Methods | Splitting strategies, metrics, baselines, statistical rigor |
| 13 Confounding and Data Leakage | Confounding & Leakage | Data leakage, batch effects, population stratification |
After completing Part III, you will understand:
- What pretraining objectives teach models about biological sequence
- How to transfer learned representations to new tasks effectively
- When parameter-efficient methods outperform full fine-tuning
- What benchmarks measure and what they miss
- How to evaluate models rigorously and avoid common pitfalls
- How confounding can inflate performance and how to detect it
Self-supervised objectives shape what models learn from unlabeled sequence (8 Pretraining Strategies). Masked language modeling, next-token prediction, and denoising approaches each encourage models to discover different biological patterns and produce representations with distinct properties. Adapting pretrained models to downstream tasks (9 Transfer Learning Foundations) through fine-tuning, few-shot learning, and domain adaptation strategies completes the path from raw sequence to useful prediction. Parameter-efficient adaptation methods (10 Adaptation Strategies) enable practical fine-tuning when computational resources or labeled data are limited.
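To make the masked objective concrete, the sketch below applies BERT-style corruption to a toy DNA sequence. The single-nucleotide tokenization, the 15% masking rate, and the 80/10/10 replace/randomize/keep split are conventional defaults assumed here for illustration, not details of any particular genomic model.

```python
# Minimal sketch of BERT-style masking for a DNA sequence (assumed
# single-nucleotide vocabulary; 15% rate and 80/10/10 corruption split
# are the conventional defaults, not model-specific choices).
import random

VOCAB = ["A", "C", "G", "T"]
MASK = "[MASK]"

def mask_sequence(tokens, mask_rate=0.15, seed=0):
    """Return (corrupted_tokens, labels); labels are None where no loss is taken."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            labels.append(tok)                       # model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))  # 10%: random nucleotide
            else:
                corrupted.append(tok)                # 10%: keep original token
        else:
            corrupted.append(tok)
            labels.append(None)                      # no loss on unmasked positions
    return corrupted, labels

if __name__ == "__main__":
    seq = list("ACGTTGCAACGTAGCTAGGCTTACGATCGA")
    x, y = mask_sequence(seq)
    print("input :", "".join(t if t != MASK else "_" for t in x))
    print("targets at masked positions:", [t for t in y if t is not None])
```

The labels mark which positions contribute to the reconstruction loss; a next-token objective would instead shift the sequence by one position and score every token, which is part of why the two objectives yield representations with different properties.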
Rigorous evaluation requires understanding both what benchmarks measure (11 Benchmark Landscape) and how to apply them properly (12 Evaluation Methods). The benchmark landscape spans protein, DNA, regulatory, and clinical domains, each with distinct validity concerns. Evaluation methodology determines whether benchmark success predicts deployment success: proper splitting strategies, metric selection, baseline comparisons, and statistical rigor distinguish genuine advances from artifacts. Confounding (13 Confounding and Data Leakage) can inflate apparent performance through data leakage, batch effects, and population stratification. Mastering these evaluation principles enables critical assessment of the foundation models surveyed in Part IV and beyond.
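One concrete defense against the leakage failure mode is to split by groups of related examples rather than by individual sequences. The sketch below assumes each sequence already carries a homology cluster ID (for example, from clustering with a tool such as MMseqs2 or CD-HIT at a chosen identity threshold); the sequences, labels, and cluster assignments are placeholder toy data.

```python
# Minimal sketch of a leakage-aware split: whole homology clusters are
# assigned to train or test, so near-duplicate sequences never straddle
# the split. Cluster IDs and data below are illustrative placeholders.
from sklearn.model_selection import GroupShuffleSplit

sequences = ["ACGT...", "ACGA...", "TTGC...", "TTGA...", "GGCC...", "GGCA..."]
labels    = [1, 1, 0, 0, 1, 0]
clusters  = ["c1", "c1", "c2", "c2", "c3", "c3"]   # assumed homology clusters

splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
train_idx, test_idx = next(splitter.split(sequences, labels, groups=clusters))

# No cluster appears on both sides, so test performance is not inflated
# by memorized near-duplicates of training sequences.
assert not {clusters[i] for i in train_idx} & {clusters[i] for i in test_idx}
print("train:", sorted(map(int, train_idx)), "test:", sorted(map(int, test_idx)))
```

The same grouping idea extends to the other confounders named above: splitting by experimental batch or by ancestry group, rather than at random, tests whether a model has learned signal that survives the confounder instead of the confounder itself.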
Part III connects to the rest of the book as follows:
- Part II provides the architectural foundations that pretraining builds upon
- Part IV applies these learning principles to specific foundation model families
- Part V extends architectures to cellular context and systems-scale modeling
- Part VI deepens the evaluation framework with uncertainty, interpretability, and causal reasoning