Part III: Learning & Evaluation
Central question: How do pretraining objectives, adaptation strategies, and evaluation practices determine whether models learn genuine biology or exploit shortcuts?
Prerequisites: Part II (architectural foundations). Understanding of train/test splits and basic ML metrics.
| Chapter | Topic | Key Concepts |
|---|---|---|
| 8 Pretraining Strategies | Pretraining Objectives | MLM, next-token prediction, contrastive learning, multi-task |
| 9 Transfer Learning Foundations | Transfer Learning | Fine-tuning, domain adaptation, few-shot learning |
| 10 Adaptation Strategies | Model Adaptation | Parameter-efficient fine-tuning, LoRA, prompt tuning |
| 11 Benchmark Landscape | Benchmark Landscape | Benchmark suites, task taxonomy, saturation, staleness |
| 12 Evaluation Methods | Evaluation Methods | Splitting strategies, metrics, baselines, statistical rigor |
| 13 Confounding and Data Leakage | Confounding & Leakage | Data leakage, batch effects, population stratification |
After completing Part III, you will understand:
- What pretraining objectives teach models about biological sequence
- How to transfer learned representations to new tasks effectively
- When parameter-efficient methods outperform full fine-tuning
- What benchmarks measure and what they miss
- How to evaluate models rigorously and avoid common pitfalls
- How confounding can inflate performance and how to detect it
Self-supervised objectives shape what models learn from unlabeled sequence (8 Pretraining Strategies). Masked language modeling, next-token prediction, and denoising approaches each encourage models to discover different biological patterns and produce representations with distinct properties. Adapting pretrained models to downstream tasks (9 Transfer Learning Foundations) through fine-tuning, few-shot learning, and domain adaptation strategies completes the path from raw sequence to useful prediction. Parameter-efficient adaptation methods (10 Adaptation Strategies) enable practical fine-tuning when computational resources or labeled data are limited.
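To make the masked objective concrete, the sketch below applies BERT-style corruption to a toy DNA sequence. The single-nucleotide tokenization, the 15% masking rate, and the 80/10/10 replace/randomize/keep split are conventional defaults assumed here for illustration, not details of any particular genomic model.

```python
# Minimal sketch of BERT-style masking for a DNA sequence (assumed
# single-nucleotide vocabulary; 15% rate and 80/10/10 corruption split
# are the conventional defaults, not model-specific choices).
import random

VOCAB = ["A", "C", "G", "T"]
MASK = "[MASK]"

def mask_sequence(tokens, mask_rate=0.15, seed=0):
    """Return (corrupted_tokens, labels); labels are None where no loss is taken."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            labels.append(tok)                       # model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))  # 10%: random nucleotide
            else:
                corrupted.append(tok)                # 10%: keep original token
        else:
            corrupted.append(tok)
            labels.append(None)                      # no loss on unmasked positions
    return corrupted, labels

if __name__ == "__main__":
    seq = list("ACGTTGCAACGTAGCTAGGCTTACGATCGA")
    x, y = mask_sequence(seq)
    print("input :", "".join(t if t != MASK else "_" for t in x))
    print("targets at masked positions:", [t for t in y if t is not None])
```

The labels mark which positions contribute to the reconstruction loss; a next-token objective would instead shift the sequence by one position and score every token, which is part of why the two objectives yield representations with different properties.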
Rigorous evaluation requires understanding both what benchmarks measure (11 Benchmark Landscape) and how to apply them properly (12 Evaluation Methods). The benchmark landscape spans protein, DNA, regulatory, and clinical domains, each with distinct validity concerns. Evaluation methodology determines whether benchmark success predicts deployment success: proper splitting strategies, metric selection, baseline comparisons, and statistical rigor distinguish genuine advances from artifacts. Confounding (13 Confounding and Data Leakage) can inflate apparent performance through data leakage, batch effects, and population stratification. Mastering these evaluation principles enables critical assessment of the foundation models surveyed in Part IV and beyond.
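One concrete defense against the leakage failure mode is to split by groups of related examples rather than by individual sequences. The sketch below assumes each sequence already carries a homology cluster ID (for example, from clustering with a tool such as MMseqs2 or CD-HIT at a chosen identity threshold); the sequences, labels, and cluster assignments are placeholder toy data.

```python
# Minimal sketch of a leakage-aware split: whole homology clusters are
# assigned to train or test, so near-duplicate sequences never straddle
# the split. Cluster IDs and data below are illustrative placeholders.
from sklearn.model_selection import GroupShuffleSplit

sequences = ["ACGT...", "ACGA...", "TTGC...", "TTGA...", "GGCC...", "GGCA..."]
labels    = [1, 1, 0, 0, 1, 0]
clusters  = ["c1", "c1", "c2", "c2", "c3", "c3"]   # assumed homology clusters

splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
train_idx, test_idx = next(splitter.split(sequences, labels, groups=clusters))

# No cluster appears on both sides, so test performance is not inflated
# by memorized near-duplicates of training sequences.
assert not {clusters[i] for i in train_idx} & {clusters[i] for i in test_idx}
print("train:", sorted(map(int, train_idx)), "test:", sorted(map(int, test_idx)))
```

The same grouping idea extends to the other confounders named above: splitting by experimental batch or by ancestry group, rather than at random, tests whether a model has learned signal that survives the confounder instead of the confounder itself.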
Part III connects to the rest of the book as follows:
- Part II provides the architectural foundations that pretraining builds upon
- Part IV applies these learning principles to specific foundation model families
- Part V extends architectures to cellular context and systems-scale modeling
- Part VI deepens the evaluation framework with uncertainty, interpretability, and causal reasoning