This appendix collects educational resources, databases, and software tools for readers seeking to deepen their understanding of genomics, machine learning, and their intersection. Resources are organized by topic and include both foundational references and practical tools.
Textbooks
Genomics and Human Genetics
- Thompson & Thompson Genetics and Genomics in Medicine (9th ed.)
  Ronald Cohn, Stephen Scherer, Ada Hamosh. Clinically focused overview of human genetics and genomics for medicine. Excellent grounding in clinical genomics, variant interpretation, and genetic disease mechanisms.
- Human Molecular Genetics (5th ed.)
  Tom Strachan, Andrew Read. Higher-level molecular genetics text with strong coverage of mechanisms, technologies, and disease applications. More technical depth than Thompson & Thompson.
- Molecular Biology of the Cell (7th ed.)
  Bruce Alberts et al. Comprehensive cell biology text covering the molecular machinery underlying genomic processes. Essential background for understanding what genomic models are predicting.
- Genomes 4
  T.A. Brown. Focused specifically on genome organization, evolution, and analysis. Strong coverage of comparative genomics relevant to conservation-based methods.
Immunology
- Janeway’s Immunobiology (10th ed.)
  Kenneth M. Murphy, Casey Weaver, Leslie J. Berg. Standard comprehensive immunology textbook. Relevant for understanding immune-related genomic variation and applications like HLA typing.
Machine Learning and Deep Learning
- Deep Learning
  Ian Goodfellow, Yoshua Bengio, Aaron Courville. A comprehensive deep learning reference. Free online: https://www.deeplearningbook.org/
- Dive into Deep Learning (D2L)
  Aston Zhang et al. Interactive deep learning book with executable Jupyter notebooks and multi-framework code (PyTorch, TensorFlow, JAX). Free online: https://d2l.ai/
- An Introduction to Statistical Learning (ISLR, 2nd ed.)
  Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. Gentle introduction to statistical learning methods. R and Python editions available free online: https://www.statlearning.com/
- The Elements of Statistical Learning (ESL)
  Trevor Hastie, Robert Tibshirani, Jerome Friedman. More advanced, theory-heavy companion to ISLR. Free PDF: https://hastie.su.domains/ElemStatLearn/
- Pattern Recognition and Machine Learning
  Christopher Bishop. Classic ML text with strong probabilistic foundations. Relevant for understanding uncertainty quantification approaches.
Foundation Model Reference Library
The following Springer textbooks provide systematic coverage of topics essential for genomic foundation model research. Organized by domain, these represent authoritative references for deep learning architectures, clinical prediction methodology, statistical genetics, and interpretability.
Deep Learning and Foundation Models
- Foundation Models for Natural Language Processing: Pre-trained Language Models Integrating Media
  Gerhard Paass, Sven Giesselbach. Springer, 2023. Open Access (CC-BY 4.0). Comprehensive coverage of transformer architectures, pretraining objectives, and transfer learning. Essential for understanding the architectural foundations underlying genomic language models.
- Multivariate Statistical Machine Learning Methods for Genomic Prediction
  Osval Antonio Montesinos Lopez, Abelardo Montesinos Lopez, Jose Crossa. Springer, 2022. Open Access (CC-BY 4.0). Covers statistical and deep learning methods for genomic prediction including G-BLUP, Bayesian methods, kernel methods, and neural network implementations with code examples.
- Machine Learning and Systems Biology in Genomics and Health
  Shailza Singh (ed.). Springer, 2022. Applied machine learning for disease prediction, gene regulatory networks, and cardiovascular genomics.
Clinical Prediction and Validation
- Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating (2nd ed.)
  Ewout W. Steyerberg. Springer, 2019. A widely used reference for clinical prediction model development. Covers discrimination, calibration, validation strategies, net benefit analysis, and TRIPOD reporting guidelines. Essential reading for any clinical deployment work.
Interpretability and Explainable AI
- Interpretability in Deep Learning
  Ayush Somani, Alexander Horsch, Dilip K. Prasad. Springer, 2023. Comprehensive taxonomy of interpretability methods including the 5W1H framework, saliency methods, attention visualization, and domain-specific applications to CNNs, autoencoders, and graph neural networks.
- Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (LNAI 11700)
  Wojciech Samek, Gregoire Montavon, Andrea Vedaldi, Lars Kai Hansen, Klaus-Robert Muller (eds.). Springer, 2019. Multi-author volume covering feature visualization, layer-wise relevance propagation, and methods for evaluating explanation quality.
- xxAI - Beyond Explainable AI (LNAI 13200)
  Andreas Holzinger, Randy Goebel, Ruth Fong, Taesup Moon, Klaus-Robert Muller, Wojciech Samek (eds.). Springer, 2022. Advances beyond basic XAI including concept-based explanations, counterfactual analysis, and causal approaches to interpretation.
Statistical Genetics
- The Fundamentals of Modern Statistical Genetics
  Nan M. Laird, Christoph Lange. Springer, 2011. Foundational text covering Mendelian genetics, linkage and association analysis, population structure, and gene-environment interactions. Essential background for understanding confounding in genomic prediction.
- Heterogeneity in Statistical Genetics: How to Assess, Address, and Account for Mixtures in Association Studies
  Derek Gordon, Stephen J. Finch, Wonkuk Kim. Springer, 2020. Critical reference for understanding population stratification, locus heterogeneity, and statistical methods to address ancestry-related confounding.
- Applied Statistical Genetics with R: For Population-based Association Studies
  Andrea S. Foulkes. Springer, 2009. Practical R implementations for GWAS analysis, multiple testing correction, haplotype analysis, and tree-based methods for genetic data.
- Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL
  Rongling Wu, Chang-Xing Ma, George Casella. Springer, 2007. Classical foundations of QTL mapping and statistical models for quantitative traits.
Systems Biology and Networks
- Networks in Systems Biology: Applications for Disease Modeling (Computational Biology 32)
  Fabricio Alves Barbosa da Silva, Nicolas Carels, Marcelo Trindade dos Santos, Francisco Jose Pereira Lopes (eds.). Springer, 2020. Covers protein-protein interaction networks, gene regulatory networks, network propagation algorithms, and disease module identification methods.
- Handbook of Statistical Bioinformatics (2nd ed.)
  Henry Horng-Shing Lu, Bernhard Scholkopf, Martin T. Wells, Hongyu Zhao (eds.). Springer, 2022. Comprehensive handbook covering single-cell analysis methods, network inference, causal discovery, and deep learning for omics data.
- Methodologies of Multi-Omics Data Integration and Data Mining (Translational Bioinformatics 19)
  Kang Ning (ed.). Springer, 2023. Methods for integrating multiple data modalities including feature-level and decision-level fusion approaches.
Causal Inference
- Statistical Causal Discovery: LiNGAM Approach (SpringerBriefs in Statistics)
  Shohei Shimizu. Springer, 2022. Specialized treatment of non-Gaussian causal discovery methods with identifiability conditions and applications to observational data.
Online Courses
Machine Learning and Deep Learning
- Stanford CS229: Machine Learning
  Andrew Ng’s foundational ML course. Lecture videos and materials freely available. https://cs229.stanford.edu/
- Stanford CS231n: CNNs for Visual Recognition
  Deep dive into convolutional networks with strong foundations applicable to sequence models. http://cs231n.stanford.edu/
- Stanford CS224n: NLP with Deep Learning
  Essential for understanding transformer architectures, attention mechanisms, and language model pretraining. http://web.stanford.edu/class/cs224n/
- fast.ai Practical Deep Learning
  Top-down practical approach to deep learning. Free course with notebooks: https://course.fast.ai/
- DeepMind x UCL Deep Learning Lecture Series
  Excellent coverage of modern deep learning topics including transformers and self-supervised learning. YouTube playlist freely available.
Applied Genomic ML
- Coursera: AI for Medicine Specialization
  DeepLearning.AI course covering ML applications in medical imaging and clinical data. https://www.coursera.org/specializations/ai-for-medicine
- ML4Bio Summer School
  Annual workshop on machine learning for biology. Materials often available online.
Genomic Databases
Variant and Population Databases
Functional Annotation Databases
Gene and Pathway Databases
Keeping Current
The field moves rapidly. Strategies for staying current:
- Preprint alerts: Set bioRxiv/arXiv alerts for keywords like “genomic foundation model,” “variant effect prediction,” “DNA language model”
- Twitter/X: Follow active researchers and labs; the ML4Bio community is particularly active
- Conference proceedings: ISMB, RECOMB, and NeurIPS MLCB workshops publish cutting-edge work
- Model hubs: Monitor HuggingFace for new genomic model releases
- Database updates: ClinVar and gnomAD release notes track data growth and methodology changes
- Review articles: Annual reviews in Nature Reviews Genetics, Genome Biology, and Nature Methods provide consolidated perspectives
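For readers who prefer a scripted check over email alerts, the preprint keyword search can be approximated against the public arXiv API. The following is a minimal sketch under stated assumptions, not a recommended workflow: the function names (`build_query_url`, `parse_feed`, `fetch_recent`) and the `KEYWORDS` list are hypothetical names chosen for illustration, it covers arXiv only (bioRxiv is not queried), and it assumes the arXiv query endpoint returns an Atom feed as described in the arXiv API documentation.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public arXiv query endpoint (returns an Atom feed).
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Example phrases taken from the list above; adjust to taste.
KEYWORDS = [
    "genomic foundation model",
    "variant effect prediction",
    "DNA language model",
]


def build_query_url(keywords, max_results=20):
    """Build a query URL matching any of the quoted phrases, newest first."""
    phrases = " OR ".join(f'all:"{kw}"' for kw in keywords)
    params = {
        "search_query": phrases,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml):
    """Return (date, title, link) tuples for each entry in an Atom feed string."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        raw_title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        title = " ".join(raw_title.split())  # collapse line breaks in titles
        published = entry.findtext("atom:published", default="", namespaces=ATOM_NS)
        link = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        entries.append((published[:10], title, link))
    return entries


def fetch_recent(keywords, max_results=20):
    """Query arXiv over the network and return the parsed entries."""
    with urllib.request.urlopen(build_query_url(keywords, max_results)) as resp:
        return parse_feed(resp.read().decode("utf-8"))
```

Calling `fetch_recent(KEYWORDS)` from a scheduled job and printing the resulting tuples gives a rough digest of new submissions. Keeping URL construction and feed parsing as separate functions means both can be checked without network access.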