Foundation models are actively developed, with new versions often substantially outperforming predecessors. When citing or deploying models:
Abramson, Josh, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, et al. 2024.
“[AlphaFold3] Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3.” Nature 630 (8016): 493–500.
https://doi.org/10.1038/s41586-024-07487-w.
Adzhubei, Ivan A., Steffen Schmidt, Leonid Peshkin, Vasily E. Ramensky, Anna Gerasimova, Peer Bork, Alexey S. Kondrashov, and Shamil R. Sunyaev. 2010.
“A Method and Server for Predicting Damaging Missense Mutations.” Nature Methods 7 (4): 248–49.
https://doi.org/10.1038/nmeth0410-248.
Avsec, Žiga, Vikram Agarwal, D. Visentin, J. Ledsam, A. Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, J. Jumper, Pushmeet Kohli, and David R. Kelley. 2021.
“[Enformer] Effective Gene Expression Prediction from Sequence by Integrating Long-Range Interactions.” Nature Methods 18 (October): 1196–1203.
https://doi.org/10.1038/s41592-021-01252-x.
Baek, Minkyung, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, et al. 2021.
“Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network.” Science 373 (6557): 871–76.
https://doi.org/10.1126/science.abj8754.
Benegas, Gonzalo, Carlos Albors, Alan J. Aw, Chengzhong Ye, and Yun S. Song. 2024.
“GPN-MSA: An Alignment-Based DNA Language Model for Genome-Wide Variant Effect Prediction.” bioRxiv, April, 2023.10.10.561776.
https://doi.org/10.1101/2023.10.10.561776.
Brixi, Garyk, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg Brockman, Daniel Chang, Gabriel A. Gonzalez, et al. 2025.
“[Evo 2] Genome Modeling and Design Across All Domains of Life with Evo 2.” bioRxiv.
https://doi.org/10.1101/2025.02.18.638918.
Cao, Zhi-Jie, and Ge Gao. 2022.
“[GLUE] Multi-Omics Single-Cell Data Integration and Regulatory Inference with Graph-Linked Embedding.” Nature Biotechnology 40 (10): 1458–66.
https://doi.org/10.1038/s41587-022-01284-4.
Chen, Kathleen M., Aaron K. Wong, Olga G. Troyanskaya, and Jian Zhou. 2022.
“[DeepSEA Sei] A Sequence-Based Global Map of Regulatory Activity for Deciphering Human Genetics.” Nature Genetics 54 (7): 940–49.
https://doi.org/10.1038/s41588-022-01102-2.
Cheng, Jun, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, et al. 2023.
“[AlphaMissense] Accurate Proteome-Wide Missense Variant Effect Prediction with AlphaMissense.” Science 381 (6664): eadg7492.
https://doi.org/10.1126/science.adg7492.
Clarke, Brian, Eva Holtkamp, Hakime Öztürk, Marcel Mück, Magnus Wahlberg, Kayla Meyer, Felix Munzlinger, et al. 2024.
“[DeepRVAT] Integration of Variant Annotations Using Deep Set Networks Boosts Rare Variant Association Testing.” Nature Genetics 56 (10): 2271–80.
https://doi.org/10.1038/s41588-024-01919-z.
Cui, Haotian, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan Duan, and Bo Wang. 2024.
“scGPT: Toward Building a Foundation Model for Single-Cell Multi-Omics Using Generative AI.” Nature Methods 21 (8): 1470–80.
https://doi.org/10.1038/s41592-024-02201-0.
Dalla-Torre, Hugo, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, et al. 2023.
“Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics.” Nature Methods 22 (2): 287–97.
https://doi.org/10.1038/s41592-024-02523-z.
Davydov, Eugene V., David L. Goode, Marina Sirota, Gregory M. Cooper, Arend Sidow, and Serafim Batzoglou. 2010.
“Identifying a High Fraction of the Human Genome to Be Under Selective Constraint Using GERP++.” PLOS Computational Biology 6 (12): e1001025.
https://doi.org/10.1371/journal.pcbi.1001025.
Elnaggar, Ahmed, Michael Heinzinger, Christian Dallago, Ghalia Rihawi, Yu Wang, Llion Jones, Tom Gibbs, et al. 2021.
“ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing.” arXiv.
https://doi.org/10.48550/arXiv.2007.06225.
Frazer, Jonathan, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K. Min, Kelly Brock, Yarin Gal, and Debora S. Marks. 2021.
“[EVE] Disease Variant Prediction with Deep Generative Models of Evolutionary Data.” Nature 599 (7883): 91–95.
https://doi.org/10.1038/s41586-021-04043-8.
Georgantas, Costa, Zoltán Kutalik, and Jonas Richiardi. 2024.
“Delphi: A Deep-Learning Method for Polygenic Risk Prediction.” medRxiv.
https://doi.org/10.1101/2024.04.19.24306079.
Ioannidis, Nilah M., Joseph H. Rothstein, Vikas Pejaver, Sumit Middha, Shannon K. McDonnell, Saurabh Baheti, Anthony Musolf, et al. 2016.
“REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants.” The American Journal of Human Genetics 99 (4): 877–85.
https://doi.org/10.1016/j.ajhg.2016.08.016.
Jaganathan, Kishore, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, et al. 2019.
“[SpliceAI] Predicting Splicing from Primary Sequence with Deep Learning.” Cell 176 (3): 535–548.e24.
https://doi.org/10.1016/j.cell.2018.12.015.
Ji, Yanrong, Zhihan Zhou, Han Liu, and Ramana V Davuluri. 2021.
“DNABERT: Pre-Trained Bidirectional Encoder Representations from Transformers Model for DNA-Language in Genome.” Bioinformatics 37 (15): 2112–20.
https://doi.org/10.1093/bioinformatics/btab083.
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021.
“[AlphaFold2] Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.
https://doi.org/10.1038/s41586-021-03819-2.
Kelley, David R. 2020.
“[Basenji2] Cross-Species Regulatory Sequence Activity Prediction.” PLOS Computational Biology 16 (7): e1008050.
https://doi.org/10.1371/journal.pcbi.1008050.
Kelley, David R., Yakir A. Reshef, Maxwell Bileschi, David Belanger, Cory Y. McLean, and Jasper Snoek. 2018.
“[Basenji] Sequential Regulatory Activity Prediction Across Chromosomes with Convolutional Neural Networks.” Genome Research 28 (5): 739–50.
https://doi.org/10.1101/gr.227819.117.
Lee, Ingoo, Zachary S. Wallace, Yuqi Wang, Sungjoon Park, Hojung Nam, Amit R. Majithia, and Trey Ideker. 2025.
“[G2PT] A Genotype-Phenotype Transformer to Assess and Explain Polygenic Risk.” bioRxiv.
https://doi.org/10.1101/2024.10.23.619940.
Lin, Zeming, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, et al. 2022.
“[ESM-2] Language Models of Protein Sequences at the Scale of Evolution Enable Accurate Structure Prediction.” bioRxiv.
https://doi.org/10.1101/2022.07.20.500902.
Linder, Johannes, Divyanshi Srivastava, Han Yuan, Vikram Agarwal, and David R. Kelley. 2025.
“[Borzoi] Predicting RNA-Seq Coverage from DNA Sequence as a Unifying Model of Gene Regulation.” Nature Genetics 57 (4): 949–61.
https://doi.org/10.1038/s41588-024-02053-6.
Meier, Joshua, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alexander Rives. 2021.
“[ESM-1v] Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv.
https://doi.org/10.1101/2021.07.09.450648.
Ng, Pauline C., and Steven Henikoff. 2003.
“SIFT: Predicting Amino Acid Changes That Affect Protein Function.” Nucleic Acids Research 31 (13): 3812–14.
https://doi.org/10.1093/nar/gkg509.
Nguyen, Eric, Michael Poli, Matthew G. Durrant, Brian Kang, Dhruva Katrekar, David B. Li, Liam J. Bartie, et al. 2024.
“Sequence Modeling and Design from Molecular to Genome Scale with Evo.” Science 386 (6723): eado9336.
https://doi.org/10.1126/science.ado9336.
Nguyen, Eric, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, et al. 2023.
“HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution.” arXiv.
https://doi.org/10.48550/arXiv.2306.15794.
Nijkamp, Erik, Jeffrey A. Ruffolo, Eli N. Weinstein, Nikhil Naik, and Ali Madani. 2023.
“ProGen2: Exploring the Boundaries of Protein Language Models.” Cell Systems 14 (11): 968–978.e3.
https://doi.org/10.1016/j.cels.2023.10.002.
Pollard, Katherine S., Melissa J. Hubisz, Kate R. Rosenbloom, and Adam Siepel. 2009.
“Detection of Nonneutral Substitution Rates on Mammalian Phylogenies.” Genome Research 20 (1): 110–21.
https://doi.org/10.1101/gr.097857.109.
Rentzsch, Philipp, Daniela Witten, Gregory M Cooper, Jay Shendure, and Martin Kircher. 2019.
“CADD: Predicting the Deleteriousness of Variants Throughout the Human Genome.” Nucleic Acids Research 47 (D1): D886–94.
https://doi.org/10.1093/nar/gky1016.
Sanabria, Melissa, Jonas Hirsch, Pierre M. Joubert, and Anna R. Poetsch. 2024.
“[GROVER] DNA Language Model GROVER Learns Sequence Context in the Human Genome.” Nature Machine Intelligence 6 (8): 911–23.
https://doi.org/10.1038/s42256-024-00872-0.
Schiff, Yair, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, and Volodymyr Kuleshov. 2024.
“Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling.” arXiv.
https://doi.org/10.48550/arXiv.2403.03234.
Sundaram, Laksshman, Hong Gao, Samskruthi Reddy Padigepati, Jeremy F. McRae, Yanjun Li, Jack A. Kosmicki, Nondas Fritzilas, et al. 2018.
“Predicting the Clinical Impact of Human Mutation with Deep Neural Networks.” Nature Genetics 50 (8): 1161–70.
https://doi.org/10.1038/s41588-018-0167-z.
Theodoris, Christina V., Ling Xiao, Anant Chopra, Mark D. Chaffin, Zeina R. Al Sayed, Matthew C. Hill, Helene Mantineo, et al. 2023.
“[Geneformer] Transfer Learning Enables Predictions in Network Biology.” Nature 618 (7965): 616–24.
https://doi.org/10.1038/s41586-023-06139-9.
Yang, Fan, Wenchuan Wang, Fang Wang, Yuan Fang, Duyu Tang, Junzhou Huang, Hui Lu, and Jianhua Yao. 2022.
“scBERT as a Large-Scale Pretrained Deep Language Model for Cell Type Annotation of Single-Cell RNA-Seq Data.” Nature Machine Intelligence 4 (10): 852–66.
https://doi.org/10.1038/s42256-022-00534-z.
Yeo, Gene, and Christopher B. Burge. 2004.
“Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals.” Journal of Computational Biology 11 (2-3): 377–94.
https://doi.org/10.1089/1066527041410418.
Zeng, Tony, and Yang I. Li. 2022.
“Predicting RNA Splicing from DNA Sequence Using Pangolin.” Genome Biology 23 (1): 103.
https://doi.org/10.1186/s13059-022-02664-4.
Zhou, Jian, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong, and Olga G. Troyanskaya. 2018.
“[Expecto] Deep Learning Sequence-Based Ab Initio Prediction of Variant Effects on Expression and Disease Risk.” Nature Genetics 50 (8): 1171–79.
https://doi.org/10.1038/s41588-018-0160-6.
Zhou, Jian, and Olga G. Troyanskaya. 2015.
“[DeepSEA] Predicting Effects of Noncoding Variants with Deep Learning–Based Sequence Model.” Nature Methods 12 (10): 931–34.
https://doi.org/10.1038/nmeth.3547.
Zhou, Zhihan, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu. 2024.
“DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome.” arXiv.
https://doi.org/10.48550/arXiv.2306.15006.