References

Adzhubei, Ivan A., Steffen Schmidt, Leonid Peshkin, Vasily E. Ramensky, Anna Gerasimova, Peer Bork, Alexey S. Kondrashov, and Shamil R. Sunyaev. 2010. “A Method and Server for Predicting Damaging Missense Mutations.” Nature Methods 7 (4): 248–49. https://doi.org/10.1038/nmeth0410-248.
All of Us, All of Us Research Program Investigators. 2019. “The All of Us Research Program.” New England Journal of Medicine 381 (7): 668–76. https://doi.org/10.1056/NEJMsr1809937.
Auton, Adam, Gonçalo R. Abecasis, David M. Altshuler, Richard M. Durbin, Gonçalo R. Abecasis, David R. Bentley, Aravinda Chakravarti, et al. 2015. “A Global Reference for Human Genetic Variation.” Nature 526 (7571): 68–74. https://doi.org/10.1038/nature15393.
Avsec, Žiga, Vikram Agarwal, D. Visentin, J. Ledsam, A. Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, J. Jumper, Pushmeet Kohli, and David R. Kelley. 2021. “[Enformer] Effective Gene Expression Prediction from Sequence by Integrating Long-Range Interactions.” Nature Methods 18 (October): 1196–1203. https://doi.org/10.1038/s41592-021-01252-x.
Avsec, Ziga, Natasha Latysheva, and Jun Cheng. 2025. AlphaGenome: AI for Better Understanding the Genome.” Google DeepMind. https://deepmind.google/discover/blog/alphagenome-ai-for-better-understanding-the-genome/.
Benegas, Gonzalo, Carlos Albors, Alan J. Aw, Chengzhong Ye, and Yun S. Song. 2024. GPN-MSA: An Alignment-Based DNA Language Model for Genome-Wide Variant Effect Prediction.” bioRxiv, April, 2023.10.10.561776. https://doi.org/10.1101/2023.10.10.561776.
Benegas, Gonzalo, Sanjit Singh Batra, and Yun S. Song. 2023. “[GPN] DNA Language Models Are Powerful Predictors of Genome-Wide Variant Effects.” Proceedings of the National Academy of Sciences 120 (44): e2311219120. https://doi.org/10.1073/pnas.2311219120.
Benegas, Gonzalo, Gökcen Eraslan, and Yun S. Song. 2025. “[TraitGym] Benchmarking DNA Sequence Models for Causal Regulatory Variant Prediction in Human Genetics.” bioRxiv. https://doi.org/10.1101/2025.02.11.637758.
Benegas, Gonzalo, Chengzhong Ye, Carlos Albors, Jianan Canal Li, and Yun S. Song. 2024. “Genomic Language Models: Opportunities and Challenges.” arXiv. https://doi.org/10.48550/arXiv.2407.11435.
Bommasani, Rishi, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, et al. 2022. “On the Opportunities and Risks of Foundation Models.” arXiv. https://doi.org/10.48550/arXiv.2108.07258.
Brandes, Nadav, Grant Goldman, Charlotte H. Wang, Chun Jimmie Ye, and Vasilis Ntranos. 2023. “Genome-Wide Prediction of Disease Variant Effects with a Deep Protein Language Model.” Nature Genetics 55 (9): 1512–22. https://doi.org/10.1038/s41588-023-01465-0.
Brixi, Garyk, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg Brockman, Daniel Chang, Gabriel A. Gonzalez, et al. 2025. “[Evo 2] Genome Modeling and Design Across All Domains of Life with Evo 2.” bioRxiv. https://doi.org/10.1101/2025.02.18.638918.
Browning, Brian L., Xiaowen Tian, Ying Zhou, and Sharon R. Browning. 2021. “Fast Two-Stage Phasing of Large-Scale Sequence Data.” American Journal of Human Genetics 108 (10): 1880–90. https://doi.org/10.1016/j.ajhg.2021.08.005.
Bycroft, Clare, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T. Elliott, Kevin Sharp, Allan Motyer, et al. 2018. “The UK Biobank Resource with Deep Phenotyping and Genomic Data.” Nature 562 (7726): 203–9. https://doi.org/10.1038/s41586-018-0579-z.
Camillo, Lucas Paulo de Lima, Raghav Sehgal, Jenel Armstrong, Albert T. Higgins-Chen, Steve Horvath, and Bo Wang. 2024. CpGPT: A Foundation Model for DNA Methylation.” bioRxiv. https://doi.org/10.1101/2024.10.24.619766.
Cao, Zhi-Jie, and Ge Gao. 2022. “[GLUE] Multi-Omics Single-Cell Data Integration and Regulatory Inference with Graph-Linked Embedding.” Nature Biotechnology 40 (10): 1458–66. https://doi.org/10.1038/s41587-022-01284-4.
Chandak, Payal, Kexin Huang, and Marinka Zitnik. 2023. “[PrimeKG] Building a Knowledge Graph to Enable Precision Medicine.” Scientific Data 10 (1): 67. https://doi.org/10.1038/s41597-023-01960-3.
Chen, Kathleen M., Aaron K. Wong, Olga G. Troyanskaya, and Jian Zhou. 2022. “[DeepSEA Sei] A Sequence-Based Global Map of Regulatory Activity for Deciphering Human Genetics.” Nature Genetics 54 (7): 940–49. https://doi.org/10.1038/s41588-022-01102-2.
Cheng, Jun, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, et al. 2023. “[AlphaMissense] Accurate Proteome-Wide Missense Variant Effect Prediction with AlphaMissense.” Science 381 (6664): eadg7492. https://doi.org/10.1126/science.adg7492.
Choi, Shing Wan, Timothy Shin-Heng Mak, and Paul F. O’Reilly. 2020. “[PRS] Tutorial: A Guide to Performing Polygenic Risk Score Analyses.” Nature Protocols 15 (9): 2759–72. https://doi.org/10.1038/s41596-020-0353-1.
Chung, Wen-Hung, Shuen-Iu Hung, Hong-Shang Hong, Mo-Song Hsih, Li-Cheng Yang, Hsin-Chun Ho, Jer-Yuarn Wu, and Yuan-Tsong Chen. 2004. “A Marker for StevensJohnson Syndrome.” Nature 428 (6982): 486–86. https://doi.org/10.1038/428486a.
Clarke, Brian, Eva Holtkamp, Hakime Öztürk, Marcel Mück, Magnus Wahlberg, Kayla Meyer, Felix Munzlinger, et al. 2024. “[DeepRVAT] Integration of Variant Annotations Using Deep Set Networks Boosts Rare Variant Association Testing.” Nature Genetics 56 (10): 2271–80. https://doi.org/10.1038/s41588-024-01919-z.
Dabernig-Heinz, Johanna, Mara Lohde, Martin Hölzer, Adriana Cabal, Rick Conzemius, Christian Brandt, Matthias Kohl, et al. 2024. “A Multicenter Study on Accuracy and Reproducibility of Nanopore Sequencing-Based Genotyping of Bacterial Pathogens.” Journal of Clinical Microbiology 62 (9): e00628–24. https://doi.org/10.1128/jcm.00628-24.
Dalla-Torre, Hugo, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, et al. 2023. “Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics.” Nature Methods 22 (2): 287–97. https://doi.org/10.1038/s41592-024-02523-z.
Davydov, Eugene V., David L. Goode, Marina Sirota, Gregory M. Cooper, Arend Sidow, and Serafim Batzoglou. 2010. “Identifying a High Fraction of the Human Genome to Be Under Selective Constraint Using GERP++.” PLOS Computational Biology 6 (12): e1001025. https://doi.org/10.1371/journal.pcbi.1001025.
DePristo, Mark A., Eric Banks, Ryan Poplin, Kiran V. Garimella, Jared R. Maguire, Christopher Hartl, Anthony A. Philippakis, et al. 2011. “A Framework for Variation Discovery and Genotyping Using Next-Generation DNA Sequencing Data.” Nature Genetics 43 (5): 491–98. https://doi.org/10.1038/ng.806.
Fishman, Veniamin, Yuri Kuratov, Aleksei Shmelev, Maxim Petrov, Dmitry Penzar, Denis Shepelin, Nikolay Chekanov, Olga Kardymon, and Mikhail Burtsev. 2025. GENA-LM: A Family of Open-Source Foundational DNA Language Models for Long Sequences.” Nucleic Acids Research 53 (2): gkae1310. https://doi.org/10.1093/nar/gkae1310.
Frankish, Adam, Mark Diekhans, Anne-Maud Ferreira, Rory Johnson, Irwin Jungreis, Jane Loveland, Jonathan M Mudge, et al. 2019. GENCODE Reference Annotation for the Human and Mouse Genomes.” Nucleic Acids Research 47 (D1): D766–73. https://doi.org/10.1093/nar/gky955.
Gao, Ziqi, Chenran Jiang, Jiawen Zhang, Xiaosen Jiang, Lanqing Li, Peilin Zhao, Huanming Yang, Yong Huang, and Jia Li. 2023. “[HIGH-PPI] Hierarchical Graph Learning for Protein–Protein Interaction.” Nature Communications 14 (1): 1093. https://doi.org/10.1038/s41467-023-36736-1.
Garrison, Erik, Jouni Sirén, Adam M. Novak, Glenn Hickey, Jordan M. Eizenga, Eric T. Dawson, William Jones, et al. 2018. “Variation Graph Toolkit Improves Read Mapping by Representing Genetic Variation in the Reference.” Nature Biotechnology 36 (9): 875–79. https://doi.org/10.1038/nbt.4227.
Georgantas, Costa, Zoltán Kutalik, and Jonas Richiardi. 2024. “Delphi: A Deep-Learning Method for Polygenic Risk Prediction.” medRxiv. https://doi.org/10.1101/2024.04.19.24306079.
Goodwin, Sara, John D. McPherson, and W. Richard McCombie. 2016. “Coming of Age: Ten Years of Next-Generation Sequencing Technologies.” Nature Reviews Genetics 17 (6): 333–51. https://doi.org/10.1038/nrg.2016.49.
Guo, Fei, Renchu Guan, Yaohang Li, Qi Liu, Xiaowo Wang, Can Yang, and Jianxin Wang. 2025. “Foundation Models in Bioinformatics.” National Science Review 12 (4): nwaf028. https://doi.org/10.1093/nsr/nwaf028.
He, Shujun, Baizhen Gao, Rushant Sabnis, and Qing Sun. 2023. “Nucleic Transformer: Classifying DNA Sequences with Self-Attention and Convolutions.” ACS Synthetic Biology 12 (11): 3205–14. https://doi.org/10.1021/acssynbio.3c00154.
Hudaiberdiev, Sanjarbek, D. Leland Taylor, Wei Song, Narisu Narisu, Redwan M. Bhuiyan, Henry J. Taylor, Xuming Tang, et al. 2023. “[TREDNet] Modeling Islet Enhancers Using Deep Learning Identifies Candidate Causal Variants at Loci Associated with T2D and Glycemic Traits.” Proceedings of the National Academy of Sciences 120 (35): e2206612120. https://doi.org/10.1073/pnas.2206612120.
Jaganathan, Kishore, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, et al. 2019. “[SpliceAI] Predicting Splicing from Primary Sequence with Deep Learning.” Cell 176 (3): 535–548.e24. https://doi.org/10.1016/j.cell.2018.12.015.
Ji, Yanrong, Zhihan Zhou, Han Liu, and Ramana V Davuluri. 2021. DNABERT: Pre-Trained Bidirectional Encoder Representations from Transformers Model for DNA-Language in Genome.” Bioinformatics 37 (15): 2112–20. https://doi.org/10.1093/bioinformatics/btab083.
Jiang, Tao, Yongzhuang Liu, Yue Jiang, Junyi Li, Yan Gao, Zhe Cui, Yadong Liu, Bo Liu, and Yadong Wang. 2020. “Long-Read-Based Human Genomic Structural Variation Detection with cuteSV.” Genome Biology 21 (1): 189. https://doi.org/10.1186/s13059-020-02107-y.
Jurenaite, Neringa, Daniel León-Periñán, Veronika Donath, Sunna Torge, and René Jäkel. 2024. SetQuence & SetOmic: Deep Set Transformers for Whole Genome and Exome Tumour Analysis.” BioSystems 235 (January): 105095. https://doi.org/10.1016/j.biosystems.2023.105095.
Kagda, Meenakshi S., Bonita Lam, Casey Litton, Corinn Small, Cricket A. Sloan, Emma Spragins, Forrest Tanaka, et al. 2025. “Data Navigation on the ENCODE Portal.” Nature Communications 16 (1): 9592. https://doi.org/10.1038/s41467-025-64343-9.
Karczewski, Konrad J., Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, et al. 2020. “The Mutational Constraint Spectrum Quantified from Variation in 141,456 Humans.” Nature 581 (7809): 434–43. https://doi.org/10.1038/s41586-020-2308-7.
Krusche, Peter, Len Trigg, Paul C. Boutros, Christopher E. Mason, Francisco M. De La Vega, Benjamin L. Moore, Mar Gonzalez-Porta, et al. 2019. “Best Practices for Benchmarking Germline Small Variant Calls in Human Genomes.” Nature Biotechnology 37 (5): 555–60. https://doi.org/10.1038/s41587-019-0054-x.
Kundaje, Anshul, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, et al. 2015. “Integrative Analysis of 111 Reference Human Epigenomes.” Nature 518 (7539): 317–30. https://doi.org/10.1038/nature14248.
Kurki, Mitja I., Juha Karjalainen, Priit Palta, Timo P. Sipilä, Kati Kristiansson, Kati M. Donner, Mary P. Reeve, et al. 2023. FinnGen Provides Genetic Insights from a Well-Phenotyped Isolated Population.” Nature 613 (7944): 508–18. https://doi.org/10.1038/s41586-022-05473-8.
Lambert, Samuel A., Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, et al. 2021. “The Polygenic Score Catalog as an Open Database for Reproducibility and Systematic Evaluation.” Nature Genetics 53 (4): 420–25. https://doi.org/10.1038/s41588-021-00783-5.
Lee, Ingoo, Zachary S. Wallace, Yuqi Wang, Sungjoon Park, Hojung Nam, Amit R. Majithia, and Trey Ideker. 2025. “[G2PT] A Genotype-Phenotype Transformer to Assess and Explain Polygenic Risk.” bioRxiv. https://doi.org/10.1101/2024.10.23.619940.
Li, Hao, Zebei Han, Yu Sun, Fu Wang, Pengzhen Hu, Yuang Gao, Xuemei Bai, et al. 2024. CGMega: Explainable Graph Neural Network Framework with Attention Mechanisms for Cancer Gene Module Dissection.” Nature Communications 15 (1): 5997. https://doi.org/10.1038/s41467-024-50426-6.
Li, Heng. 2013. “Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM.” arXiv. https://doi.org/10.48550/arXiv.1303.3997.
———. 2014. “Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples.” Bioinformatics 30 (20): 2843–51. https://doi.org/10.1093/bioinformatics/btu356.
———. 2018. “Minimap2: Pairwise Alignment for Nucleotide Sequences.” Bioinformatics 34 (18): 3094–3100. https://doi.org/10.1093/bioinformatics/bty191.
Li, Xiao, Jie Ma, Ling Leng, Mingfei Han, Mansheng Li, Fuchu He, and Yunping Zhu. 2022. MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis.” Frontiers in Genetics 13 (February). https://doi.org/10.3389/fgene.2022.806842.
Li, Zehui, Akashaditya Das, William A. V. Beardall, Yiren Zhao, and Guy-Bart Stan. 2023. “Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer.” arXiv. https://doi.org/10.48550/arXiv.2306.05143.
Liao, Wen-Wei, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, et al. 2023. “A Draft Human Pangenome Reference.” Nature 617 (7960): 312–24. https://doi.org/10.1038/s41586-023-05896-x.
Lin, Zeming, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, et al. 2022. “[ESM-2] Language Models of Protein Sequences at the Scale of Evolution Enable Accurate Structure Prediction.” bioRxiv. https://doi.org/10.1101/2022.07.20.500902.
Linder, Johannes, Divyanshi Srivastava, Han Yuan, Vikram Agarwal, and David R. Kelley. 2025. “[Borzoi] Predicting RNA-Seq Coverage from DNA Sequence as a Unifying Model of Gene Regulation.” Nature Genetics 57 (4): 949–61. https://doi.org/10.1038/s41588-024-02053-6.
Liu, Zicheng, Siyuan Li, Zhiyuan Chen, Fang Wu, Chang Yu, Qirong Yang, Yucheng Guo, Yujie Yang, Xiaoming Zhang, and Stan Z. Li. 2025. “Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification.” arXiv. https://doi.org/10.48550/arXiv.2502.07299.
Loh, Po-Ru, Petr Danecek, Pier Francesco Palamara, Christian Fuchsberger, Yakir A Reshef, Hilary K Finucane, Sebastian Schoenherr, et al. 2016. “Reference-Based Phasing Using the Haplotype Reference Consortium Panel.” Nature Genetics 48 (11): 1443–48. https://doi.org/10.1038/ng.3679.
Ma, Jiani, Jiangning Song, Neil D. Young, Bill C. H. Chang, Pasi K. Korhonen, Tulio L. Campos, Hui Liu, and Robin B. Gasser. 2023. “’Bingo’-a Large Language Model- and Graph Neural Network-Based Workflow for the Prediction of Essential Genes from Protein Data.” Briefings in Bioinformatics 25 (1): bbad472. https://doi.org/10.1093/bib/bbad472.
Mallal, Simon, Elizabeth Phillips, Giampiero Carosi, Jean-Michel Molina, Cassy Workman, Janez Tomažič, Eva Jägel-Guedes, et al. 2008. HLA-B*5701 Screening for Hypersensitivity to Abacavir.” New England Journal of Medicine 358 (6): 568–79. https://doi.org/10.1056/NEJMoa0706135.
Manzo, Gaetano, Kathryn Borkowski, and Ivan Ovcharenko. 2025. “Comparative Analysis of Deep Learning Models for Predicting Causative Regulatory Variants.” bioRxiv: The Preprint Server for Biology, June, 2025.05.19.654920. https://doi.org/10.1101/2025.05.19.654920.
Marees, Andries T., Hilde de Kluiver, Sven Stringer, Florence Vorspan, Emmanuel Curis, Cynthia Marie-Claire, and Eske M. Derks. 2018. “[GWAS] A Tutorial on Conducting Genome-Wide Association Studies: Quality Control and Statistical Analysis.” International Journal of Methods in Psychiatric Research 27 (2): e1608. https://doi.org/10.1002/mpr.1608.
Marquet, Céline, Julius Schlensok, Marina Abakarova, Burkhard Rost, and Elodie Laine. 2024. “[VespaG] Expert-Guided Protein Language Models Enable Accurate and Blazingly Fast Fitness Prediction.” Bioinformatics 40 (11): btae621. https://doi.org/10.1093/bioinformatics/btae621.
McKenna, Aaron, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran Garimella, et al. 2010. “The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data.” Genome Research 20 (9): 1297–1303. https://doi.org/10.1101/gr.107524.110.
Medvedev, Aleksandr, Karthik Viswanathan, Praveenkumar Kanithi, Kirill Vishniakov, Prateek Munjal, Clément Christophe, Marco AF Pimentel, Ronnie Rajan, and Shadab Khan. 2025. BioToken and BioFMBiologically-Informed Tokenization Enables Accurate and Efficient Genomic Foundation Models.” bioRxiv. https://doi.org/10.1101/2025.03.27.645711.
Meier, Joshua, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alexander Rives. 2021. “[ESM-1v] Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv. https://doi.org/10.1101/2021.07.09.450648.
Morales, Joannella, Shashikant Pujar, Jane E. Loveland, Alex Astashyn, Ruth Bennett, Andrew Berry, Eric Cox, et al. 2022. “A Joint NCBI and EMBL-EBI Transcript Set for Clinical Genomics and Research.” Nature 604 (7905): 310–15. https://doi.org/10.1038/s41586-022-04558-8.
Mountjoy, Edward, Ellen M. Schmidt, Miguel Carmona, Jeremy Schwartzentruber, Gareth Peat, Alfredo Miranda, Luca Fumis, et al. 2021. “An Open Approach to Systematically Prioritize Causal Variants and Genes at All Published Human GWAS Trait-Associated Loci.” Nature Genetics 53 (11): 1527–33. https://doi.org/10.1038/s41588-021-00945-5.
Naghipourfar, Mohsen, Siyu Chen, Mathew K. Howard, Christian B. Macdonald, Ali Saberi, Timo Hagen, Mohammad R. K. Mofrad, Willow Coyote-Maestas, and Hani Goodarzi. 2024. “[cdsFM - EnCodon/DeCodon] A Suite of Foundation Models Captures the Contextual Interplay Between Codons.” bioRxiv. https://doi.org/10.1101/2024.10.10.617568.
Ng, Pauline C., and Steven Henikoff. 2003. SIFT: Predicting Amino Acid Changes That Affect Protein Function.” Nucleic Acids Research 31 (13): 3812–14. https://doi.org/10.1093/nar/gkg509.
Nguyen, Eric, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, et al. 2023. HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution.” arXiv. https://doi.org/10.48550/arXiv.2306.15794.
Nielsen, Rasmus, Joshua S. Paul, Anders Albrechtsen, and Yun S. Song. 2011. “Genotype and SNP Calling from Next-Generation Sequencing Data.” Nature Reviews. Genetics 12 (6): 443–51. https://doi.org/10.1038/nrg2986.
Notin, Pascal, Aaron Kollasch, Daniel Ritter, Lood van Niekerk, Steffanie Paul, Han Spinner, Nathan Rollins, et al. 2023. ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design.” Advances in Neural Information Processing Systems 36 (December): 64331–79. https://papers.nips.cc/paper_files/paper/2023/hash/cac723e5ff29f65e3fcbb0739ae91bee-Abstract-Datasets_and_Benchmarks.html.
Nurk, Sergey, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, et al. 2022. “The Complete Sequence of a Human Genome.” Science 376 (6588): 44–53. https://doi.org/10.1126/science.abj6987.
O’Connell, Jared, Deepti Gurdasani, Olivier Delaneau, Nicola Pirastu, Sheila Ulivi, Massimiliano Cocca, Michela Traglia, et al. 2014. “A General Approach for Haplotype Phasing Across the Full Spectrum of Relatedness.” PLOS Genetics 10 (4): e1004234. https://doi.org/10.1371/journal.pgen.1004234.
O’Leary, Nuala A., Mathew W. Wright, J. Rodney Brister, Stacy Ciufo, Diana Haddad, Rich McVeigh, Bhanu Rajput, et al. 2016. “Reference Sequence (RefSeq) Database at NCBI: Current Status, Taxonomic Expansion, and Functional Annotation.” Nucleic Acids Research 44 (D1): D733–45. https://doi.org/10.1093/nar/gkv1189.
PacificBiosciences/Pbsv.” 2025. PacBio. https://github.com/PacificBiosciences/pbsv.
Padyukov, Leonid. 2022. “Genetics of Rheumatoid Arthritis.” Seminars in Immunopathology 44 (1): 47–62. https://doi.org/10.1007/s00281-022-00912-0.
Pasaniuc, Bogdan, and Alkes L. Price. 2016. “Dissecting the Genetics of Complex Traits Using Summary Association Statistics.” Nature Reviews Genetics 18 (2): 117–27. https://doi.org/10.1038/nrg.2016.142.
Poplin, Ryan, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, et al. 2018. “[DeepVariant] A Universal SNP and Small-Indel Variant Caller Using Deep Neural Networks.” Nature Biotechnology 36 (10): 983–87. https://doi.org/10.1038/nbt.4235.
Rakowski, Alexander, and Christoph Lippert. 2025. “[MIFM] Multiple Instance Fine-Mapping: Predicting Causal Regulatory Variants with a Deep Sequence Model.” medRxiv. https://doi.org/10.1101/2025.06.13.25329551.
RealTimeGenomics/Rtg-Core.” 2025. Real Time Genomics. https://github.com/RealTimeGenomics/rtg-core.
Rentzsch, Philipp, Daniela Witten, Gregory M Cooper, Jay Shendure, and Martin Kircher. 2019. CADD: Predicting the Deleteriousness of Variants Throughout the Human Genome.” Nucleic Acids Research 47 (D1): D886–94. https://doi.org/10.1093/nar/gky1016.
Rives, Alexander, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, et al. 2021. “[ESM-1b] Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences.” Proceedings of the National Academy of Sciences of the United States of America 118 (15): e2016239118. https://doi.org/10.1073/pnas.2016239118.
Robinson, James, Dominic J Barker, Xenia Georgiou, Michael A Cooper, Paul Flicek, and Steven G E Marsh. 2020. IPD-IMGT/HLA Database.” Nucleic Acids Research 48 (D1): D948–55. https://doi.org/10.1093/nar/gkz950.
Sakaue, Saori, Saisriram Gurajala, Michelle Curtis, Yang Luo, Wanson Choi, Kazuyoshi Ishigaki, Joyce B. Kang, et al. 2023. “Tutorial: A Statistical Genetics Guide to Identifying HLA Alleles Driving Complex Disease.” Nature Protocols 18 (9): 2625–41. https://doi.org/10.1038/s41596-023-00853-4.
Sanabria, Melissa, Jonas Hirsch, Pierre M. Joubert, and Anna R. Poetsch. 2024. “[GROVER] DNA Language Model GROVER Learns Sequence Context in the Human Genome.” Nature Machine Intelligence 6 (8): 911–23. https://doi.org/10.1038/s42256-024-00872-0.
Schubach, Max, Thorben Maass, Lusiné Nazaretyan, Sebastian Röner, and Martin Kircher. 2024. CADD V1.7: Using Protein Language Models, Regulatory CNNs and Other Nucleotide-Level Scores to Improve Genome-Wide Variant Predictions.” Nucleic Acids Research 52 (D1): D1143–54. https://doi.org/10.1093/nar/gkad989.
Shafin, Kishwar, Trevor Pesout, Pi-Chuan Chang, Maria Nattestad, Alexey Kolesnikov, Sidharth Goel, Gunjan Baid, et al. 2021. “Haplotype-Aware Variant Calling with PEPPER-Margin-DeepVariant Enables High Accuracy in Nanopore Long-Reads.” Nature Methods 18 (11): 1322–32. https://doi.org/10.1038/s41592-021-01299-w.
Sherry, S. T., M.-H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin. 2001. dbSNP: The NCBI Database of Genetic Variation.” Nucleic Acids Research 29 (1): 308–11. https://doi.org/10.1093/nar/29.1.308.
Siepel, Adam, Gill Bejerano, Jakob S. Pedersen, Angie S. Hinrichs, Minmei Hou, Kate Rosenbloom, Hiram Clawson, et al. 2005. “[PhastCons] Evolutionarily Conserved Elements in Vertebrate, Insect, Worm, and Yeast Genomes.” Genome Research 15 (8): 1034–50. https://doi.org/10.1101/gr.3715005.
Sirugo, Giorgio, Scott M. Williams, and Sarah A. Tishkoff. 2019. “The Missing Diversity in Human Genetic Studies.” Cell 177 (1): 26–31. https://doi.org/10.1016/j.cell.2019.02.048.
Smolka, Moritz, Luis F. Paulin, Christopher M. Grochowski, Dominic W. Horner, Medhat Mahmoud, Sairam Behera, Ester Kalef-Ezra, et al. 2024. “Detection of Mosaic and Population-Level Structural Variants with Sniffles2.” Nature Biotechnology 42 (10): 1571–80. https://doi.org/10.1038/s41587-023-02024-y.
Sollis, Elliot, Abayomi Mosaku, Ala Abid, Annalisa Buniello, Maria Cerezo, Laurent Gil, Tudor Groza, et al. 2023. “The NHGRI-EBI GWAS Catalog: Knowledgebase and Deposition Resource.” Nucleic Acids Research 51 (D1): D977–85. https://doi.org/10.1093/nar/gkac1010.
Song, Li, Gali Bai, X. Shirley Liu, Bo Li, and Heng Li. 2022. T1K: Efficient and Accurate KIR and HLA Genotyping with Next-Generation Sequencing Data.” bioRxiv. https://doi.org/10.1101/2022.10.26.513955.
“The Genome Aggregation Database (gnomAD).” n.d. Accessed July 3, 2025. https://www.nature.com/immersive/d42859-020-00002-x/index.html.
Trop, Evan, Yair Schiff, Edgar Mariano Marroquin, Chia Hsiang Kao, Aaron Gokaslan, McKinley Polen, Mingyi Shao, et al. 2024. “The Genomics Long-Range Benchmark: Advancing DNA Language Models,” October. https://openreview.net/forum?id=8O9HLDrmtq.
Van der Auwera, Geraldine A., Mauricio O. Carneiro, Christopher Hartl, Ryan Poplin, Guillermo del Angel, Ami Levy-Moonshine, Tadeusz Jordan, et al. 2018. “From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline.” Current Protocols in Bioinformatics 43 (1): 11.10.1–33. https://doi.org/10.1002/0471250953.bi1110s43.
Verma, Anurag, Jennifer E. Huffman, Alex Rodriguez, Mitchell Conery, Molei Liu, Yuk-Lam Ho, Youngdae Kim, et al. 2024. “Diversity and Scale: Genetic Architecture of 2068 Traits in the VA Million Veteran Program.” Science 385 (6706): eadj1182. https://doi.org/10.1126/science.adj1182.
Vilhjálmsson, Bjarni J., Jian Yang, Hilary K. Finucane, Alexander Gusev, Sara Lindström, Stephan Ripke, Giulio Genovese, et al. 2015. “Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores.” American Journal of Human Genetics 97 (4): 576–92. https://doi.org/10.1016/j.ajhg.2015.09.001.
Wenger, Aaron M., Paul Peluso, William J. Rowell, Pi-Chuan Chang, Richard J. Hall, Gregory T. Concepcion, Jana Ebler, et al. 2019. “Accurate Circular Consensus Long-Read Sequencing Improves Variant Detection and Assembly of a Human Genome.” Nature Biotechnology 37 (10): 1155–62. https://doi.org/10.1038/s41587-019-0217-9.
Wu, Yang, Zhili Zheng, Loic Thibaut2, Michael E. Goddard, Naomi R. Wray, Peter M. Visscher, and Jian Zeng. 2024. “Genome-Wide Fine-Mapping Improves Identification of Causal Variants.” Research Square, August, rs.3.rs–4759390. https://doi.org/10.21203/rs.3.rs-4759390/v1.
Yan, Binghao, Yunbi Nam, Lingyao Li, Rebecca A. Deek, Hongzhe Li, and Siyuan Ma. 2025. “Recent Advances in Deep Learning and Language Models for Studying the Microbiome.” Frontiers in Genetics 15 (January). https://doi.org/10.3389/fgene.2024.1494474.
Yuan, Qiuyue, and Zhana Duren. 2025. “[LINGER] Inferring Gene Regulatory Networks from Single-Cell Multiome Data Using Atlas-Scale External Data.” Nature Biotechnology 43 (2): 247–57. https://doi.org/10.1038/s41587-024-02182-7.
Yun, Taedong, Helen Li, Pi-Chuan Chang, Michael F Lin, Andrew Carroll, and Cory Y McLean. 2021. “Accurate, Scalable Cohort Variant Calls Using DeepVariant and GLnexus.” Bioinformatics 36 (24): 5582–89. https://doi.org/10.1093/bioinformatics/btaa1081.
Zheng, Rongbin, Changxin Wan, Shenglin Mei, Qian Qin, Qiu Wu, Hanfei Sun, Chen-Hao Chen, et al. 2019. “Cistrome Data Browser: Expanded Datasets and New Tools for Gene Regulatory Analysis.” Nucleic Acids Research 47 (D1): D729–35. https://doi.org/10.1093/nar/gky1094.
Zheng, Zhenxian, Shumin Li, Junhao Su, Amy Wing-Sze Leung, Tak-Wah Lam, and Ruibang Luo. 2022. “Symphonizing Pileup and Full-Alignment for Deep Learning-Based Long-Read Variant Calling.” Nature Computational Science 2 (12): 797–803. https://doi.org/10.1038/s43588-022-00387-x.
Zhou, Jian, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong, and Olga G. Troyanskaya. 2018. “[Expecto] Deep Learning Sequence-Based Ab Initio Prediction of Variant Effects on Expression and Disease Risk.” Nature Genetics 50 (8): 1171–79. https://doi.org/10.1038/s41588-018-0160-6.
Zhou, Jian, and Olga G. Troyanskaya. 2015. “[DeepSEA] Predicting Effects of Noncoding Variants with Deep Learning–Based Sequence Model.” Nature Methods 12 (10): 931–34. https://doi.org/10.1038/nmeth.3547.
Zhou, Zhihan, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu. 2024. DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome.” arXiv. https://doi.org/10.48550/arXiv.2306.15006.
Zook, Justin M., Jennifer McDaniel, Nathan D. Olson, Justin Wagner, Hemang Parikh, Haynes Heaton, Sean A. Irvine, et al. 2019. “An Open Resource for Accurately Benchmarking Small Variant and Reference Calls.” Nature Biotechnology 37 (5): 561–66. https://doi.org/10.1038/s41587-019-0074-6.