References
Abràmoff, Michael D., Philip T. Lavin, Michele Birch, Nilay Shah, and
James C. Folk. 2018. “Pivotal Trial of an Autonomous
AI-Based Diagnostic System for Detection of Diabetic
Retinopathy in Primary Care Offices.” npj Digital
Medicine 1 (1): 39. https://doi.org/10.1038/s41746-018-0040-6.
Abramson, Josh, Jonas Adler, Jack Dunger, Richard Evans, Tim Green,
Alexander Pritzel, Olaf Ronneberger, et al. 2024.
“[AlphaFold3] Accurate Structure
Prediction of Biomolecular Interactions with AlphaFold
3.” Nature 630 (8016): 493–500. https://doi.org/10.1038/s41586-024-07487-w.
Adamson, Britt, Thomas M. Norman, Marco Jost, Min Y. Cho, James K.
Nuñez, Yuwen Chen, Jacqueline E. Villalta, et al. 2016. “A
Multiplexed Single-Cell
CRISPR Screening Platform
Enables Systematic Dissection of
the Unfolded Protein
Response.” Cell 167 (7): 1867–1882.e21. https://doi.org/10.1016/j.cell.2016.11.048.
Adzhubei, Ivan A., Steffen Schmidt, Leonid Peshkin, Vasily E. Ramensky,
Anna Gerasimova, Peer Bork, Alexey S. Kondrashov, and Shamil R. Sunyaev.
2010. “A Method and Server for Predicting Damaging Missense
Mutations.” Nature Methods 7 (4): 248–49. https://doi.org/10.1038/nmeth0410-248.
Agarwal, Vikram, and Jay Shendure. 2020. “Predicting mRNA Abundance Directly
from Genomic Sequence Using
Deep Convolutional Neural
Networks.” Cell Reports 31 (7): 107663. https://doi.org/10.1016/j.celrep.2020.107663.
Ahdritz, Gustaf, Nazim Bouatta, Christina Floristean, Sachin Kadyan,
Qinghui Xia, William Gerecke, Timothy J. O’Donnell, et al. 2024.
“OpenFold: Retraining AlphaFold2 Yields
New Insights into Its Learning Mechanisms and Capacity for
Generalization.” Nature Methods 21 (8): 1514–24. https://doi.org/10.1038/s41592-024-02272-z.
Ahlqvist, Emma, Petter Storm, Annemari Käräjämäki, Mats Martinell,
Mozhgan Dorkhan, Annelie Carlsson, Petter Vikman, et al. 2018.
“Novel Subgroups of Adult-Onset Diabetes and Their Association
with Outcomes: A Data-Driven Cluster Analysis of Six Variables.”
The Lancet Diabetes & Endocrinology 6 (5): 361–69. https://doi.org/10.1016/S2213-8587(18)30051-2.
Aibar, Sara, Carmen Bravo González-Blas, Thomas Moerman, Vân Anh
Huynh-Thu, Hana Imrichova, Gert Hulselmans, Florian Rambow, et al. 2017.
“SCENIC: Single-Cell Regulatory Network Inference and
Clustering.” Nature Methods 14 (11): 1083–86. https://doi.org/10.1038/nmeth.4463.
All of Us Research Program Investigators, The. 2019. “The
‘All of Us’ Research
Program.” New England Journal of Medicine
381 (7): 668–76. https://doi.org/10.1056/NEJMsr1809937.
Amariuta, Tiffany, Kazuyoshi Ishigaki, Hiroki Sugishita, Tazro Ohta,
Masaru Koido, Kushal K. Dey, Koichi Matsuda, et al. 2020.
“Improving the Trans-Ancestry Portability of Polygenic Risk Scores
by Prioritizing Variants in Predicted Cell-Type-Specific Regulatory
Elements.” Nature Genetics 52 (12): 1346–54. https://doi.org/10.1038/s41588-020-00740-8.
Amberger, Joanna S., Carol A. Bocchini, François Schiettecatte, Alan F.
Scott, and Ada Hamosh. 2015. “OMIM.org:
Online Mendelian Inheritance in
Man (OMIM®), an Online Catalog of Human Genes
and Genetic Disorders.” Nucleic Acids Research 43 (D1):
D789–98. https://doi.org/10.1093/nar/gku1205.
André, Fabrice, Eva Ciruelos, Gabor Rubovszky, Mario Campone, Sibylle
Loibl, Hope S. Rugo, Hiroji Iwata, et al. 2019. “Alpelisib for
PIK3CA-Mutated, Hormone
Receptor–Positive Advanced
Breast Cancer.” New England Journal
of Medicine 380 (20): 1929–40. https://doi.org/10.1056/NEJMoa1813904.
Angelopoulos, Anastasios N., and Stephen Bates. 2023. “Conformal
Prediction: A Gentle
Introduction.” Foundations and Trends® in
Machine Learning 16 (4): 494–591. https://doi.org/10.1561/2200000101.
Argelaguet, Ricard, Britta Velten, Damien Arnol, Sascha Dietrich,
Thorsten Zenz, John C. Marioni, Florian Buettner, Wolfgang Huber, and
Oliver Stegle. 2018. “Multi‐Omics Factor
Analysis—a Framework for Unsupervised Integration of
Multi‐omics Data Sets.” Molecular Systems Biology 14
(6): MSB178124. https://doi.org/10.15252/msb.20178124.
Arnold, Lord Justice, Lady Justice Laing, and Lord Justice Birss. 2021.
“Thaler v Comptroller General of
Patents Trade Marks
And Designs [2021] EWCA
Civ 1374.”
Ashuach, Tal, Mariano I. Gabitto, Rohan V. Koodli, Giuseppe-Antonio
Saldi, Michael I. Jordan, and Nir Yosef. 2023.
“MultiVI: Deep Generative Model for the Integration
of Multimodal Data.” Nature Methods 20 (8): 1222–31. https://doi.org/10.1038/s41592-023-01909-9.
Auton, Adam, Gonçalo R. Abecasis, David M. Altshuler, Richard M. Durbin,
David R. Bentley, Aravinda Chakravarti, et al.
2015. “A Global Reference for Human Genetic Variation.”
Nature 526 (7571): 68–74. https://doi.org/10.1038/nature15393.
Avsec, Žiga, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam,
Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper,
Pushmeet Kohli, and David R. Kelley. 2021. “[Enformer]
Effective Gene Expression Prediction from Sequence by
Integrating Long-Range Interactions.” Nature Methods 18
(10): 1196–1203. https://doi.org/10.1038/s41592-021-01252-x.
Avsec, Žiga, Natasha Latysheva, and Jun Cheng. 2025.
“AlphaGenome: AI for Better
Understanding the Genome.”
Bach, Sebastian, Alexander Binder, Grégoire Montavon, Frederick
Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. “On
Pixel-Wise Explanations for Non-Linear Classifier Decisions by
Layer-Wise Relevance Propagation.” PLoS ONE 10 (7):
e0130140. https://doi.org/10.1371/journal.pone.0130140.
Baek, Minkyung, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey
Ovchinnikov, Gyu Rie Lee, Jue Wang, et al. 2021. “Accurate
Prediction of Protein Structures and Interactions Using a Three-Track
Neural Network.” Science 373 (6557): 871–76. https://doi.org/10.1126/science.abj8754.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019.
“Reconciling Modern Machine-Learning Practice and the Classical
Bias–Variance Trade-Off.” Proceedings of the National Academy
of Sciences 116 (32): 15849–54. https://doi.org/10.1073/pnas.1903070116.
Ben-David, Shai, John Blitzer, Koby Crammer, Alex Kulesza, Fernando
Pereira, and Jennifer Wortman Vaughan. 2010. “A Theory of Learning
from Different Domains.” Machine Learning 79 (1):
151–75. https://doi.org/10.1007/s10994-009-5152-4.
Benegas, Gonzalo, Carlos Albors, Alan J. Aw, Chengzhong Ye, and Yun S.
Song. 2024. “GPN-MSA: An Alignment-Based
DNA Language Model for Genome-Wide Variant Effect
Prediction.” bioRxiv, April, 2023.10.10.561776. https://doi.org/10.1101/2023.10.10.561776.
Benegas, Gonzalo, Sanjit Singh Batra, and Yun S. Song. 2023.
“[GPN] DNA Language Models Are Powerful
Predictors of Genome-Wide Variant Effects.” Proceedings of
the National Academy of Sciences 120 (44): e2311219120. https://doi.org/10.1073/pnas.2311219120.
Benegas, Gonzalo, Gökcen Eraslan, and Yun S. Song. 2025.
“[TraitGym] Benchmarking
DNA Sequence Models for
Causal Regulatory Variant
Prediction in Human
Genetics.” bioRxiv. https://doi.org/10.1101/2025.02.11.637758.
Bengs, Viktor, Eyke Hüllermeier, and Willem Waegeman. 2022.
“Pitfalls of Epistemic Uncertainty
Quantification Through Loss
Minimisation.” In Advances in
Neural Information Processing
Systems, 35:29205–16.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the
False Discovery Rate:
A Practical and Powerful
Approach to Multiple
Testing.” Journal of the Royal Statistical
Society: Series B (Methodological) 57 (1): 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Benner, Christian, Chris C. A. Spencer, Aki S. Havulinna, Veikko
Salomaa, Samuli Ripatti, and Matti Pirinen. 2016.
“FINEMAP: Efficient Variable Selection Using Summary
Data from Genome-Wide Association Studies.”
Bioinformatics 32 (10): 1493–1501. https://doi.org/10.1093/bioinformatics/btw018.
Bergquist, Timothy, Sarah L. Stenton, Emily A. W. Nadeau, Alicia B.
Byrne, Marc S. Greenblatt, Steven M. Harrison, Sean V. Tavtigian, et al.
2025. “Calibration of Additional Computational Tools Expands
ClinGen Recommendation Options for Variant Classification
with PP3/BP4 Criteria.” Genetics in
Medicine 27 (6): 101402. https://doi.org/10.1016/j.gim.2025.101402.
Berman, Helen M., John Westbrook, Zukang Feng, Gary Gilliland, T. N.
Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. 2000.
“The Protein Data
Bank.” Nucleic Acids Research 28 (1):
235–42. https://doi.org/10.1093/nar/28.1.235.
Birman-Deych, Elena, Amy D. Waterman, Yan Yan, David S. Nilasena, Martha
J. Radford, and Brian F. Gage. 2005. “Accuracy of
ICD-9-CM Codes for
Identifying Cardiovascular and
Stroke Risk Factors.”
Medical Care 43 (5): 480. https://doi.org/10.1097/01.mlr.0000160417.39497.a9.
Boer, Carl G. de, Eeshit Dhaval Vaishnav, Ronen Sadeh, Esteban Luis
Abeyta, Nir Friedman, and Aviv Regev. 2019. “Deciphering
Eukaryotic Gene-Regulatory Logic with 100 Million Random
Promoters.” Nature Biotechnology 38 (1): 56–65. https://doi.org/10.1038/s41587-019-0315-8.
Bommasani, Rishi, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran
Arora, Sydney von Arx, Michael S. Bernstein, et al. 2022. “On the
Opportunities and Risks of
Foundation Models.” arXiv. https://doi.org/10.48550/arXiv.2108.07258.
Boshar, Sam, Benjamin Evans, Ziqi Tang, Armand Picard, Yanis Adel,
Franziska K Lorbeer, Chandana Rajesh, et al. n.d. “A Foundational
Model for Joint Sequence-Function Multi-Species Modeling at Scale for
Long-Range Genomic Prediction.”
Bowden, Jack, George Davey Smith, and Stephen Burgess. 2015.
“Mendelian Randomization with Invalid Instruments: Effect
Estimation and Bias Detection Through Egger
Regression.” International Journal of Epidemiology 44
(2): 512–25. https://doi.org/10.1093/ije/dyv080.
Brandes, Nadav, Grant Goldman, Charlotte H. Wang, Chun Jimmie Ye, and
Vasilis Ntranos. 2023. “Genome-Wide Prediction of Disease Variant
Effects with a Deep Protein Language Model.” Nature
Genetics 55 (9): 1512–22. https://doi.org/10.1038/s41588-023-01465-0.
Breiman, Leo. 2001. “Statistical Modeling:
The Two Cultures.”
Statistical Science 16 (3): 199–231. https://doi.org/10.1214/ss/1009213726.
Brixi, Garyk, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg
Brockman, Daniel Chang, Gabriel A. Gonzalez, et al. 2025.
“[Evo 2] Genome Modeling and Design
Across All Domains of Life with Evo 2.” bioRxiv. https://doi.org/10.1101/2025.02.18.638918.
Brnich, Sarah E., Ahmad N. Abou Tayoun, Fergus J. Couch, Garry R.
Cutting, Marc S. Greenblatt, Christopher D. Heinen, Dona M. Kanavy, et
al. 2019. “Recommendations for Application of the Functional
Evidence PS3/BS3 Criterion Using the
ACMG/AMP Sequence Variant Interpretation
Framework.” Genome Medicine 12 (1): 3. https://doi.org/10.1186/s13073-019-0690-2.
Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan,
Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language
Models Are Few-Shot
Learners.” Advances in Neural Information
Processing Systems 33 (December): 1877–1901.
Browning, Brian L., Xiaowen Tian, Ying Zhou, and Sharon R. Browning.
2021. “Fast Two-Stage Phasing of Large-Scale Sequence
Data.” American Journal of Human Genetics 108 (10):
1880–90. https://doi.org/10.1016/j.ajhg.2021.08.005.
Brunak, Soren, Hannah Carter, and John Moult. 2023.
“CAGI 6: Critical
Assessment of Genome
Interpretation, Sixth
Edition.” Human Genetics, October.
Buniello, Annalisa, Daniel Suveges, Carlos Cruz-Castillo, Manuel Bernal
Llinares, Helena Cornu, Irene Lopez, Kirill Tsukanov, et al. 2025.
“Open Targets Platform: Facilitating
Therapeutic Hypotheses Building in Drug Discovery.” Nucleic
Acids Research 53 (D1): D1467–75. https://doi.org/10.1093/nar/gkae1128.
Bycroft, Clare, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T.
Elliott, Kevin Sharp, Allan Motyer, et al. 2018. “The
UK Biobank Resource with Deep Phenotyping and
Genomic Data.” Nature 562 (7726): 203–9. https://doi.org/10.1038/s41586-018-0579-z.
Camillo, Lucas Paulo de Lima, Raghav Sehgal, Jenel Armstrong, Albert T.
Higgins-Chen, Steve Horvath, and Bo Wang. 2024.
“CpGPT: A Foundation Model
for DNA Methylation.” bioRxiv. https://doi.org/10.1101/2024.10.24.619766.
Candès, Emmanuel, Yingying Fan, Lucas Janson, and Jinchi Lv. 2018.
“Panning for Gold:
‘Model-X’ Knockoffs
for High Dimensional Controlled
Variable Selection.” Journal of the
Royal Statistical Society Series B: Statistical Methodology 80 (3):
551–77. https://doi.org/10.1111/rssb.12265.
Cao, Zhi-Jie, and Ge Gao. 2022. “[GLUE]
Multi-Omics Single-Cell Data Integration and Regulatory
Inference with Graph-Linked Embedding.” Nature
Biotechnology 40 (10): 1458–66. https://doi.org/10.1038/s41587-022-01284-4.
Castro-Mondragon, Jaime A., Rafael Riudavets-Puig, Ieva Rauluseviciute,
Roza Berhanu Lemma, Laura Turchi, Romain Blanc-Mathieu, Jeremy Lucas, et
al. 2022. “JASPAR 2022: The 9th Release of the
Open-Access Database of Transcription Factor Binding Profiles.”
Nucleic Acids Research 50 (D1): D165–73. https://doi.org/10.1093/nar/gkab1113.
Centers for Disease Control and Prevention. 2022. “ACCE
Model Process for Evaluating
Genetic Tests.”
Chandak, Payal, Kexin Huang, and Marinka Zitnik. 2023.
“[PrimeKG] Building a Knowledge Graph to
Enable Precision Medicine.” Scientific Data 10 (1): 67.
https://doi.org/10.1038/s41597-023-01960-3.
Chapman, Paul B., Axel Hauschild, Caroline Robert, John B. Haanen, Paolo
Ascierto, James Larkin, Reinhard Dummer, et al. 2011. “Improved
Survival with Vemurafenib in
Melanoma with BRAF V600E
Mutation.” New England Journal of Medicine
364 (26): 2507–16. https://doi.org/10.1056/NEJMoa1103782.
Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip
Kegelmeyer. 2002. “SMOTE: Synthetic Minority
Over-Sampling Technique.” Journal of Artificial Intelligence
Research 16 (1): 321–57.
Chen, Elaine, Flavia M. Facio, Kerry W. Aradhya, Susan Rojahn, Kathryn
E. Hatchell, Sienna Aguilar, Karen Ouyang, et al. 2023. “Rates and
Classification of Variants of
Uncertain Significance in
Hereditary Disease Genetic
Testing.” JAMA Network Open 6 (10):
e2339571. https://doi.org/10.1001/jamanetworkopen.2023.39571.
Chen, Jiayang, Zhihang Hu, Siqi Sun, Qingxiong Tan, Yixuan Wang, Qinze
Yu, Licheng Zong, et al. 2022. “[RNA-FM]
Interpretable RNA Foundation
Model from Unannotated Data for
Highly Accurate RNA
Structure and Function
Predictions.” arXiv. https://doi.org/10.48550/arXiv.2204.00300.
Chen, Kathleen M., Aaron K. Wong, Olga G. Troyanskaya, and Jian Zhou.
2022. “[DeepSEA Sei] A
Sequence-Based Global Map of Regulatory Activity for Deciphering Human
Genetics.” Nature Genetics 54 (7): 940–49. https://doi.org/10.1038/s41588-022-01102-2.
Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton.
2020. “A Simple Framework for
Contrastive Learning of Visual
Representations.” In Proceedings of the 37th
International Conference on
Machine Learning, 1597–607. PMLR.
Cheng, Jun, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė,
Taylor Applebaum, Alexander Pritzel, et al. 2023.
“[AlphaMissense] Accurate Proteome-Wide
Missense Variant Effect Prediction with
AlphaMissense.” Science 381 (6664):
eadg7492. https://doi.org/10.1126/science.adg7492.
Cheng, Wenduo, Zhenqiao Song, Yang Zhang, Shike Wang, Danqing Wang, Muyu
Yang, Lei Li, and Jian Ma. 2024. “DNALONGBENCH:
A Benchmark Suite
For Long-Range DNA
Prediction Tasks,” October.
Cho, Kyunghyun, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua
Bengio. 2014. “On the Properties of
Neural Machine Translation:
Encoder-Decoder
Approaches.” arXiv. https://doi.org/10.48550/arXiv.1409.1259.
Choi, Shing Wan, Timothy Shin-Heng Mak, and Paul F. O’Reilly. 2020.
“[PRS] Tutorial: A Guide to Performing
Polygenic Risk Score Analyses.” Nature Protocols 15 (9):
2759–72. https://doi.org/10.1038/s41596-020-0353-1.
Choromanski, Krzysztof, Valerii Likhosherstov, David Dohan, Xingyou
Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, et al. 2022.
“Rethinking Attention with
Performers.” arXiv. https://doi.org/10.48550/arXiv.2009.14794.
Chowdhery, Aakanksha, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav
Mishra, Adam Roberts, Paul Barham, et al. 2022.
“PaLM: Scaling Language
Modeling with Pathways.” arXiv. https://doi.org/10.48550/arXiv.2204.02311.
Chung, Wen-Hung, Shuen-Iu Hung, Hong-Shang Hong, Mo-Song Hsih, Li-Cheng
Yang, Hsin-Chun Ho, Jer-Yuarn Wu, and Yuan-Tsong Chen. 2004. “A
Marker for Stevens–Johnson Syndrome.”
Nature 428 (6982): 486. https://doi.org/10.1038/428486a.
Cirulli, Elizabeth T., Simon White, Robert W. Read, Gai Elhanan, William
J. Metcalf, Francisco Tanudjaja, Donna M. Fath, et al. 2020.
“Genome-Wide Rare Variant Analysis for Thousands of Phenotypes in
over 70,000 Exomes from Two Cohorts.” Nature
Communications 11 (1): 542. https://doi.org/10.1038/s41467-020-14288-y.
Clarke, Brian, Eva Holtkamp, Hakime Öztürk, Marcel Mück, Magnus
Wahlberg, Kayla Meyer, Felix Munzlinger, et al. 2024.
“[DeepRVAT] Integration of Variant
Annotations Using Deep Set Networks Boosts Rare Variant Association
Testing.” Nature Genetics 56 (10): 2271–80. https://doi.org/10.1038/s41588-024-01919-z.
Collins, Gary S., Johannes B. Reitsma, Douglas G. Altman, and Karel G.
M. Moons. 2015. “Transparent Reporting of a
Multivariable Prediction Model for Individual
Prognosis or Diagnosis (TRIPOD):
The TRIPOD Statement.” BMJ 350: g7594. https://doi.org/10.1136/bmj.g7594.
Consens, Micaela E., Cameron Dufault, Michael Wainberg, Duncan Forster,
Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, and Bo
Wang. 2025. “Transformers and Genome Language Models.”
Nature Machine Intelligence 7 (3): 346–62. https://doi.org/10.1038/s42256-025-01007-9.
Cornman, Andre, Jacob West-Roberts, Antonio Pedro Camargo, Simon Roux,
Martin Beracochea, Milot Mirdita, Sergey Ovchinnikov, and Yunha Hwang.
2024. “The OMG Dataset: An
Open MetaGenomic Corpus for Mixed-Modality
Genomic Language Modeling.” bioRxiv. https://doi.org/10.1101/2024.08.14.607850.
Corso, Gabriele, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi
Jaakkola. 2022. “DiffDock: Diffusion
Steps, Twists, and Turns for
Molecular Docking.” arXiv.org.
Cui, Haotian, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan
Duan, and Bo Wang. 2024. “scGPT:
Toward Building a Foundation Model for Single-Cell Multi-Omics Using
Generative AI.” Nature Methods 21 (8):
1470–80. https://doi.org/10.1038/s41592-024-02201-0.
Cui, Yin, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie.
2019. “Class-Balanced Loss
Based on Effective Number of
Samples.” In 2019
IEEE/CVF Conference on
Computer Vision and Pattern
Recognition (CVPR), 9260–69. https://doi.org/10.1109/CVPR.2019.00949.
Dabernig-Heinz, Johanna, Mara Lohde, Martin Hölzer, Adriana Cabal, Rick
Conzemius, Christian Brandt, Matthias Kohl, et al. 2024. “A
Multicenter Study on Accuracy and Reproducibility of Nanopore
Sequencing-Based Genotyping of Bacterial Pathogens.” Journal
of Clinical Microbiology 62 (9): e00628–24. https://doi.org/10.1128/jcm.00628-24.
Dallago, Christian, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann,
Nicholas Bhattacharya, Samuel Goldman, Ali Madani, and Kevin K. Yang.
2022. “FLIP: Benchmark Tasks in Fitness
Landscape Inference for Proteins.” bioRxiv. https://doi.org/10.1101/2021.11.09.467890.
Dalla-Torre, Hugo, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez
Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago,
et al. 2023. “Nucleotide Transformer: Building and
Evaluating Robust Foundation Models for Human Genomics.”
Nature Methods 22 (2): 287–97. https://doi.org/10.1038/s41592-024-02523-z.
Dang, Tien, Viet Thanh Duy Nguyen, Minh Tuan Le, and Truong-Son Hy.
2025. “BioMedKG: Multimodal Contrastive
Representation Learning in Augmented BioMedical Knowledge
Graphs.” Frontiers in Systems Biology 5 (December). https://doi.org/10.3389/fsysb.2025.1651930.
Dao, Tri, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. 2022.
“FlashAttention: Fast and
Memory-Efficient Exact
Attention with
IO-Awareness.” Advances in Neural
Information Processing Systems 35 (December): 16344–59.
Dauparas, J., I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F.
Milles, B. I. M. Wicky, et al. 2022. “Robust Deep Learning–Based
Protein Sequence Design Using ProteinMPNN.”
Science 378 (6615): 49–56. https://doi.org/10.1126/science.add2187.
Davey Smith, George, and Shah Ebrahim. 2003.
“‘Mendelian Randomization’: Can Genetic
Epidemiology Contribute to Understanding Environmental Determinants of
Disease?” International Journal of Epidemiology 32
(1): 1–22. https://doi.org/10.1093/ije/dyg070.
Davydov, Eugene V., David L. Goode, Marina Sirota, Gregory M. Cooper,
Arend Sidow, and Serafim Batzoglou. 2010. “Identifying a
High Fraction of the Human
Genome to Be Under Selective
Constraint Using GERP++.”
PLOS Computational Biology 6 (12): e1001025. https://doi.org/10.1371/journal.pcbi.1001025.
DeLong, Elizabeth R., David M. DeLong, and Daniel L. Clarke-Pearson.
1988. “Comparing the Areas Under Two or
More Correlated Receiver
Operating Characteristic Curves:
A Nonparametric Approach.”
Biometrics 44 (3): 837–45. https://doi.org/10.2307/2531595.
Denny, Joshua C., Marylyn D. Ritchie, Melissa A. Basford, Jill M.
Pulley, Lisa Bastarache, Kristin Brown-Gentry, Deede Wang, Dan R. Masys,
Dan M. Roden, and Dana C. Crawford. 2010. “PheWAS:
Demonstrating the Feasibility of a Phenome-Wide Scan to Discover
Gene–Disease Associations.” Bioinformatics 26 (9):
1205–10. https://doi.org/10.1093/bioinformatics/btq126.
DePristo, Mark A., Eric Banks, Ryan Poplin, Kiran V. Garimella, Jared R.
Maguire, Christopher Hartl, Anthony A. Philippakis, et al. 2011.
“A Framework for Variation Discovery and Genotyping Using
Next-Generation DNA Sequencing Data.” Nature
Genetics 43 (5): 491–98. https://doi.org/10.1038/ng.806.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.
“BERT: Pre-Training of Deep
Bidirectional Transformers for
Language Understanding.” arXiv. https://doi.org/10.48550/arXiv.1810.04805.
Dey, Kushal K., Bryce van de Geijn, Samuel Sungil Kim, Farhad
Hormozdiari, David R. Kelley, and Alkes L. Price. 2020.
“Evaluating the Informativeness of Deep Learning Annotations for
Human Complex Diseases.” Nature Communications 11 (1):
4703. https://doi.org/10.1038/s41467-020-18515-4.
Dibaeinia, Payam, Chris German, Suyash Shringarpure, Adam Auton, and Aly
A. Khan. 2025. “PRSformer: Disease
Prediction from Million-Scale
Individual Genotypes.” bioRxiv. https://doi.org/10.1101/2025.10.26.684578.
Dixit, Atray, Oren Parnas, Biyu Li, Jenny Chen, Charles P. Fulco, Livnat
Jerby-Arnon, Nemanja D. Marjanovic, et al. 2016.
“Perturb-Seq: Dissecting
Molecular Circuits with Scalable
Single-Cell RNA
Profiling of Pooled Genetic
Screens.” Cell 167 (7): 1853–1866.e17. https://doi.org/10.1016/j.cell.2016.11.038.
Dixon, Jesse R., Siddarth Selvaraj, Feng Yue, Audrey Kim, Yan Li, Yin
Shen, Ming Hu, Jun S. Liu, and Bing Ren. 2012. “Topological
Domains in Mammalian Genomes Identified by Analysis of Chromatin
Interactions.” Nature 485 (7398): 376–80. https://doi.org/10.1038/nature11082.
Dockès, Jérôme, Gaël Varoquaux, and Jean-Baptiste Poline. 2021.
“Preventing Dataset Shift from Breaking Machine-Learning
Biomarkers.” GigaScience 10 (9): giab055. https://doi.org/10.1093/gigascience/giab055.
Duncan, L., H. Shen, B. Gelaye, J. Meijsen, K. Ressler, M. Feldman, R.
Peterson, and B. Domingue. 2019. “Analysis of Polygenic Risk Score
Usage and Performance in Diverse Human Populations.” Nature
Communications 10 (1): 3328. https://doi.org/10.1038/s41467-019-11112-0.
Dwivedi, Vijay Prakash, and Xavier Bresson. 2021. “A
Generalization of Transformer
Networks to Graphs.” arXiv. https://doi.org/10.48550/arXiv.2012.09699.
Edgar, Ron, Michael Domrachev, and Alex E. Lash. 2002. “Gene
Expression Omnibus: NCBI Gene
Expression and Hybridization Array Data Repository.” Nucleic
Acids Research 30 (1): 207–10. https://doi.org/10.1093/nar/30.1.207.
Elgart, Michael, Genevieve Lyons, Santiago Romero-Brufau, Nuzulul
Kurniansyah, Jennifer A. Brody, Xiuqing Guo, Henry J. Lin, et al. 2022.
“Non-Linear Machine Learning Models Incorporating
SNPs and PRS Improve Polygenic Prediction in
Diverse Human Populations.” Communications Biology 5
(1): 856. https://doi.org/10.1038/s42003-022-03812-z.
Elks, Cathy E., Marcel Den Hoed, Jing Hua Zhao, Stephen J. Sharp,
Nicholas J. Wareham, Ruth J. F. Loos, and Ken K. Ong. 2012.
“Variability in the Heritability of Body
Mass Index: A
Systematic Review and
Meta-Regression.” Frontiers in
Endocrinology 3 (February). https://doi.org/10.3389/fendo.2012.00029.
Elnaggar, Ahmed, Michael Heinzinger, Christian Dallago, Ghalia Rihawi,
Yu Wang, Llion Jones, Tom Gibbs, et al. 2021.
“ProtTrans: Towards
Cracking the Language of Life’s
Code Through
Self-Supervised Deep
Learning and High Performance
Computing.” arXiv. https://doi.org/10.48550/arXiv.2007.06225.
Erlich, Yaniv, and Arvind Narayanan. 2014. “Routes for Breaching
and Protecting Genetic Privacy.” Nature Reviews Genetics
15 (6): 409–21. https://doi.org/10.1038/nrg3723.
Esposito, Daniel, Jochen Weile, Jay Shendure, Lea M. Starita, Anthony T.
Papenfuss, Frederick P. Roth, Douglas M. Fowler, and Alan F. Rubin.
2019. “MaveDB: An Open-Source Platform to Distribute
and Interpret Data from Multiplexed Assays of Variant Effect.”
Genome Biology 20 (1): 223. https://doi.org/10.1186/s13059-019-1845-6.
European Parliament. 2016. “Regulation on the Protection of
Natural Persons with Regard to the Processing of Personal Data and on
the Free Movement of Such Data.”
———. 2017. “Regulation on Medical Devices.”
———. 2024. “Regulation Laying down Harmonised Rules on Artificial
Intelligence.”
Fang, Yitian, Yi Jiang, Leyi Wei, Qin Ma, Zhixiang Ren, Qianmu Yuan, and
Dong-Qing Wei. 2023. “DeepProSite: Structure-Aware
Protein Binding Site Prediction Using ESMFold and
Pretrained Language Model.” Bioinformatics 39 (12):
btad718. https://doi.org/10.1093/bioinformatics/btad718.
Farh, Kyle Kai-How, Alexander Marson, Jiang Zhu, Markus Kleinewietfeld,
William J. Housley, Samantha Beik, Noam Shoresh, et al. 2015.
“Genetic and Epigenetic Fine Mapping of Causal Autoimmune Disease
Variants.” Nature 518 (7539): 337–43. https://doi.org/10.1038/nature13835.
Fedus, William, Barret Zoph, and Noam Shazeer. 2022. “Switch
Transformers: Scaling to Trillion
Parameter Models with Simple and
Efficient Sparsity.” Journal of
Machine Learning Research 23 (120): 1–39.
Ferruz, Noelia, Steffen Schmidt, and Birte Höcker. 2022.
“ProtGPT2 Is a Deep Unsupervised Language Model for
Protein Design.” Nature Communications 13 (1): 4348. https://doi.org/10.1038/s41467-022-32007-7.
Findlay, Gregory M., Riza M. Daza, Beth Martin, Melissa D. Zhang, Anh P.
Leith, Molly Gasperini, Joseph D. Janizek, Xingfan Huang, Lea M.
Starita, and Jay Shendure. 2018. “Accurate Classification of
BRCA1 Variants with Saturation Genome Editing.”
Nature 562 (7726): 217–22. https://doi.org/10.1038/s41586-018-0461-z.
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. 2017.
“Model-Agnostic
Meta-Learning for Fast
Adaptation of Deep
Networks.” In Proceedings of the 34th
International Conference on
Machine Learning, 1126–35. PMLR.
Fokkema, Ivo F. A. C., Peter E. M. Taschner, Gerard C. P. Schaafsma, J.
Celli, Jeroen F. J. Laros, and Johan T. den Dunnen. 2011.
“LOVD v.2.0: The Next Generation in Gene Variant
Databases.” Human Mutation 32 (5): 557–63. https://doi.org/10.1002/humu.21438.
Food and Drug Administration. 2023. “Using Artificial
Intelligence and Machine Learning
in the Development of Drug and
Biological Products;
Availability.”
———. 2024. “Laboratory Developed Tests:
Small Entity Compliance
Guide; Guidance for Laboratory
Manufacturers and Food and Drug
Administration Staff;
Availability.”
———. 2025. “Artificial
Intelligence-Enabled Medical
Devices.”
Fowler, Douglas M., and Stanley Fields. 2014. “Deep Mutational
Scanning: A New Style of Protein Science.” Nature
Methods 11 (8): 801–7. https://doi.org/10.1038/nmeth.3027.
Frankish, Adam, Mark Diekhans, Anne-Maud Ferreira, Rory Johnson, Irwin
Jungreis, Jane Loveland, Jonathan M Mudge, et al. 2019.
“GENCODE Reference Annotation for the Human and Mouse
Genomes.” Nucleic Acids Research 47 (D1): D766–73. https://doi.org/10.1093/nar/gky955.
Frazer, Jonathan, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K.
Min, Kelly Brock, Yarin Gal, and Debora S. Marks. 2021.
“[EVE] Disease Variant Prediction with
Deep Generative Models of Evolutionary Data.” Nature 599
(7883): 91–95. https://doi.org/10.1038/s41586-021-04043-8.
Friedman, Dan, and Adji Bousso Dieng. 2023. “The
Vendi Score: A
Diversity Evaluation Metric for
Machine Learning.” Transactions on
Machine Learning Research. https://openreview.net/forum?id=S7hJSmMM5l.
Fudenberg, Geoff, David R. Kelley, and Katherine S. Pollard. 2020.
“[Akita] Predicting 3D
Genome Folding from DNA Sequence with
Akita.” Nature Methods 17 (11): 1111–17. https://doi.org/10.1038/s41592-020-0958-x.
Gaedigk, Andrea, Magnus Ingelman-Sundberg, Neil A. Miller, J. Steven
Leeder, Michelle Whirl-Carrillo, Teri E. Klein, and the PharmVar
Steering Committee. 2017. “The Pharmacogene
Variation (PharmVar) Consortium:
Incorporation of the Human
Cytochrome P450 (CYP)
Allele Nomenclature
Database.” Clinical Pharmacology &
Therapeutics 103 (3): 399–401. https://doi.org/10.1002/cpt.910.
Gal, Yarin, and Zoubin Ghahramani. 2016. “Dropout as a
Bayesian Approximation:
Representing Model Uncertainty in
Deep Learning.” In Proceedings of
The 33rd International Conference
on Machine Learning, 1050–59. PMLR.
Gamazon, Eric R., Heather E. Wheeler, Kaanan P. Shah, Sahar V.
Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, et
al. 2015. “A Gene-Based Association Method for Mapping Traits
Using Reference Transcriptome Data.” Nature Genetics 47
(9): 1091–98. https://doi.org/10.1038/ng.3367.
Ganin, Yaroslav, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo
Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky.
2016. “Domain-Adversarial Training of
Neural Networks.” Journal of
Machine Learning Research 17 (59): 1–35.
Gao, Hong, Tobias Hamp, Jeffrey Ede, Joshua G. Schraiber, Jeremy McRae,
Moriel Singer-Berk, Yanshen Yang, et al. 2023. “The Landscape of
Tolerated Genetic Variation in Humans and Primates.” Science
380 (6648): eabn8153. https://doi.org/10.1126/science.abn8197.
Garrison, Erik, Jouni Sirén, Adam M. Novak, Glenn Hickey, Jordan M.
Eizenga, Eric T. Dawson, William Jones, et al. 2018. “Variation
Graph Toolkit Improves Read Mapping by Representing Genetic Variation in
the Reference.” Nature Biotechnology 36 (9): 875–79. https://doi.org/10.1038/nbt.4227.
Gasperini, Molly, Andrew J. Hill, José L. McFaline-Figueroa, Beth
Martin, Seungsoo Kim, Melissa D. Zhang, Dana Jackson, et al. 2019.
“A Genome-Wide Framework for
Mapping Gene Regulation via
Cellular Genetic Screens.”
Cell 176 (1): 377–390.e19. https://doi.org/10.1016/j.cell.2018.11.029.
Gayoso, Adam, Zoë Steier, Romain Lopez, Jeffrey Regier, Kristopher L.
Nazor, Aaron Streets, and Nir Yosef. 2021. “Joint Probabilistic
Modeling of Single-Cell Multi-Omic Data with totalVI.” Nature Methods 18 (3):
272–82. https://doi.org/10.1038/s41592-020-01050-x.
Ge, Tian, Chia-Yen Chen, Yang Ni, Yen-Chen Anne Feng, and Jordan W.
Smoller. 2019. “Polygenic Prediction via Bayesian
Regression and Continuous Shrinkage Priors.” Nature
Communications 10 (1): 1776. https://doi.org/10.1038/s41467-019-09718-5.
Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman
Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021.
“Datasheets for Datasets.” Communications of the ACM 64 (12):
86–92. https://doi.org/10.1145/3458723.
Georgantas, Costa, Zoltán Kutalik, and Jonas Richiardi. 2024.
“Delphi: A Deep-Learning
Method for Polygenic Risk
Prediction.” medRxiv. https://doi.org/10.1101/2024.04.19.24306079.
Giambartolomei, Claudia, Damjan Vukcevic, Eric E. Schadt, Lude Franke,
Aroon D. Hingorani, Chris Wallace, and Vincent Plagnol. 2014.
“Bayesian Test for Colocalisation
Between Pairs of Genetic
Association Studies Using
Summary Statistics.” PLOS
Genetics 10 (5): e1004383. https://doi.org/10.1371/journal.pgen.1004383.
Gong, Li, Clarissa J Klein, Kelly E Caudle, Ann M Moyer, Stuart A Scott,
Michelle Whirl-Carrillo, and Teri E Klein, on behalf of the ClinGen
Pharmacogenomics Working Group (PGxWG). 2025. “Integrating
Pharmacogenomics into the Broader
Construct of Genomic Medicine:
Efforts by the ClinGen
Pharmacogenomics Working Group
(PGxWG).” Clinical Chemistry 71 (1): 36–44.
https://doi.org/10.1093/clinchem/hvae181.
Goodwin, Sara, John D. McPherson, and W. Richard McCombie. 2016.
“Coming of Age: Ten Years of Next-Generation Sequencing
Technologies.” Nature Reviews Genetics 17 (6): 333–51.
https://doi.org/10.1038/nrg.2016.49.
Gordon, Derek, Stephen J. Finch, and Wonkuk Kim. 2020. Heterogeneity
in Statistical Genetics: How to Assess, Address, and Account for
Mixtures in Association Studies. Cham: Springer. https://doi.org/10.1007/978-3-030-61121-7.
Granger, C. W. J. 1969. “Investigating Causal
Relations by Econometric Models
and Cross-Spectral Methods.”
Econometrica 37 (3).
Grantham, R. 1974. “Amino Acid
Difference Formula to Help
Explain Protein
Evolution.” Science 185 (4154): 862–64. https://doi.org/10.1126/science.185.4154.862.
Grešová, Katarína, Vlastimil Martinek, David Čechák, Petr Šimeček, and
Panagiotis Alexiou. 2023. “Genomic Benchmarks: A Collection of
Datasets for Genomic Sequence Classification.” BMC Genomic
Data 24 (1): 25. https://doi.org/10.1186/s12863-023-01123-8.
Grimm, Dominik G., Chloé-Agathe Azencott, Fabian Aicheler, Udo Gieraths,
Daniel G. MacArthur, Kaitlin E. Samocha, David N. Cooper, et al. 2015.
“The Evaluation of Tools
Used to Predict the Impact of
Missense Variants Is
Hindered by Two Types of
Circularity.” Human Mutation 36 (5):
513–23. https://doi.org/10.1002/humu.22768.
GTEx Consortium, The. 2020. “The GTEx
Consortium Atlas of Genetic Regulatory Effects Across Human
Tissues.” Science 369 (6509): 1318–30. https://doi.org/10.1126/science.aaz1776.
Gu, Albert, and Tri Dao. 2024. “Mamba:
Linear-Time Sequence
Modeling with Selective State
Spaces.”
Gu, Albert, Karan Goel, and Christopher Ré. 2022. “Efficiently
Modeling Long Sequences with
Structured State Spaces.”
arXiv. https://doi.org/10.48550/arXiv.2111.00396.
Gu, Yu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong
Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. 2021.
“Domain-Specific Language
Model Pretraining for Biomedical
Natural Language
Processing.” ACM Transactions on Computing for Healthcare
3 (1): 2:1–23. https://doi.org/10.1145/3458754.
Gudbjartsson, Daniel F., Patrick Sulem, Hannes Helgason, Arnaldur
Gylfason, Sigurjon A. Gudjonsson, Florian Zink, Asmundur Oddson, et al.
2015. “Sequence Variants from Whole Genome Sequencing a Large
Group of Icelanders.” Scientific Data 2
(1): 150011. https://doi.org/10.1038/sdata.2015.11.
Guo, Chuan, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017.
“On Calibration of Modern
Neural Networks.” In Proceedings of
the 34th International Conference on
Machine Learning, 1321–30. PMLR.
Gusev, Alexander, Arthur Ko, Huwenbo Shi, Gaurav Bhatia, Wonil Chung,
Brenda W. J. H. Penninx, Rick Jansen, et al. 2016. “Integrative
Approaches for Large-Scale Transcriptome-Wide Association
Studies.” Nature Genetics 48 (3): 245–52. https://doi.org/10.1038/ng.3506.
Gymrek, Melissa, Amy L. McGuire, David Golan, Eran Halperin, and Yaniv
Erlich. 2013. “Identifying Personal
Genomes by Surname
Inference.” Science 339 (6117): 321–24. https://doi.org/10.1126/science.1229566.
Hamilton, William L., Rex Ying, and Jure Leskovec. 2017.
“[GraphSAGE] Inductive
Representation Learning on Large
Graphs.” arXiv.
Hansen, Ben B. 2008. “The Prognostic Analogue of the Propensity
Score.” Biometrika 95 (2): 481–88. https://doi.org/10.1093/biomet/asn004.
Hao, Minsheng, Jing Gong, Xin Zeng, Chiming Liu, Yucheng Guo, Xingyi
Cheng, Taifeng Wang, Jianzhu Ma, Xuegong Zhang, and Le Song. 2024.
“Large-Scale Foundation Model on Single-Cell
Transcriptomics.” Nature Methods 21 (8): 1481–91. https://doi.org/10.1038/s41592-024-02305-7.
Hart, G. Traver, Arun K. Ramani, and Edward M. Marcotte. 2006.
“How Complete Are Current Yeast and Human Protein-Interaction
Networks?” Genome Biology 7 (11): 120. https://doi.org/10.1186/gb-2006-7-11-120.
Hartwig, Fernando Pires, George Davey Smith, and Jack Bowden. 2017.
“Robust Inference in Summary Data Mendelian
Randomization via the Zero Modal Pleiotropy Assumption.”
International Journal of Epidemiology 46 (6): 1985–98. https://doi.org/10.1093/ije/dyx102.
Hayes, Thomas, Roshan Rao, Halil Akin, Nicholas J. Sofroniew, Deniz
Oktay, Zeming Lin, Robert Verkuil, et al. 2025.
“[ESM-3] Simulating 500 Million Years of
Evolution with a Language Model.” Science 387 (6736):
850–58. https://doi.org/10.1126/science.ads0018.
Health and Human Services. 2018. “Federal Policy for
the Protection of Human
Subjects.”
Henikoff, S, and J G Henikoff. 1992. “Amino Acid Substitution
Matrices from Protein Blocks.” Proceedings of the National
Academy of Sciences 89 (22): 10915–19. https://doi.org/10.1073/pnas.89.22.10915.
Hie, Brian L., Varun R. Shanker, Duo Xu, Theodora U. J. Bruun, Payton A.
Weidenbacher, Shaogeng Tang, Wesley Wu, John E. Pak, and Peter S. Kim.
2023. “Efficient Evolution of Human Antibodies from General
Protein Language Models.” Nature Biotechnology 42 (2):
275–83. https://doi.org/10.1038/s41587-023-01763-2.
Hilker, Rikke, Dorte Helenius, Birgitte Fagerlund, Axel Skytthe, Kaare
Christensen, Thomas M. Werge, Merete Nordentoft, and Birte Glenthøj.
2018. “Heritability of Schizophrenia and
Schizophrenia Spectrum Based on
the Nationwide Danish Twin
Register.” Biological Psychiatry, Novel
Mechanisms in Schizophrenia
Pathophysiology, 83 (6): 492–98. https://doi.org/10.1016/j.biopsych.2017.08.017.
Himmelstein, Daniel Scott, Antoine Lizee, Christine Hessler, Leo
Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian,
and Sergio E Baranzini. 2017. “Systematic Integration of
Biomedical Knowledge Prioritizes Drugs for Repurposing.” Edited
by Alfonso Valencia. eLife 6 (September): e26726. https://doi.org/10.7554/eLife.26726.
Hoang, Minh, and Mona Singh. 2025. “Locality-Aware Pooling
Enhances Protein Language Model Performance Across Varied
Applications.” Bioinformatics 41 (Supplement_1):
i217–26. https://doi.org/10.1093/bioinformatics/btaf178.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long
Short-Term Memory.”
Neural Computation 9 (8): 1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.
Hoffmann, Jordan, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya,
Trevor Cai, Eliza Rutherford, Diego de Las Casas, et al. 2022.
“Training Compute-Optimal
Large Language Models.”
arXiv. https://doi.org/10.48550/arXiv.2203.15556.
Homer, Nils, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav
Tembe, Jill Muehling, John V. Pearson, Dietrich A. Stephan, Stanley F.
Nelson, and David W. Craig. 2008. “Resolving
Individuals Contributing Trace
Amounts of DNA to Highly
Complex Mixtures Using
High-Density SNP
Genotyping Microarrays.” PLOS
Genetics 4 (8): e1000167. https://doi.org/10.1371/journal.pgen.1000167.
Hormozdiari, Farhad, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc, and
Eleazar Eskin. 2014. “Identifying Causal Variants at Loci with
Multiple Signals of Association.” In Proceedings of the 5th
ACM Conference on Bioinformatics,
Computational Biology, and Health
Informatics, 610–11. BCB ’14. New York,
NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2649387.2660800.
Horvath, Steve. 2013. “DNA Methylation Age of Human
Tissues and Cell Types.” Genome Biology 14 (10): R115.
https://doi.org/10.1186/gb-2013-14-10-r115.
Howard, Jeremy, and Sebastian Ruder. 2018. “Universal
Language Model Fine-Tuning for
Text Classification.” arXiv. https://doi.org/10.48550/arXiv.1801.06146.
Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S.
Hansen, Oliver J. Rando, Robert Tjian, and Xavier Darzacq. 2020.
“Resolving the 3D Landscape of
Transcription-Linked Mammalian
Chromatin Folding.” Molecular
Cell 78 (3): 539–553.e8. https://doi.org/10.1016/j.molcel.2020.03.002.
Hsu, Chloe, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu,
Adam Lerer, and Alexander Rives. 2022. “Learning Inverse Folding
from Millions of Predicted Structures.” In Proceedings of the
39th International Conference on
Machine Learning, 8946–70. PMLR.
Hu, Edward J., Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi
Li, Shean Wang, and Weizhu Chen. 2021. “LoRA:
Low-Rank Adaptation of
Large Language Models.”
arXiv. https://doi.org/10.48550/arXiv.2106.09685.
Huang, Po-Ssu, Scott E. Boyken, and David Baker. 2016. “The Coming
of Age of de Novo Protein Design.” Nature 537 (7620):
320–27. https://doi.org/10.1038/nature19946.
Hubisz, Melissa J, and Katherine S Pollard. 2014. “Exploring the
Genesis and Functions of Human Accelerated
Regions Sheds Light on Their Role in Human
Evolution.” Current Opinion in Genetics &
Development, Genetics of human evolution, 29 (December): 15–21. https://doi.org/10.1016/j.gde.2014.07.005.
Huynh-Thu, Vân Anh, Alexandre Irrthum, Louis Wehenkel, and Pierre
Geurts. 2010. “Inferring Regulatory
Networks from Expression Data
Using Tree-Based
Methods.” PLOS ONE 5 (9): e12776. https://doi.org/10.1371/journal.pone.0012776.
Hwang, Yunha, Andre L. Cornman, Elizabeth H. Kellogg, Sergey
Ovchinnikov, and Peter R. Girguis. 2024. “Genomic Language Model
Predicts Protein Co-Regulation and Function.” Nature
Communications 15 (1): 2880. https://doi.org/10.1038/s41467-024-46947-9.
Ingelman-Sundberg, M. 2004. “Genetic Polymorphisms of Cytochrome
P450 2D6 (CYP2D6): Clinical
Consequences, Evolutionary Aspects and Functional Diversity.”
The Pharmacogenomics Journal 5 (1): 6–13. https://doi.org/10.1038/sj.tpj.6500285.
Ingraham, John B., Max Baranov, Zak Costello, Karl W. Barber, Wujie
Wang, Ahmed Ismail, Vincent Frappier, et al. 2023. “Illuminating
Protein Space with a Programmable Generative Model.”
Nature 623 (7989): 1070–78. https://doi.org/10.1038/s41586-023-06728-8.
International Medical Device Regulators Forum. 2014. “Software as
a Medical Device: Possible
Framework for Risk Categorization
and Corresponding Considerations.”
———. 2017. “Software as a Medical Device
(SaMD): Clinical
Evaluation.”
Ioannidis, Nilah M., Joseph H. Rothstein, Vikas Pejaver, Sumit Middha,
Shannon K. McDonnell, Saurabh Baheti, Anthony Musolf, et al. 2016.
“REVEL: An Ensemble
Method for Predicting the
Pathogenicity of Rare Missense
Variants.” The American Journal of Human
Genetics 99 (4): 877–85. https://doi.org/10.1016/j.ajhg.2016.08.016.
Ionita-Laza, Iuliana, Kenneth McCallum, Bin Xu, and Joseph D. Buxbaum.
2016. “A Spectral Approach Integrating Functional Genomic
Annotations for Coding and Noncoding Variants.” Nature
Genetics 48 (2): 214–20. https://doi.org/10.1038/ng.3477.
Jagadeesh, Karthik A., Aaron M. Wenger, Mark J. Berger, Harendra Guturu,
Peter D. Stenson, David N. Cooper, Jonathan A. Bernstein, and Gill
Bejerano. 2016. “M-CAP Eliminates a Majority of
Variants of Uncertain Significance in Clinical Exomes at High
Sensitivity.” Nature Genetics 48 (12): 1581–86. https://doi.org/10.1038/ng.3703.
Jaganathan, Kishore, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F.
McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A.
Kosmicki, et al. 2019. “[SpliceAI]
Predicting Splicing from Primary
Sequence with Deep
Learning.” Cell 176 (3): 535–548.e24. https://doi.org/10.1016/j.cell.2018.12.015.
Jagota, Milind, Chengzhong Ye, Carlos Albors, Ruchir Rastogi, Antoine
Koehl, Nilah Ioannidis, and Yun S. Song. 2023. “Cross-Protein
Transfer Learning Substantially Improves Disease Variant
Prediction.” Genome Biology 24 (1): 182. https://doi.org/10.1186/s13059-023-03024-6.
Jain, Sarthak, and Byron C. Wallace. 2019. “Attention Is Not
Explanation.” arXiv. https://doi.org/10.48550/arXiv.1902.10186.
Jawahar, Ganesh, Benoît Sagot, and Djamé Seddah. 2019. “What Does
BERT Learn about the Structure of Language?” In
ACL 2019 - 57th Annual
Meeting of the Association for
Computational Linguistics. Florence,
Italy.
Ji, Yanrong, Zhihan Zhou, Han Liu, and Ramana V Davuluri. 2021.
“DNABERT: Pre-Trained Bidirectional
Encoder Representations from
Transformers Model for DNA-Language in
Genome.” Bioinformatics 37 (15): 2112–20. https://doi.org/10.1093/bioinformatics/btab083.
Jiang, Tao, Yongzhuang Liu, Yue Jiang, Junyi Li, Yan Gao, Zhe Cui,
Yadong Liu, Bo Liu, and Yadong Wang. 2020. “Long-Read-Based Human
Genomic Structural Variation Detection with cuteSV.” Genome Biology 21 (1):
189. https://doi.org/10.1186/s13059-020-02107-y.
Truong, Timothy F., Jr., and Tristan Bepler. 2023.
“PoET: A Generative Model of Protein
Families as Sequences-of-Sequences.” arXiv. https://doi.org/10.48550/arXiv.2306.06156.
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael
Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021.
“[AlphaFold2] Highly Accurate Protein
Structure Prediction with AlphaFold.”
Nature 596 (7873): 583–89. https://doi.org/10.1038/s41586-021-03819-2.
Jurenaite, Neringa, Daniel León-Periñán, Veronika Donath, Sunna Torge,
and René Jäkel. 2024. “SetQuence &
SetOmic: Deep Set Transformers for Whole
Genome and Exome Tumour Analysis.” BioSystems 235
(January): 105095. https://doi.org/10.1016/j.biosystems.2023.105095.
Kagda, Meenakshi S., Bonita Lam, Casey Litton, Corinn Small, Cricket A.
Sloan, Emma Spragins, Forrest Tanaka, et al. 2025. “Data
Navigation on the ENCODE Portal.” Nature
Communications 16 (1): 9592. https://doi.org/10.1038/s41467-025-64343-9.
Kalvari, Ioanna, Eric P Nawrocki, Nancy Ontiveros-Palacios, Joanna
Argasinska, Kevin Lamkiewicz, Manja Marz, Sam Griffiths-Jones, et al.
2021. “Rfam 14: Expanded Coverage of Metagenomic, Viral and microRNA Families.” Nucleic Acids
Research 49 (D1): D192–200. https://doi.org/10.1093/nar/gkaa1047.
Kaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin
Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario
Amodei. 2020. “Scaling Laws for Neural
Language Models.” arXiv. https://doi.org/10.48550/arXiv.2001.08361.
Karczewski, Konrad J., Laurent C. Francioli, Grace Tiao, Beryl B.
Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, et al. 2020.
“The Mutational Constraint Spectrum Quantified from Variation in
141,456 Humans.” Nature 581 (7809): 434–43. https://doi.org/10.1038/s41586-020-2308-7.
Karollus, Alexander, Thomas Mauermeier, and Julien Gagneur. 2023.
“Current Sequence-Based Models Capture Gene Expression
Determinants in Promoters but Mostly Ignore Distal Enhancers.”
Genome Biology 24 (1): 56. https://doi.org/10.1186/s13059-023-02899-9.
Katzman, Jared L., Uri Shaham, Alexander Cloninger, Jonathan Bates,
Tingting Jiang, and Yuval Kluger. 2018. “DeepSurv:
Personalized Treatment Recommender System Using a Cox
Proportional Hazards Deep Neural Network.” BMC Medical
Research Methodology 18 (1): 24. https://doi.org/10.1186/s12874-018-0482-1.
Kaye, Jane, Edgar A. Whitley, David Lund, Michael Morrison, Harriet
Teare, and Karen Melham. 2014. “Dynamic Consent: A Patient
Interface for Twenty-First Century Research Networks.”
European Journal of Human Genetics 23 (2): 141–46. https://doi.org/10.1038/ejhg.2014.71.
Kelley, David R. 2020. “[Basenji2]
Cross-Species Regulatory Sequence Activity
Prediction.” PLOS Computational Biology 16 (7):
e1008050. https://doi.org/10.1371/journal.pcbi.1008050.
Kelley, David R., Yakir A. Reshef, Maxwell Bileschi, David Belanger,
Cory Y. McLean, and Jasper Snoek. 2018. “[Basenji]
Sequential Regulatory Activity Prediction Across
Chromosomes with Convolutional Neural Networks.” Genome
Research 28 (5): 739–50. https://doi.org/10.1101/gr.227819.117.
Kelley, David R., Jasper Snoek, and John L. Rinn. 2016. “Basset:
Learning the Regulatory Code of the Accessible Genome with Deep
Convolutional Neural Networks.” Genome Research 26 (7):
990–99. https://doi.org/10.1101/gr.200535.115.
Khera, Amit V., and Sekar Kathiresan. 2017. “Genetics of Coronary
Artery Disease: Discovery, Biology and Clinical Translation.”
Nature Reviews Genetics 18 (6): 331–44. https://doi.org/10.1038/nrg.2016.160.
Kichaev, Gleb, Megan Roytman, Ruth Johnson, Eleazar Eskin, Sara
Lindström, Peter Kraft, and Bogdan Pasaniuc. 2017. “Improved
Methods for Multi-Trait Fine Mapping of Pleiotropic Risk Loci.”
Bioinformatics 33 (2): 248–55. https://doi.org/10.1093/bioinformatics/btw615.
Kipf, Thomas N., and Max Welling. 2017.
“Semi-Supervised Classification with
Graph Convolutional
Networks.” arXiv. https://doi.org/10.48550/arXiv.1609.02907.
Kircher, Martin, Daniela M. Witten, Preti Jain, Brian J. O’Roak, Gregory
M. Cooper, and Jay Shendure. 2014. “A General Framework for
Estimating the Relative Pathogenicity of Human Genetic Variants.”
Nature Genetics 46 (3): 310–15. https://doi.org/10.1038/ng.2892.
Kıcıman, Emre, Robert Ness, Amit Sharma, and Chenhao Tan. 2024.
“Causal Reasoning and Large
Language Models: Opening a
New Frontier for
Causality.” arXiv. https://doi.org/10.48550/arXiv.2305.00050.
Kong, Augustine, Michael L. Frigge, Gisli Masson, Soren Besenbacher,
Patrick Sulem, Gisli Magnusson, Sigurjon A. Gudjonsson, et al. 2012.
“Rate of de Novo Mutations and the Importance of Father’s Age to
Disease Risk.” Nature 488 (7412): 471–75. https://doi.org/10.1038/nature11396.
Krusche, Peter, Len Trigg, Paul C. Boutros, Christopher E. Mason,
Francisco M. De La Vega, Benjamin L. Moore, Mar Gonzalez-Porta, et al.
2019. “Best Practices for Benchmarking
Germline Small Variant
Calls in Human Genomes.”
Nature Biotechnology 37 (5): 555–60. https://doi.org/10.1038/s41587-019-0054-x.
Kryshtafovych, Andriy, Torsten Schwede, Maya Topf, Krzysztof Fidelis,
and John Moult. 2021. “Critical Assessment of Methods of Protein
Structure Prediction (CASP)—Round
XIV.” Proteins: Structure, Function, and
Bioinformatics 89 (12): 1607–17. https://doi.org/10.1002/prot.26237.
Kuchenbaecker, Karoline B., John L. Hopper, Daniel R. Barnes, Kelly-Anne
Phillips, Thea M. Mooij, Marie-José Roos-Blom, Sarah Jervis, et al.
2017. “Risks of Breast, Ovarian, and
Contralateral Breast Cancer for
BRCA1 and BRCA2 Mutation
Carriers.” JAMA 317 (23): 2402–16. https://doi.org/10.1001/jama.2017.7112.
Kulakovskiy, Ivan V., Ilya E. Vorontsov, Ivan S. Yevshin, Ruslan N.
Sharipov, Alla D. Fedorova, Eugene I. Rumynskiy, Yulia A. Medvedeva, et
al. 2018. “HOCOMOCO: Towards a Complete Collection of
Transcription Factor Binding Models for Human and Mouse via Large-Scale
ChIP-Seq Analysis.” Nucleic Acids
Research 46 (D1): D252–59. https://doi.org/10.1093/nar/gkx1106.
Kulmanov, Maxat, Francisco J. Guzmán-Vega, Paula Duek Roggli, Lydie
Lane, Stefan T. Arold, and Robert Hoehndorf. 2024. “Protein
Function Prediction as Approximate Semantic Entailment.”
Nature Machine Intelligence 6 (2): 220–28. https://doi.org/10.1038/s42256-024-00795-w.
Kundaje, Anshul, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela
Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, et al. 2015.
“Integrative Analysis of 111 Reference Human Epigenomes.”
Nature 518 (7539): 317–30. https://doi.org/10.1038/nature14248.
Kurki, Mitja I., Juha Karjalainen, Priit Palta, Timo P. Sipilä, Kati
Kristiansson, Kati M. Donner, Mary P. Reeve, et al. 2023.
“FinnGen Provides Genetic Insights from a
Well-Phenotyped Isolated Population.” Nature 613 (7944):
508–18. https://doi.org/10.1038/s41586-022-05473-8.
La Manno, Gioele, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah
Hochgerner, Viktor Petukhov, Katja Lidschreiber, et al. 2018.
“RNA Velocity of Single Cells.”
Nature 560 (7719): 494–98. https://doi.org/10.1038/s41586-018-0414-6.
Laird, Nan M., and Christoph Lange. 2011. The Fundamentals of Modern
Statistical Genetics. New York: Springer. https://doi.org/10.1007/978-1-4419-7338-2.
Lambert, Samuel A, Gad Abraham, and Michael Inouye. 2019. “Towards
Clinical Utility of Polygenic Risk Scores.” Human Molecular
Genetics 28 (R2): R133–42. https://doi.org/10.1093/hmg/ddz187.
Lambert, Samuel A., Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu,
Annalisa Buniello, Aoife McMahon, et al. 2021. “The
Polygenic Score Catalog as an
Open Database for Reproducibility and Systematic Evaluation.”
Nature Genetics 53 (4): 420–25. https://doi.org/10.1038/s41588-021-00783-5.
Landrum, Melissa J, Jennifer M Lee, Mark Benson, Garth R Brown, Chen
Chao, Shanmuga Chitipiralla, Baoshan Gu, et al. 2018.
“ClinVar: Improving Access to Variant Interpretations
and Supporting Evidence.” Nucleic Acids Research 46
(D1): D1062–67. https://doi.org/10.1093/nar/gkx1153.
Larson, Adam G., Daniel Elnatan, Madeline M. Keenen, Michael J. Trnka,
Jonathan B. Johnston, Alma L. Burlingame, David A. Agard, Sy Redding,
and Geeta J. Narlikar. 2017. “Liquid Droplet Formation by
HP1α Suggests a Role for Phase Separation in
Heterochromatin.” Nature 547 (7662): 236–40. https://doi.org/10.1038/nature22822.
Lawlor, Debbie A., Roger M. Harbord, Jonathan A. C. Sterne, Nic Timpson,
and George Davey Smith. 2008. “Mendelian Randomization:
Using Genes as Instruments for Making Causal Inferences in
Epidemiology.” Statistics in Medicine 27 (8): 1133–63.
https://doi.org/10.1002/sim.3034.
Leacy, Finbarr P., and Elizabeth A. Stuart. 2013. “On the Joint
Use of Propensity and Prognostic Scores in Estimation of the Average
Treatment Effect on the Treated: A Simulation Study.”
Statistics in Medicine 33 (20): 3488–508. https://doi.org/10.1002/sim.6030.
Lee, Ingoo, Zachary S. Wallace, Yuqi Wang, Sungjoon Park, Hojung Nam,
Amit R. Majithia, and Trey Ideker. 2025. “[G2PT]
A Genotype-Phenotype Transformer to Assess and Explain
Polygenic Risk.” bioRxiv. https://doi.org/10.1101/2024.10.23.619940.
Lee, Jinhyuk, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan
Ho So, and Jaewoo Kang. 2019. “BioBERT: A Pre-Trained
Biomedical Language Representation Model for Biomedical Text
Mining.” Bioinformatics 36 (4): 1234–40. https://doi.org/10.1093/bioinformatics/btz682.
Li, Heng. 2013. “Aligning Sequence Reads, Clone Sequences and
Assembly Contigs with BWA-MEM.” arXiv.
https://doi.org/10.48550/arXiv.1303.3997.
———. 2014. “Towards Better Understanding
of Artifacts in Variant Calling
from High-Coverage
Samples.” Bioinformatics 30 (20): 2843–51.
https://doi.org/10.1093/bioinformatics/btu356.
———. 2018. “Minimap2: Pairwise Alignment for Nucleotide
Sequences.” Bioinformatics 34 (18): 3094–3100. https://doi.org/10.1093/bioinformatics/bty191.
Li, Qimai, Zhichao Han, and Xiao-Ming Wu. 2018. “Deeper Insights
into Graph Convolutional Networks for Semi-Supervised Learning.”
In Proceedings of the Thirty-Second
AAAI Conference on Artificial
Intelligence and Thirtieth
Innovative Applications of
Artificial Intelligence
Conference and Eighth AAAI
Symposium on Educational Advances
in Artificial Intelligence, 3538–45.
AAAI’18/IAAI’18/EAAI’18. New
Orleans, Louisiana, USA: AAAI Press.
Li, Sizhen, Saeed Moayedpour, Ruijiang Li, Michael Bailey, Saleh Riahi,
Milad Miladi, Jacob Miner, et al. 2023. “CodonBERT:
Large Language Models for mRNA Design and Optimization.” bioRxiv. https://doi.org/10.1101/2023.09.09.556981.
Li, Weizhong, and Adam Godzik. 2006. “Cd-Hit: A Fast Program for
Clustering and Comparing Large Sets of Protein or Nucleotide
Sequences.” Bioinformatics 22 (13): 1658–59. https://doi.org/10.1093/bioinformatics/btl158.
Li, Xiao, Jie Ma, Ling Leng, Mingfei Han, Mansheng Li, Fuchu He, and
Yunping Zhu. 2022. “MoGCN: A
Multi-Omics Integration
Method Based on Graph
Convolutional Network for Cancer
Subtype Analysis.” Frontiers in
Genetics 13 (February). https://doi.org/10.3389/fgene.2022.806842.
Li, Zehui, Vallijah Subasri, Yifei Shen, Dongsheng Li, Yiren Zhao,
Guy-Bart Stan, and Caihua Shan. 2025. “Omni-DNA:
A Unified Genomic
Foundation Model for
Cross-Modal and
Multi-Task Learning.”
arXiv. https://doi.org/10.48550/arXiv.2502.03499.
Liao, Wen-Wei, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness,
Glenn Hickey, Shuangjia Lu, et al. 2023. “A Draft Human Pangenome
Reference.” Nature 617 (7960): 312–24. https://doi.org/10.1038/s41586-023-05896-x.
Lieberman-Aiden, Erez, Nynke L. van Berkum, Louise Williams, Noam
Kaplan, Peter J. Sabo, Michael O. Dorschner, Job Dekker, et al. 2009.
“Comprehensive Mapping of Long-Range Interactions Reveals Folding
Principles of the Human Genome.” Science 326 (5950):
289–93. https://doi.org/10.1126/science.1181369.
Lin, Tsung-Yi, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár.
2020. “Focal Loss for Dense
Object Detection.” IEEE
Transactions on Pattern Analysis and Machine Intelligence 42 (2):
318–27. https://doi.org/10.1109/TPAMI.2018.2858826.
Lin, Weining, David Miller, Zhonghui Gu, and Christine Orengo. 2025.
“GOBeacon: An Ensemble Model for Protein
Function Prediction Enhanced by Contrastive Learning.”
Protein Science 34 (7): e70182. https://doi.org/10.1002/pro.70182.
Lin, Zeming, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting
Lu, Allan dos Santos Costa, et al. 2022. “[ESM-2]
Language Models of Protein Sequences at the Scale of
Evolution Enable Accurate Structure Prediction.” bioRxiv. https://doi.org/10.1101/2022.07.20.500902.
Lin, Zeming, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting
Lu, Nikita Smetanin, et al. 2023. “Evolutionary-Scale Prediction
of Atomic-Level Protein Structure with a Language Model.”
Science 379 (6637): 1123–30. https://doi.org/10.1126/science.ade2574.
Linder, Johannes, Divyanshi Srivastava, Han Yuan, Vikram Agarwal, and
David R. Kelley. 2025. “[Borzoi]
Predicting RNA-Seq Coverage from
DNA Sequence as a Unifying Model of Gene
Regulation.” Nature Genetics 57 (4): 949–61. https://doi.org/10.1038/s41588-024-02053-6.
Lipsitch, Marc, Eric Tchetgen Tchetgen, and Ted Cohen. 2010.
“Negative Controls: A Tool
for Detecting Confounding and
Bias in Observational
Studies.” Epidemiology 21 (3): 383. https://doi.org/10.1097/EDE.0b013e3181d61eeb.
Liu, Zicheng, Siyuan Li, Zhiyuan Chen, Fang Wu, Chang Yu, Qirong Yang,
Yucheng Guo, Yujie Yang, Xiaoming Zhang, and Stan Z. Li. 2025.
“Life-Code: Central Dogma
Modeling with Multi-Omics
Sequence Unification.” arXiv. https://doi.org/10.48550/arXiv.2502.07299.
Logsdon, Glennis A., Mitchell R. Vollger, and Evan E. Eichler. 2020.
“Long-Read Human Genome Sequencing and Its Applications.”
Nature Reviews Genetics 21 (10): 597–614. https://doi.org/10.1038/s41576-020-0236-x.
Loh, Po-Ru, Petr Danecek, Pier Francesco Palamara, Christian
Fuchsberger, Yakir A Reshef, Hilary K Finucane, Sebastian Schoenherr, et
al. 2016. “Reference-Based Phasing Using the
Haplotype Reference Consortium
Panel.” Nature Genetics 48 (11): 1443–48. https://doi.org/10.1038/ng.3679.
Loshchilov, Ilya, and Frank Hutter. 2019. “Decoupled
Weight Decay
Regularization.” arXiv. https://doi.org/10.48550/arXiv.1711.05101.
Lupiáñez, Darío G., Katerina Kraft, Verena Heinrich, Peter Krawitz,
Francesco Brancati, Eva Klopocki, Denise Horn, et al. 2015.
“Disruptions of Topological Chromatin
Domains Cause Pathogenic
Rewiring of Gene-Enhancer
Interactions.” Cell 161 (5): 1012–25. https://doi.org/10.1016/j.cell.2015.04.004.
Lynch, Thomas J., Daphne W. Bell, Raffaella Sordella, Sarada
Gurubhagavatula, Ross A. Okimoto, Brian W. Brannigan, Patricia L.
Harris, et al. 2004. “Activating Mutations in the
Epidermal Growth Factor
Receptor Underlying
Responsiveness of
Non–Small-Cell Lung
Cancer to Gefitinib.” New England
Journal of Medicine 350 (21): 2129–39. https://doi.org/10.1056/NEJMoa040938.
Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P.
Mohr, James M. Holton, Jose Luis Olmos, et al. 2023. “Large
Language Models Generate Functional Protein Sequences Across Diverse
Families.” Nature Biotechnology 41 (8): 1099–1106. https://doi.org/10.1038/s41587-022-01618-2.
Madhu, Hiren, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath,
Smita Krishnaswamy, and Rex Ying. 2025. “HEIST:
A Graph Foundation
Model for Spatial Transcriptomics
and Proteomics Data.” arXiv. https://doi.org/10.48550/arXiv.2506.11152.
Mallal, Simon, Elizabeth Phillips, Giampiero Carosi, Jean-Michel Molina,
Cassy Workman, Janez Tomažič, Eva Jägel-Guedes, et al. 2008.
“HLA-B*5701 Screening for
Hypersensitivity to Abacavir.” New
England Journal of Medicine 358 (6): 568–79. https://doi.org/10.1056/NEJMoa0706135.
Maller, Julian B., Gilean McVean, Jake Byrnes, Damjan Vukcevic, Kimmo
Palin, Zhan Su, Joanna M. M. Howson, et al. 2012. “Bayesian
Refinement of Association Signals for 14 Loci in 3 Common
Diseases.” Nature Genetics 44 (12): 1294–1301. https://doi.org/10.1038/ng.2435.
Manolio, Teri A., Francis S. Collins, Nancy J. Cox, David B. Goldstein,
Lucia A. Hindorff, David J. Hunter, Mark I. McCarthy, et al. 2009.
“Finding the Missing Heritability of Complex Diseases.”
Nature 461 (7265): 747–53. https://doi.org/10.1038/nature08494.
Manzo, Gaetano, Kathryn Borkowski, and Ivan Ovcharenko. 2025.
“Comparative Analysis of Deep
Learning Models for Predicting
Causative Regulatory
Variants.” bioRxiv. https://doi.org/10.1101/2025.05.19.654920.
Marees, Andries T., Hilde de Kluiver, Sven Stringer, Florence Vorspan,
Emmanuel Curis, Cynthia Marie-Claire, and Eske M. Derks. 2018.
“[GWAS] A Tutorial on Conducting
Genome-Wide Association Studies: Quality Control and
Statistical Analysis.” International Journal of Methods in
Psychiatric Research 27 (2): e1608. https://doi.org/10.1002/mpr.1608.
Marin, Frederikke Isa, Felix Teufel, Marc Horlacher, Dennis Madsen,
Dennis Pultz, Ole Winther, and Wouter Boomsma. 2024.
“BEND: Benchmarking DNA
Language Models on Biologically Meaningful
Tasks.” arXiv. https://doi.org/10.48550/arXiv.2311.12570.
Márquez-Luna, Carla, Po-Ru Loh, South Asian Type 2 Diabetes (SAT2D)
Consortium, The SIGMA Type 2 Diabetes Consortium, and Alkes L. Price.
2017. “Multiethnic Polygenic Risk Scores Improve Risk Prediction
in Diverse Populations.” Genetic Epidemiology 41 (8):
811–23. https://doi.org/10.1002/gepi.22083.
Martin, Alicia R., Masahiro Kanai, Yoichiro Kamatani, Yukinori Okada,
Benjamin M. Neale, and Mark J. Daly. 2019. “Clinical Use of
Current Polygenic Risk Scores May Exacerbate Health Disparities.”
Nature Genetics 51 (4): 584–91. https://doi.org/10.1038/s41588-019-0379-x.
Mastropietro, Andrea, Gianluca De Carlo, and Aris Anagnostopoulos. 2023.
“XGDAG: Explainable Gene–Disease Associations via
Graph Neural Networks.” Bioinformatics 39 (8): btad482.
https://doi.org/10.1093/bioinformatics/btad482.
Maurano, Matthew T., Richard Humbert, Eric Rynes, Robert E. Thurman,
Eric Haugen, Hao Wang, Alex P. Reynolds, et al. 2012. “Systematic
Localization of Common
Disease-Associated Variation in
Regulatory DNA.” Science 337
(6099): 1190–95. https://doi.org/10.1126/science.1222794.
Mavaddat, Nasim, Kyriaki Michailidou, Joe Dennis, Michael Lush, Laura
Fachal, Andrew Lee, Jonathan P. Tyrer, et al. 2019. “Polygenic
Risk Scores for Prediction of
Breast Cancer and Breast
Cancer Subtypes.” The American
Journal of Human Genetics 104 (1): 21–34. https://doi.org/10.1016/j.ajhg.2018.11.002.
McCloskey, Michael, and Neal Cohen. 1989. “Catastrophic
Interference in Connectionist
Networks: The Sequential
Learning Problem.” Psychology of
Learning and Motivation 24 (January): 109–65. https://doi.org/10.1016/S0079-7421(08)60536-8.
McElreath, Richard. 2020. Statistical Rethinking:
A Bayesian Course with
Examples in R and Stan. 2nd
ed. Chapman & Hall/CRC.
Medvedev, Aleksandr, Karthik Viswanathan, Praveenkumar Kanithi, Kirill
Vishniakov, Prateek Munjal, Clément Christophe, Marco AF Pimentel,
Ronnie Rajan, and Shadab Khan. 2025. “BioToken and
BioFM – Biologically-Informed
Tokenization Enables Accurate and
Efficient Genomic Foundation
Models.” bioRxiv. https://doi.org/10.1101/2025.03.27.645711.
Meier, Joshua, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and
Alexander Rives. 2021. “[ESM-1v]
Language Models Enable Zero-Shot Prediction of the Effects
of Mutations on Protein Function.” bioRxiv. https://doi.org/10.1101/2021.07.09.450648.
Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy
Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and
Timnit Gebru. 2019. “Model Cards for
Model Reporting.” In Proceedings of
the Conference on Fairness,
Accountability, and Transparency, 220–29.
FAT* ’19. New York, NY, USA: Association for Computing
Machinery. https://doi.org/10.1145/3287560.3287596.
Morales, Joannella, Shashikant Pujar, Jane E. Loveland, Alex Astashyn,
Ruth Bennett, Andrew Berry, Eric Cox, et al. 2022. “A Joint
NCBI and EMBL-EBI Transcript Set
for Clinical Genomics and Research.” Nature 604 (7905):
310–15. https://doi.org/10.1038/s41586-022-04558-8.
Morcos, Faruck, Andrea Pagnani, Bryan Lunt, Arianna Bertolino, Debora S.
Marks, Chris Sander, Riccardo Zecchina, José N. Onuchic, Terence Hwa,
and Martin Weigt. 2011. “Direct-Coupling Analysis of Residue
Coevolution Captures Native Contacts Across Many Protein
Families.” Proceedings of the National Academy of
Sciences 108 (49): E1293–1301. https://doi.org/10.1073/pnas.1111471108.
Morris, Christopher, Martin Ritzert, Matthias Fey, William L. Hamilton,
Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. 2019.
“Weisfeiler and Leman Go Neural: Higher-Order Graph Neural
Networks.” In Proceedings of the
Thirty-Third AAAI
Conference on Artificial
Intelligence and Thirty-First
Innovative Applications of
Artificial Intelligence
Conference and Ninth AAAI
Symposium on Educational Advances
in Artificial Intelligence, 33:4602–9.
AAAI’19/IAAI’19/EAAI’19.
Honolulu, Hawaii, USA: AAAI Press. https://doi.org/10.1609/aaai.v33i01.33014602.
Mountjoy, Edward, Ellen M. Schmidt, Miguel Carmona, Jeremy
Schwartzentruber, Gareth Peat, Alfredo Miranda, Luca Fumis, et al. 2021.
“An Open Approach to Systematically Prioritize Causal Variants and
Genes at All Published Human GWAS Trait-Associated
Loci.” Nature Genetics 53 (11): 1527–33. https://doi.org/10.1038/s41588-021-00945-5.
Mukherjee, Sumit, Zachary R. McCaw, Jingwen Pei, Anna Merkoulovitch, Tom
Soare, Raghav Tandon, David Amar, et al. 2024.
“EmbedGEM: A Framework to Evaluate the Utility of
Embeddings for Genetic Discovery.” Bioinformatics
Advances 4 (1). https://doi.org/10.1093/bioadv/vbae135.
NaderiAlizadeh, Navid, and Rohit Singh. 2025. “Aggregating
Residue-Level Protein Language Model Embeddings with Optimal
Transport.” Bioinformatics Advances 5 (1): vbaf060. https://doi.org/10.1093/bioadv/vbaf060.
Naghipourfar, Mohsen, Siyu Chen, Mathew K. Howard, Christian B.
Macdonald, Ali Saberi, Timo Hagen, Mohammad R. K. Mofrad, Willow
Coyote-Maestas, and Hani Goodarzi. 2024. “[cdsFM - EnCodon/DeCodon]
A Suite of Foundation
Models Captures the Contextual
Interplay Between Codons.”
bioRxiv. https://doi.org/10.1101/2024.10.10.617568.
Nagpal, Chirag, Xinyu Li, and Artur Dubrawski. 2021. “Deep
Survival Machines: Fully
Parametric Survival Regression
and Representation Learning for
Censored Data With
Competing Risks.” IEEE Journal of
Biomedical and Health Informatics 25 (8): 3163–75. https://doi.org/10.1109/JBHI.2021.3052441.
Nelson, Matthew R., Hannah Tipney, Jeffery L. Painter, Judong Shen,
Paola Nicoletti, Yufeng Shen, Aris Floratos, et al. 2015. “The
Support of Human Genetic Evidence for Approved Drug Indications.”
Nature Genetics 47 (8): 856–60. https://doi.org/10.1038/ng.3314.
Ng, Pauline C., and Steven Henikoff. 2003. “SIFT:
Predicting Amino Acid Changes That Affect Protein
Function.” Nucleic Acids Research 31 (13): 3812–14. https://doi.org/10.1093/nar/gkg509.
Nguengang Wakap, Stéphanie, Deborah M. Lambert, Annie Olry, Charlotte
Rodwell, Charlotte Gueydan, Valérie Lanneau, Daniel Murphy, Yann Le Cam,
and Ana Rath. 2019. “Estimating Cumulative Point Prevalence of
Rare Diseases: Analysis of the Orphanet Database.”
European Journal of Human Genetics 28 (2): 165–73. https://doi.org/10.1038/s41431-019-0508-0.
Nguyen, Eric, Michael Poli, Matthew G. Durrant, Brian Kang, Dhruva
Katrekar, David B. Li, Liam J. Bartie, et al. 2024. “Sequence
Modeling and Design from Molecular to Genome Scale with
Evo.” Science 386 (6723): eado9336. https://doi.org/10.1126/science.ado9336.
Nguyen, Eric, Michael Poli, Marjan Faizi, Armin Thomas, Callum
Birch-Sykes, Michael Wornow, Aman Patel, et al. 2023.
“HyenaDNA: Long-Range
Genomic Sequence Modeling at
Single Nucleotide
Resolution.” arXiv. https://doi.org/10.48550/arXiv.2306.15794.
Nielsen, Rasmus, Joshua S. Paul, Anders Albrechtsen, and Yun S. Song.
2011. “Genotype and SNP Calling from Next-Generation
Sequencing Data.” Nature Reviews Genetics 12 (6):
443–51. https://doi.org/10.1038/nrg2986.
Nijkamp, Erik, Jeffrey A. Ruffolo, Eli N. Weinstein, Nikhil Naik, and
Ali Madani. 2023. “ProGen2: Exploring
the Boundaries of Protein Language Models.” Cell Systems
14 (11): 968–978.e3. https://doi.org/10.1016/j.cels.2023.10.002.
Nofziger, Charity, Amy J. Turner, Katrin Sangkuhl, Michelle
Whirl-Carrillo, José A. G. Agúndez, John L. Black, Henry M.
Dunnenberger, et al. 2019. “PharmVar
GeneFocus: CYP2D6.” Clinical
Pharmacology & Therapeutics 107 (1): 154–70. https://doi.org/10.1002/cpt.1643.
Notin, Pascal, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado,
Aidan Gomez, Debora S. Marks, and Yarin Gal. 2022. “Tranception:
Protein Fitness Prediction with Autoregressive Transformers and
Inference-Time Retrieval.” arXiv. https://doi.org/10.48550/arXiv.2205.13760.
Notin, Pascal, Aaron Kollasch, Daniel Ritter, Lood van Niekerk,
Steffanie Paul, Han Spinner, Nathan Rollins, et al. 2023.
“ProteinGym: Large-Scale
Benchmarks for Protein Fitness
Prediction and Design.” Advances in
Neural Information Processing Systems 36 (December): 64331–79.
Nurk, Sergey, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V.
Bzikadze, Alla Mikheenko, Mitchell R. Vollger, et al. 2022. “The
Complete Sequence of a Human Genome.” Science 376
(6588): 44–53. https://doi.org/10.1126/science.abj6987.
O’Connell, Jared, Deepti Gurdasani, Olivier Delaneau, Nicola Pirastu,
Sheila Ulivi, Massimiliano Cocca, Michela Traglia, et al. 2014. “A
General Approach for Haplotype
Phasing Across the Full Spectrum
of Relatedness.” PLOS Genetics 10 (4):
e1004234. https://doi.org/10.1371/journal.pgen.1004234.
O’Leary, Nuala A., Mathew W. Wright, J. Rodney Brister, Stacy Ciufo,
Diana Haddad, Rich McVeigh, Bhanu Rajput, et al. 2016. “Reference
Sequence (RefSeq) Database at NCBI: Current
Status, Taxonomic Expansion, and Functional Annotation.”
Nucleic Acids Research 44 (D1): D733–45. https://doi.org/10.1093/nar/gkv1189.
Ochoa, David, Andrew Hercules, Miguel Carmona, Daniel Suveges, James
Baker, Cinzia Malangone, Irene Lopez, et al. 2023. “The
Next-Generation Open Targets
Platform: Reimagined, Redesigned, Rebuilt.”
Nucleic Acids Research 51 (D1): D1353–59. https://doi.org/10.1093/nar/gkac1046.
Oono, Kenta, and Taiji Suzuki. 2020. “Graph Neural
Networks Exponentially Lose
Expressive Power for Node
Classification.”
Oord, Aaron van den, Yazhe Li, and Oriol Vinyals. 2019.
“Representation Learning with
Contrastive Predictive
Coding.” arXiv. https://doi.org/10.48550/arXiv.1807.03748.
Orchard, Sandra, Mais Ammari, Bruno Aranda, Lionel Breuza, Leonardo
Briganti, Fiona Broackes-Carter, Nancy H. Campbell, et al. 2014.
“The MIntAct Project—IntAct as a Common
Curation Platform for 11 Molecular Interaction Databases.”
Nucleic Acids Research 42 (D1): D358–63. https://doi.org/10.1093/nar/gkt1115.
Orenbuch, Rose, Courtney A. Shearer, Aaron W. Kollasch, Aviv D. Spinner,
Thomas Hopf, Lood van Niekerk, Dinko Franceschi, Mafalda Dias, Jonathan
Frazer, and Debora S. Marks. 2025. “[popEVE] Proteome-Wide Model for Human
Disease Genetics.” Nature Genetics, November, 1–10. https://doi.org/10.1038/s41588-025-02400-1.
Oughtred, Rose, Jennifer Rust, Christie Chang, Bobby-Joe Breitkreutz,
Chris Stark, Andrew Willems, Lorrie Boucher, et al. 2020. “The
BioGRID Database: A Comprehensive Biomedical
Resource of Curated Protein, Genetic, and Chemical Interactions.”
Protein Science 30 (1): 187–200. https://doi.org/10.1002/pro.3978.
Outeiral, Carlos, and Charlotte M. Deane. 2024. “Codon Language
Embeddings Provide Strong Signals for Use in Protein
Engineering.” Nature Machine Intelligence 6 (2): 170–79.
https://doi.org/10.1038/s42256-024-00791-0.
Paass, Gerhard, and Sven Giesselbach. 2023. Foundation Models for
Natural Language Processing: Pre-Trained Language Models Integrating
Media. Cham: Springer. https://doi.org/10.1007/978-3-031-23190-2.
Parasuraman, Raja, and Dietrich H. Manzey. 2010. “Complacency and
Bias in Human Use of
Automation: An Attentional
Integration.” Human Factors 52 (3):
381–410. https://doi.org/10.1177/0018720810376055.
Patterson, Nick, Alkes L. Price, and David Reich. 2006.
“Population Structure and
Eigenanalysis.” PLOS Genetics 2 (12): e190.
https://doi.org/10.1371/journal.pgen.0020190.
Pe’er, Itsik, Roman Yelensky, David Altshuler, and Mark J. Daly. 2008.
“Estimation of the Multiple Testing Burden for Genomewide
Association Studies of Nearly All Common Variants.” Genetic
Epidemiology 32 (4): 381–85. https://doi.org/10.1002/gepi.20303.
Pearce, James D., Sara E. Simmonds, Gita Mahmoudabadi, Lakshmi Krishnan,
Giovanni Palla, Ana-Maria Istrate, Alexander Tarashansky, et al. 2025.
“[TranscriptFormer]
Cross-Species Generative
Cell Atlas Across 1.5
Billion Years of Evolution:
The TranscriptFormer Single-Cell
Model.” bioRxiv. https://doi.org/10.1101/2025.04.25.650731.
Pearl, Judea. 2009. Causality. Cambridge University Press.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of
Why. Hachette Book Group.
Pejaver, Vikas, Alicia B. Byrne, Bing-Jian Feng, Kymberleigh A. Pagel,
Sean D. Mooney, Rachel Karchin, Anne O’Donnell-Luria, et al. 2022.
“Calibration of Computational Tools for Missense Variant
Pathogenicity Classification and ClinGen Recommendations
for PP3/BP4 Criteria.” The American
Journal of Human Genetics 109 (12): 2163–77. https://doi.org/10.1016/j.ajhg.2022.10.013.
Peters, Matthew E., Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher
Clark, Kenton Lee, and Luke Zettlemoyer. 2018. “Deep
Contextualized Word
Representations.” In Proceedings of the 2018
Conference of the North American
Chapter of the Association for
Computational Linguistics: Human
Language Technologies, Volume 1
(Long Papers), edited by Marilyn Walker,
Heng Ji, and Amanda Stent, 2227–37. New Orleans, Louisiana: Association
for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202.
Piñero, Janet, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch,
Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I Furlong.
2020. “The DisGeNET Knowledge Platform for Disease
Genomics: 2019 Update.” Nucleic Acids Research 48 (D1):
D845–55. https://doi.org/10.1093/nar/gkz1021.
Platt, John. 1999. “Probabilistic Outputs for
Support Vector Machines and
Comparisons to Regularized
Likelihood Methods.” Advances in
Large Margin Classifiers, March.
Poli, Michael, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao,
Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher Ré. 2023.
“Hyena Hierarchy: Towards
Larger Convolutional Language
Models.” In Proceedings of the 40th
International Conference on
Machine Learning, 28043–78. PMLR.
Pollard, Katherine S., Melissa J. Hubisz, Kate R. Rosenbloom, and Adam
Siepel. 2009. “Detection of Nonneutral Substitution Rates on
Mammalian Phylogenies.” Genome Research 20 (1): 110–21.
https://doi.org/10.1101/gr.097857.109.
Poplin, Ryan, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas
Colthurst, Alexander Ku, Dan Newburger, et al. 2018.
“[DeepVariant] A Universal
SNP and Small-Indel Variant Caller Using Deep Neural
Networks.” Nature Biotechnology 36 (10): 983–87. https://doi.org/10.1038/nbt.4235.
Press, Ofir, Noah A. Smith, and Mike Lewis. 2022. “Train
Short, Test Long:
Attention with Linear Biases
Enables Input Length
Extrapolation.” arXiv. https://doi.org/10.48550/arXiv.2108.12409.
Price, Alkes L., Nick J. Patterson, Robert M. Plenge, Michael E.
Weinblatt, Nancy A. Shadick, and David Reich. 2006. “Principal
Components Analysis Corrects for Stratification in Genome-Wide
Association Studies.” Nature Genetics 38 (8): 904–9. https://doi.org/10.1038/ng1847.
Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel A.
R. Ferreira, David Bender, Julian Maller, et al. 2007.
“PLINK: A Tool
Set for Whole-Genome
Association and Population-Based
Linkage Analyses.” The American
Journal of Human Genetics 81 (3): 559–75. https://doi.org/10.1086/519795.
Quan, Hude, Vijaya Sundararajan, Patricia Halfon, Andrew Fong, Bernard
Burnand, Jean-Christophe Luthi, L. Duncan Saunders, Cynthia A. Beck,
Thomas E. Feasby, and William A. Ghali. 2005. “Coding
Algorithms for Defining
Comorbidities in ICD-9-CM and
ICD-10 Administrative
Data.” Medical Care 43 (11): 1130. https://doi.org/10.1097/01.mlr.0000182534.19832.83.
Quang, Daniel, Yifei Chen, and Xiaohui Xie. 2015.
“DANN: A Deep Learning Approach for Annotating the
Pathogenicity of Genetic Variants.” Bioinformatics 31
(5): 761–63. https://doi.org/10.1093/bioinformatics/btu703.
Quang, Daniel, and Xiaohui Xie. 2016. “DanQ: A Hybrid
Convolutional and Recurrent Deep Neural Network for Quantifying the
Function of DNA Sequences.” Nucleic Acids
Research 44 (11): e107. https://doi.org/10.1093/nar/gkw226.
Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh,
Sandhini Agarwal, Girish Sastry, et al. 2021. “Learning
Transferable Visual Models
From Natural Language
Supervision.” In Proceedings of the 38th
International Conference on
Machine Learning, 8748–63.
Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2023.
“Exploring the Limits of Transfer
Learning with a Unified
Text-to-Text Transformer.”
arXiv. https://doi.org/10.48550/arXiv.1910.10683.
Rakowski, Alexander, and Christoph Lippert. 2025.
“[MIFM] Multiple Instance Fine-Mapping:
Predicting Causal Regulatory Variants with a Deep Sequence
Model.” medRxiv. https://doi.org/10.1101/2025.06.13.25329551.
Rao, Roshan, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John
Canny, Pieter Abbeel, and Yun S. Song. 2019. “Evaluating
Protein Transfer Learning with
TAPE.” arXiv. https://doi.org/10.48550/arXiv.1906.08230.
Rao, Roshan, Joshua Meier, Tom Sercu, Sergey Ovchinnikov, and Alexander
Rives. 2020. “Transformer Protein Language Models Are Unsupervised
Structure Learners.” bioRxiv. https://doi.org/10.1101/2020.12.15.422761.
Rao, Suhas S. P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M.
Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn,
et al. 2017. “Cohesin Loss Eliminates
All Loop Domains.”
Cell 171 (2): 305–320.e24. https://doi.org/10.1016/j.cell.2017.09.026.
Rao, Suhas S. P., Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova,
Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, et al. 2014.
“A 3D Map of the Human
Genome at Kilobase Resolution
Reveals Principles of Chromatin
Looping.” Cell 159 (7): 1665–80. https://doi.org/10.1016/j.cell.2014.11.021.
Regev, Aviv, Sarah A Teichmann, Eric S Lander, Ido Amit, Christophe
Benoist, Ewan Birney, Bernd Bodenmiller, et al. 2017. “The
Human Cell Atlas.” Edited
by Thomas R Gingeras. eLife 6 (December): e27041. https://doi.org/10.7554/eLife.27041.
Rehm, Heidi L., Jonathan S. Berg, Lisa D. Brooks, Carlos D. Bustamante,
James P. Evans, Melissa J. Landrum, David H. Ledbetter, et al. 2015.
“ClinGen — The Clinical
Genome Resource.” New England
Journal of Medicine 372 (23): 2235–42. https://doi.org/10.1056/NEJMsr1406261.
Relling, Mary V., Teri E. Klein, Roseann S. Gammal, Michelle
Whirl-Carrillo, James M. Hoffman, and Kelly E. Caudle. 2019. “The
Clinical Pharmacogenetics
Implementation Consortium: 10
Years Later.” Clinical Pharmacology
& Therapeutics 107 (1): 171–75. https://doi.org/10.1002/cpt.1651.
Rentzsch, Philipp, Max Schubach, Jay Shendure, and Martin Kircher. 2021.
“CADD-Splice—Improving Genome-Wide
Variant Effect Prediction Using Deep Learning-Derived Splice
Scores.” Genome Medicine 13 (1): 31. https://doi.org/10.1186/s13073-021-00835-9.
Rentzsch, Philipp, Daniela Witten, Gregory M Cooper, Jay Shendure, and
Martin Kircher. 2019. “CADD: Predicting the
Deleteriousness of Variants Throughout the Human Genome.”
Nucleic Acids Research 47 (D1): D886–94. https://doi.org/10.1093/nar/gky1016.
Richards, Sue, Nazneen Aziz, Sherri Bale, David Bick, Soma Das, Julie
Gastier-Foster, Wayne W. Grody, et al. 2015. “Standards and
Guidelines for the Interpretation of Sequence Variants: A Joint
Consensus Recommendation of the American
College of Medical Genetics and
Genomics and the Association for
Molecular Pathology.” Genetics in
Medicine 17 (5): 405–24. https://doi.org/10.1038/gim.2015.30.
Richardson, Peter, Ivan Griffin, Catherine Tucker, Dan Smith, Olly
Oechsle, Anne Phelan, Michael Rawling, Edward Savory, and Justin
Stebbing. 2020. “Baricitinib as Potential Treatment for 2019-nCoV Acute Respiratory Disease.” The
Lancet 395 (10223): e30–31. https://doi.org/10.1016/S0140-6736(20)30304-4.
Rieke, Nicola, Jonny Hancox, Wenqi Li, Fausto Milletarì, Holger R. Roth,
Shadi Albarqouni, Spyridon Bakas, et al. 2020. “The Future of
Digital Health with Federated Learning.” Npj Digital
Medicine 3 (1): 119. https://doi.org/10.1038/s41746-020-00323-1.
Risch, Neil, and Kathleen Merikangas. 1996. “The
Future of Genetic Studies of
Complex Human Diseases.”
Science 273 (5281): 1516–17. https://doi.org/10.1126/science.273.5281.1516.
Rives, Alexander, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin,
Jason Liu, Demi Guo, et al. 2021. “[ESM-1b]
Biological Structure and Function Emerge from Scaling
Unsupervised Learning to 250 Million Protein Sequences.”
Proceedings of the National Academy of Sciences of the United States
of America 118 (15): e2016239118. https://doi.org/10.1073/pnas.2016239118.
Robinson, James, Dominic J Barker, Xenia Georgiou, Michael A Cooper,
Paul Flicek, and Steven G E Marsh. 2020.
“IPD-IMGT/HLA
Database.” Nucleic Acids Research 48 (D1):
D948–55. https://doi.org/10.1093/nar/gkz950.
Rogers, Anna, Olga Kovaleva, and Anna Rumshisky. 2021. “A
Primer in BERTology: What
We Know About How
BERT Works.” Transactions of the
Association for Computational Linguistics 8 (January): 842–66. https://doi.org/10.1162/tacl_a_00349.
Rost, Burkhard. 1999. “Twilight Zone of Protein Sequence
Alignments.” Protein Engineering 12 (2): 85–94. https://doi.org/10.1093/protein/12.2.85.
Ruan, Yunfeng, Yen-Feng Lin, Yen-Chen Anne Feng, Chia-Yen Chen, Max Lam,
Zhenglin Guo, Lin He, et al. 2022. “Improving Polygenic Prediction
in Ancestrally Diverse Populations.” Nature Genetics 54
(5): 573–80. https://doi.org/10.1038/s41588-022-01054-7.
Rubin, Alan F., Hannah Gelman, Nathan Lucas, Sandra M. Bajjalieh,
Anthony T. Papenfuss, Terence P. Speed, and Douglas M. Fowler. 2017.
“A Statistical Framework for Analyzing Deep Mutational Scanning
Data.” Genome Biology 18 (1): 150. https://doi.org/10.1186/s13059-017-1272-5.
Rubinacci, Simone, Diogo M. Ribeiro, Robin J. Hofmeister, and Olivier
Delaneau. 2021. “Efficient Phasing and Imputation of Low-Coverage
Sequencing Data Using Large Reference Panels.” Nature
Genetics 53 (1): 120–26. https://doi.org/10.1038/s41588-020-00756-0.
Saadat, Ali, and Jacques Fellay. 2024. “DNA
Language Model and Interpretable
Graph Neural Network
Identify Genes and Pathways
Involved in Rare
Diseases.” In Proceedings of the 1st
Workshop on Language + Molecules
(L+M 2024), 103–15. https://doi.org/10.18653/v1/2024.langmol-1.13.
Sainz, Oscar, Jon Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez
de Lacalle, and Eneko Agirre. 2023. “NLP
Evaluation in Trouble: On the
Need to Measure LLM
Data Contamination for Each
Benchmark.” In Findings of the
Association for Computational
Linguistics: EMNLP 2023, edited by Houda
Bouamor, Juan Pino, and Kalika Bali, 10776–87. Singapore: Association
for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.722.
Sakaue, Saori, Saisriram Gurajala, Michelle Curtis, Yang Luo, Wanson
Choi, Kazuyoshi Ishigaki, Joyce B. Kang, et al. 2023. “Tutorial: A
Statistical Genetics Guide to Identifying HLA Alleles
Driving Complex Disease.” Nature Protocols 18 (9):
2625–41. https://doi.org/10.1038/s41596-023-00853-4.
Samek, Wojciech, Gregoire Montavon, Andrea Vedaldi, Lars Kai Hansen, and
Klaus-Robert Müller. 2019. Explainable AI: Interpreting, Explaining
and Visualizing Deep Learning. Vol. 11700. LNAI. Cham: Springer. https://doi.org/10.1007/978-3-030-28954-6.
Sample, Paul J., Ban Wang, David W. Reid, Vlad Presnyak, Iain J.
McFadyen, David R. Morris, and Georg Seelig. 2019. “Human 5′
UTR Design and Variant Effect Prediction from a Massively
Parallel Translation Assay.” Nature Biotechnology 37
(7): 803–9. https://doi.org/10.1038/s41587-019-0164-5.
Sanabria, Melissa, Jonas Hirsch, Pierre M. Joubert, and Anna R. Poetsch.
2024. “[GROVER] DNA Language Model
GROVER Learns Sequence Context in the Human Genome.”
Nature Machine Intelligence 6 (8): 911–23. https://doi.org/10.1038/s42256-024-00872-0.
Sanborn, Adrian L., Suhas S. P. Rao, Su-Chen Huang, Neva C. Durand,
Miriam H. Huntley, Andrew I. Jewett, Ivan D. Bochkov, et al. 2015.
“Chromatin Extrusion Explains Key Features of Loop and Domain
Formation in Wild-Type and Engineered Genomes.” Proceedings
of the National Academy of Sciences 112 (47): E6456–65. https://doi.org/10.1073/pnas.1518552112.
Sanderson, Theo, Maxwell L Bileschi, David Belanger, and Lucy J Colwell.
2023. “ProteInfer, Deep Neural Networks for Protein
Functional Inference.” Edited by Volker Dötsch and Max V Staller.
eLife 12 (February): e80942. https://doi.org/10.7554/eLife.80942.
Sangkuhl, Katrin, Michelle Whirl-Carrillo, Ryan M. Whaley, Mark Woon,
Adam Lavertu, Russ B. Altman, Lester Carter, Anurag Verma, Marylyn D.
Ritchie, and Teri E. Klein. 2019. “Pharmacogenomics
Clinical Annotation Tool
(PharmCAT).” Clinical Pharmacology &
Therapeutics 107 (1): 203–10. https://doi.org/10.1002/cpt.1568.
Sarkisyan, Karen S., Dmitry A. Bolotin, Margarita V. Meer, Dinara R.
Usmanova, Alexander S. Mishin, George V. Sharonov, Dmitry N. Ivankov, et
al. 2016. “Local Fitness Landscape of the Green Fluorescent
Protein.” Nature 533 (7603): 397–401. https://doi.org/10.1038/nature17995.
Schiff, Yair, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, and
Volodymyr Kuleshov. 2024. “Caduceus:
Bi-Directional Equivariant
Long-Range DNA
Sequence Modeling.” arXiv. https://doi.org/10.48550/arXiv.2403.03234.
Schmidt, Amand F., Chris Finan, Maria Gordillo-Marañón, Folkert W.
Asselbergs, Daniel F. Freitag, Riyaz S. Patel, Benoît Tyl, et al. 2020.
“Genetic Drug Target Validation Using Mendelian
Randomisation.” Nature Communications 11 (1): 3255. https://doi.org/10.1038/s41467-020-16969-0.
Schubach, Max, Thorben Maass, Lusiné Nazaretyan, Sebastian Röner, and
Martin Kircher. 2024. “CADD V1.7: Using Protein
Language Models, Regulatory CNNs and Other Nucleotide-Level
Scores to Improve Genome-Wide Variant Predictions.” Nucleic
Acids Research 52 (D1): D1143–54. https://doi.org/10.1093/nar/gkad989.
Shafin, Kishwar, Trevor Pesout, Pi-Chuan Chang, Maria Nattestad, Alexey
Kolesnikov, Sidharth Goel, Gunjan Baid, et al. 2021.
“Haplotype-Aware Variant Calling with
PEPPER-Margin-DeepVariant Enables
High Accuracy in Nanopore Long-Reads.” Nature Methods 18
(11): 1322–32. https://doi.org/10.1038/s41592-021-01299-w.
Shalem, Ophir, Neville E. Sanjana, Ella Hartenian, Xi Shi, David A.
Scott, Tarjei S. Mikkelsen, Dirk Heckl, et al. 2014.
“Genome-Scale CRISPR-Cas9
Knockout Screening in Human
Cells.” Science 343 (6166): 84–87. https://doi.org/10.1126/science.1247005.
Sherry, S. T., M.-H. Ward, M. Kholodov, J. Baker, L. Phan, E. M.
Smigielski, and K. Sirotkin. 2001. “dbSNP: The NCBI Database of Genetic
Variation.” Nucleic Acids Research 29 (1): 308–11. https://doi.org/10.1093/nar/29.1.308.
Shevlane, Toby. 2022. “Structured Access: An Emerging Paradigm for
Safe AI Deployment.” arXiv. https://doi.org/10.48550/arXiv.2201.05159.
Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. 2017.
“Learning Important Features
Through Propagating Activation
Differences.” In Proceedings of the 34th
International Conference on
Machine Learning, 3145–53. PMLR.
Shrikumar, Avanti, Katherine Tian, Žiga Avsec, Anna Shcherbina,
Abhimanyu Banerjee, Mahfuza Sharmin, Surag Nair, and Anshul Kundaje.
2018. “Technical Note on Transcription
Factor Motif Discovery from
Importance Scores
(TF-MoDISco) Version 0.5.6.5.” arXiv.
https://doi.org/10.48550/arXiv.1811.00416.
Siepel, Adam, Gill Bejerano, Jakob S. Pedersen, Angie S. Hinrichs,
Minmei Hou, Kate Rosenbloom, Hiram Clawson, et al. 2005.
“[PhastCons] Evolutionarily Conserved
Elements in Vertebrate, Insect, Worm, and Yeast Genomes.”
Genome Research 15 (8): 1034–50. https://doi.org/10.1101/gr.3715005.
Singh, Jaswinder, Jack Hanson, Kuldip Paliwal, and Yaoqi Zhou. 2019.
“RNA Secondary Structure Prediction Using an Ensemble
of Two-Dimensional Deep Neural Networks and Transfer Learning.”
Nature Communications 10 (1): 5407. https://doi.org/10.1038/s41467-019-13395-9.
Sirugo, Giorgio, Scott M. Williams, and Sarah A. Tishkoff. 2019.
“The Missing Diversity in
Human Genetic Studies.”
Cell 177 (1): 26–31. https://doi.org/10.1016/j.cell.2019.02.048.
Smolka, Moritz, Luis F. Paulin, Christopher M. Grochowski, Dominic W.
Horner, Medhat Mahmoud, Sairam Behera, Ester Kalef-Ezra, et al. 2024.
“Detection of Mosaic and Population-Level Structural Variants with
Sniffles2.” Nature Biotechnology 42 (10):
1571–80. https://doi.org/10.1038/s41587-023-02024-y.
Snell, Jake, Kevin Swersky, and Richard Zemel. 2017. “Prototypical
Networks for Few-Shot
Learning.” In Advances in Neural
Information Processing
Systems. Vol. 30. Curran Associates, Inc.
Sohail, Mashaal, María J. Palma-Martínez, Amanda Y. Chong, Consuelo D.
Quinto-Cortés, Carmina Barberena-Jonas, Santiago G. Medina-Muñoz, Aaron
Ragsdale, et al. 2023. “Mexican Biobank Advances
Population and Medical Genomics of Diverse Ancestries.”
Nature 622 (7984): 775–83. https://doi.org/10.1038/s41586-023-06560-0.
Soice, Emily H., Rafael Rocha, Kimberlee Cordova, Michael Specter, and
Kevin M. Esvelt. 2023. “Can Large Language Models Democratize
Access to Dual-Use Biotechnology?” arXiv. https://doi.org/10.48550/arXiv.2306.03809.
Sollis, Elliot, Abayomi Mosaku, Ala Abid, Annalisa Buniello, Maria
Cerezo, Laurent Gil, Tudor Groza, et al. 2023. “The
NHGRI-EBI GWAS
Catalog: Knowledgebase and Deposition Resource.”
Nucleic Acids Research 51 (D1): D977–85. https://doi.org/10.1093/nar/gkac1010.
Somani, Ayush, Alexander Horsch, and Dilip K. Prasad. 2023.
Interpretability in Deep Learning. Cham: Springer. https://doi.org/10.1007/978-3-031-20639-9.
Song, Li, Gali Bai, X. Shirley Liu, Bo Li, and Heng Li. 2022.
“T1K: Efficient and Accurate KIR and
HLA Genotyping with Next-Generation Sequencing
Data.” bioRxiv. https://doi.org/10.1101/2022.10.26.513955.
Spitale, Robert C., Ryan A. Flynn, Qiangfeng Cliff Zhang, Pete Crisalli,
Byron Lee, Jong-Wha Jung, Hannes Y. Kuchelmeister, et al. 2015.
“Structural Imprints in Vivo Decode RNA Regulatory
Mechanisms.” Nature 519 (7544): 486–90. https://doi.org/10.1038/nature14263.
Stebbing, Justin, Venkatesh Krishnan, Stephanie de Bono, Silvia
Ottaviani, Giacomo Casalini, Peter J. Richardson, Vanessa Monteil, et
al. 2020. “Mechanism of Baricitinib Supports Artificial
Intelligence‐predicted Testing in COVID‐19
Patients.” EMBO Molecular Medicine 12 (8):
e12697. https://doi.org/10.15252/emmm.202012697.
Steinegger, Martin, Milot Mirdita, and Johannes Söding. 2019.
“Protein-Level Assembly Increases Protein Sequence Recovery from
Metagenomic Samples Manyfold.” Nature Methods 16 (7):
603–6. https://doi.org/10.1038/s41592-019-0437-4.
Steinegger, Martin, and Johannes Söding. 2017.
“MMseqs2 Enables Sensitive Protein Sequence Searching
for the Analysis of Massive Data Sets.” Nature
Biotechnology 35 (11): 1026–28. https://doi.org/10.1038/nbt.3988.
Stenson, Peter D., Matthew Mort, Edward V. Ball, Katy Evans, Matthew
Hayden, Sally Heywood, Michelle Hussain, Andrew D. Phillips, and David
N. Cooper. 2017. “The Human Gene
Mutation Database: Towards a Comprehensive
Repository of Inherited Mutation Data for Medical Research, Genetic
Diagnosis and Next-Generation Sequencing Studies.” Human
Genetics 136 (6): 665–77. https://doi.org/10.1007/s00439-017-1779-6.
Steyerberg, Ewout W. 2019. Clinical Prediction Models: A Practical
Approach to Development, Validation, and Updating. 2nd ed. Cham:
Springer. https://doi.org/10.1007/978-3-030-16399-0.
Su, Chang, Zichun Xu, Xinning Shan, Biao Cai, Hongyu Zhao, and Jingfei
Zhang. 2023. “Cell-Type-Specific Co-Expression Inference from
Single Cell RNA-Sequencing Data.” Nature
Communications 14 (1): 4846. https://doi.org/10.1038/s41467-023-40503-7.
Su, Jianlin, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng
Liu. 2024. “RoFormer: Enhanced
Transformer with Rotary Position
Embedding.” Neurocomputing 568 (February):
127063. https://doi.org/10.1016/j.neucom.2023.127063.
Sudlow, Cathie, John Gallacher, Naomi Allen, Valerie Beral, Paul Burton,
John Danesh, Paul Downey, et al. 2015. “UK
Biobank: An Open
Access Resource for Identifying
the Causes of a Wide Range of
Complex Diseases of Middle and
Old Age.” PLOS Medicine 12
(3): e1001779. https://doi.org/10.1371/journal.pmed.1001779.
Sullivan, Patrick F., Jennifer R. S. Meadows, Steven Gazal, BaDoi N.
Phan, Xue Li, Diane P. Genereux, Michael X. Dong, et al. 2023.
“Leveraging Base-Pair Mammalian Constraint to Understand Genetic
Variation and Human Disease.” Science 380 (6643):
eabn2937. https://doi.org/10.1126/science.abn2937.
Sundaram, Laksshman, Hong Gao, Samskruthi Reddy Padigepati, Jeremy F.
McRae, Yanjun Li, Jack A. Kosmicki, Nondas Fritzilas, et al. 2018.
“Predicting the Clinical Impact of Human Mutation with Deep Neural
Networks.” Nature Genetics 50 (8): 1161–70. https://doi.org/10.1038/s41588-018-0167-z.
Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. 2017. “Axiomatic
Attribution for Deep
Networks.” In Proceedings of the 34th
International Conference on
Machine Learning, 3319–28. PMLR.
Supreme Court of the United States. 2013. “Assoc. for
Molecular Pathology v. Myriad
Genetics, Inc., 569
U.S. 576 (2013).”
Suzek, Baris E., Hongzhan Huang, Peter McGarvey, Raja Mazumder, and
Cathy H. Wu. 2007. “UniRef: Comprehensive and
Non-Redundant UniProt Reference Clusters.”
Bioinformatics 23 (10): 1282–88. https://doi.org/10.1093/bioinformatics/btm098.
Svensson, Valentine. 2020. “Droplet scRNA-Seq Is Not Zero-Inflated.” Nature
Biotechnology 38 (2): 147–50. https://doi.org/10.1038/s41587-019-0379-5.
Swanson, Kyle, Howard Chang, and James Zou. 2022. “Predicting
Immune Escape with Pretrained
Protein Language Model
Embeddings.” In Proceedings of the 17th
Machine Learning in Computational
Biology Meeting, 110–30. PMLR.
Swartout, William R., and Johanna D. Moore. 1993. “Explanation in
Second Generation Expert Systems.” In Second Generation
Expert Systems, 543–85. Springer.
Szklarczyk, Damian, Rebecca Kirsch, Mikaela Koutrouli, Katerina Nastou,
Farrokh Mehryary, Radja Hachilif, Annika L Gable, et al. 2023.
“The STRING Database in 2023: Protein–Protein
Association Networks and Functional Enrichment Analyses for Any
Sequenced Genome of Interest.” Nucleic Acids Research 51
(D1): D638–46. https://doi.org/10.1093/nar/gkac1000.
Tabula Sapiens Consortium, The. 2022. “The Tabula
Sapiens: A Multiple-Organ, Single-Cell
Transcriptomic Atlas of Humans.” Science 376 (6594):
eabl4896. https://doi.org/10.1126/science.abl4896.
Taliun, Daniel, Daniel N. Harris, Michael D. Kessler, Jedidiah Carlson,
Zachary A. Szpiech, Raul Torres, Sarah A. Gagliano Taliun, et al. 2021.
“Sequencing of 53,831 Diverse Genomes from the NHLBI
TOPMed Program.” Nature 590
(7845): 290–99. https://doi.org/10.1038/s41586-021-03205-y.
Tan, Jimin, Nina Shenker-Tauris, Javier Rodriguez-Hernaez, Eric Wang,
Theodore Sakellaropoulos, Francesco Boccalatte, Palaniraja Thandapani,
et al. 2023. “Cell-Type-Specific Prediction of 3D
Chromatin Organization Enables High-Throughput in Silico Genetic
Screening.” Nature Biotechnology 41 (8): 1140–50. https://doi.org/10.1038/s41587-022-01612-8.
Tang, Fuchou, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence
Lee, Nanlan Xu, Xiaohui Wang, et al. 2009. “mRNA-Seq Whole-Transcriptome Analysis
of a Single Cell.” Nature Methods 6 (5): 377–82. https://doi.org/10.1038/nmeth.1315.
Tanigawa, Yosuke, Junyang Qian, Guhan Venkataraman, Johanne Marie
Justesen, Ruilin Li, Robert Tibshirani, Trevor Hastie, and Manuel A.
Rivas. 2022. “Significant Sparse Polygenic Risk Scores Across 813
Traits in UK Biobank.” PLOS
Genetics 18 (3): e1010105. https://doi.org/10.1371/journal.pgen.1010105.
Tate, John G., Sally Bamford, Harry C. Jubb, Zbyslaw Sondka, David M.
Beare, Nidhi Bindal, Harry Boutselakis, et al. 2019.
“COSMIC: The Catalogue Of
Somatic Mutations In
Cancer.” Nucleic Acids Research 47 (D1):
D941–47. https://doi.org/10.1093/nar/gky1015.
Tavtigian, Sean V., Marc S. Greenblatt, Steven M. Harrison, Robert L.
Nussbaum, Snehit A. Prabhu, Kenneth M. Boucher, and Leslie G. Biesecker.
2018. “Modeling the ACMG/AMP Variant
Classification Guidelines as a Bayesian Classification
Framework.” Genetics in Medicine 20 (9): 1054–60. https://doi.org/10.1038/gim.2017.210.
The UniProt Consortium. 2023. “UniProt: The
Universal Protein Knowledgebase
in 2023.” Nucleic Acids Research 51 (D1): D523–31. https://doi.org/10.1093/nar/gkac1052.
Theodoris, Christina V., Ling Xiao, Anant Chopra, Mark D. Chaffin, Zeina
R. Al Sayed, Matthew C. Hill, Helene Mantineo, et al. 2023.
“[Geneformer] Transfer Learning Enables
Predictions in Network Biology.” Nature 618 (7965):
616–24. https://doi.org/10.1038/s41586-023-06139-9.
Tipirneni, Sindhu, and Chandan K. Reddy. 2022.
“Self-Supervised Transformer for
Sparse and Irregularly Sampled
Multivariate Clinical
Time-Series.” ACM Trans. Knowl.
Discov. Data 16 (6): 105:1–17. https://doi.org/10.1145/3516367.
Torkamani, Ali, Nathan E. Wineinger, and Eric J. Topol. 2018. “The
Personal and Clinical Utility of Polygenic Risk Scores.”
Nature Reviews Genetics 19 (9): 581–90. https://doi.org/10.1038/s41576-018-0018-x.
Trop, Evan, Yair Schiff, Edgar Mariano Marroquin, Chia Hsiang Kao, Aaron
Gokaslan, McKinley Polen, Mingyi Shao, et al. 2024. “The
Genomics Long-Range
Benchmark: Advancing DNA
Language Models,” October.
U.S. Food and Drug Administration. 2021. “Artificial
Intelligence/Machine Learning
(‘AI/ML’)-Based
Software as a Medical Device
(‘SaMD’) Action
Plan.”
US Congress. 2008. “Genetic Information
Nondiscrimination Act of 2008.”
Van der Auwera, Geraldine A., Mauricio O. Carneiro, Christopher Hartl,
Ryan Poplin, Guillermo del Angel, Ami Levy-Moonshine, Tadeusz Jordan, et
al. 2013. “From FastQ Data to
High-Confidence Variant
Calls: The Genome
Analysis Toolkit Best
Practices Pipeline.” Current
Protocols in Bioinformatics 43 (1): 11.10.1–33. https://doi.org/10.1002/0471250953.bi1110s43.
Vapnik, Vladimir. 1998. Statistical Learning
Theory. Wiley.
Varadi, Mihaly, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy
Natassia, Galabina Yordanova, David Yuan, et al. 2022.
“AlphaFold Protein
Structure Database: Massively Expanding the
Structural Coverage of Protein-Sequence Space with High-Accuracy
Models.” Nucleic Acids Research 50 (D1): D439–44. https://doi.org/10.1093/nar/gkab1061.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion
Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023.
“Attention Is All You
Need.” arXiv. https://doi.org/10.48550/arXiv.1706.03762.
Veličković, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero,
Pietro Liò, and Yoshua Bengio. 2018. “Graph Attention
Networks.” arXiv. https://doi.org/10.48550/arXiv.1710.10903.
Venkatesan, Kavitha, Jean-François Rual, Alexei Vazquez, Ulrich Stelzl,
Irma Lemmens, Tomoko Hirozane-Kishikawa, Tong Hao, et al. 2008.
“An Empirical Framework for Binary Interactome Mapping.”
Nature Methods 6 (1): 83–90. https://doi.org/10.1038/nmeth.1280.
Vickers, Andrew J., and Elena B. Elkin. 2006. “Decision Curve
Analysis: A Novel Method for Evaluating Prediction Models.”
Medical Decision Making 26 (6): 565–74. https://doi.org/10.1177/0272989X06295361.
Vilhjálmsson, Bjarni J., Jian Yang, Hilary K. Finucane, Alexander Gusev,
Sara Lindström, Stephan Ripke, Giulio Genovese, et al. 2015.
“Modeling Linkage Disequilibrium
Increases Accuracy of Polygenic
Risk Scores.” American Journal of
Human Genetics 97 (4): 576–92. https://doi.org/10.1016/j.ajhg.2015.09.001.
Vishniakov, Kirill, Boulbaba Ben Amor, Engin Tekin, Nancy A. ElNaker,
Karthik Viswanathan, Aleksandr Medvedev, Aahan Singh, et al. 2025.
“Gene42: Long-Range Genomic
Foundation Model With
Dense Attention.” arXiv. https://doi.org/10.48550/arXiv.2503.16565.
Visscher, Peter M., William G. Hill, and Naomi R. Wray. 2008.
“Heritability in the Genomics Era — Concepts and
Misconceptions.” Nature Reviews Genetics 9 (4): 255–66.
https://doi.org/10.1038/nrg2322.
Võsa, Urmo, Annique Claringbould, Harm-Jan Westra, Marc Jan Bonder,
Patrick Deelen, Biao Zeng, Holger Kirsten, et al. 2021.
“Large-Scale Cis- and Trans-eQTL
Analyses Identify Thousands of Genetic Loci and Polygenic Scores That
Regulate Blood Gene Expression.” Nature Genetics 53 (9):
1300–1310. https://doi.org/10.1038/s41588-021-00913-z.
Wang, Dequan, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor
Darrell. 2021. “Tent: Fully Test-Time
Adaptation by Entropy
Minimization.” arXiv. https://doi.org/10.48550/arXiv.2006.10726.
Wang, Gao, Abhishek Sarkar, Peter Carbonetto, and Matthew Stephens.
2020. “A Simple New
Approach to Variable Selection in
Regression, with Application to
Genetic Fine Mapping.”
Journal of the Royal Statistical Society Series B: Statistical
Methodology 82 (5): 1273–1300. https://doi.org/10.1111/rssb.12388.
Wang, Sinong, Belinda Z. Li, Madian Khabsa, Han Fang, and Hao Ma. 2020.
“Linformer: Self-Attention with
Linear Complexity.” arXiv. https://doi.org/10.48550/arXiv.2006.04768.
Wang, Yihui, Zhiyuan Cai, Qian Zeng, Yihang Gao, Jiarui Ouyang, Yingxue
Xu, Shu Yang, et al. 2025. “Genomic Touchstone:
Benchmarking Genomic Language
Models in the Context of the
Central Dogma.” bioRxiv. https://doi.org/10.1101/2025.06.25.661622.
Wang, Zirui, Zihang Dai, Barnabas Poczos, and Jaime Carbonell. 2018.
“Characterizing and Avoiding Negative
Transfer.” In, 11293–302.
Watson, Joseph L., David Juergens, Nathaniel R. Bennett, Brian L.
Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, et al. 2023.
“De Novo Design of Protein Structure and Function with
RFdiffusion.” Nature 620 (7976): 1089–1100.
https://doi.org/10.1038/s41586-023-06415-8.
Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph,
Sebastian Borgeaud, Dani Yogatama, et al. 2022. “Emergent
Abilities of Large Language
Models.” arXiv. https://doi.org/10.48550/arXiv.2206.07682.
Weissbrod, Omer, Farhad Hormozdiari, Christian Benner, Ran Cui, Jacob
Ulirsch, Steven Gazal, Armin P. Schoech, et al. 2020.
“Functionally Informed Fine-Mapping and Polygenic Localization of
Complex Trait Heritability.” Nature Genetics 52 (12):
1355–63. https://doi.org/10.1038/s41588-020-00735-5.
Wenger, Aaron M., Paul Peluso, William J. Rowell, Pi-Chuan Chang,
Richard J. Hall, Gregory T. Concepcion, Jana Ebler, et al. 2019.
“Accurate Circular Consensus Long-Read Sequencing Improves Variant
Detection and Assembly of a Human Genome.” Nature
Biotechnology 37 (10): 1155–62. https://doi.org/10.1038/s41587-019-0217-9.
Whirl-Carrillo, M., E. M. McDonagh, J. M. Hebert, L. Gong, K. Sangkuhl, C. F.
Thorn, R. B. Altman, and T. E. Klein. 2012. “Pharmacogenomics
Knowledge for Personalized
Medicine.” Clinical Pharmacology &
Therapeutics 92 (4): 414–17. https://doi.org/10.1038/clpt.2012.96.
Wu, Yang, Zhili Zheng, Loic Thibaut, Michael E. Goddard, Naomi R. Wray,
Peter M. Visscher, and Jian Zeng. 2024. “Genome-Wide Fine-Mapping
Improves Identification of Causal Variants.” Research
Square, August, rs.3.rs–4759390. https://doi.org/10.21203/rs.3.rs-4759390/v1.
Xiong, Ruibin, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing,
Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tieyan Liu. 2020. “On
Layer Normalization in the
Transformer Architecture.” In
Proceedings of the 37th International
Conference on Machine
Learning, 10524–33. PMLR.
Xu, Keyulu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019.
“How Powerful Are Graph
Neural Networks?” arXiv. https://doi.org/10.48550/arXiv.1810.00826.
Yan, Lulu, Dongyan Zhang, and Xiaoqiang Sun. 2026. “Decoding Cell
State Transitions Driven by Dynamic Cell–Cell Communication in Spatial
Transcriptomics.” Nature Computational Science, January,
1–15. https://doi.org/10.1038/s43588-025-00934-2.
Yang, Fan, Wenchuan Wang, Fang Wang, Yuan Fang, Duyu Tang, Junzhou
Huang, Hui Lu, and Jianhua Yao. 2022. “scBERT as a Large-Scale Pretrained Deep Language
Model for Cell Type Annotation of Single-Cell RNA-Seq
Data.” Nature Machine Intelligence 4 (10): 852–66. https://doi.org/10.1038/s42256-022-00534-z.
Yang, Jian, Beben Benyamin, Brian P. McEvoy, Scott Gordon, Anjali K.
Henders, Dale R. Nyholt, Pamela A. Madden, et al. 2010. “Common
SNPs Explain a Large Proportion of the Heritability for
Human Height.” Nature Genetics 42 (7): 565–69. https://doi.org/10.1038/ng.608.
Yang, Zhilin, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan
Salakhutdinov, and Quoc V. Le. 2020. “XLNet:
Generalized Autoregressive
Pretraining for Language
Understanding.” arXiv. https://doi.org/10.48550/arXiv.1906.08237.
Yengo, Loïc, Sailaja Vedantam, Eirini Marouli, Julia Sidorenko, Eric
Bartell, Saori Sakaue, Marielisa Graff, et al. 2022. “A Saturated
Map of Common Genetic Variants Associated with Human Height.”
Nature 610 (7933): 704–12. https://doi.org/10.1038/s41586-022-05275-y.
Yeo, Gene, and Christopher B. Burge. 2004. “Maximum
Entropy Modeling of Short
Sequence Motifs with Applications
to RNA Splicing Signals.”
Journal of Computational Biology 11 (2-3): 377–94. https://doi.org/10.1089/1066527041410418.
Ying, Chengxuan, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di
He, Yanming Shen, and Tie-Yan Liu. 2021. “Do
Transformers Really Perform
Bad for Graph
Representation?” arXiv. https://doi.org/10.48550/arXiv.2106.05234.
Yu, Ying, Yuanbang Mai, Yuanting Zheng, and Leming Shi. 2024.
“Assessing and Mitigating Batch Effects in Large-Scale Omics
Studies.” Genome Biology 25 (1): 254. https://doi.org/10.1186/s13059-024-03401-9.
Yun, Taedong, Justin Cosentino, Babak Behsaz, Zachary R. McCaw, Davin
Hill, Robert Luben, Dongbing Lai, et al. 2023.
“[REGLE] Unsupervised Representation
Learning Improves Genomic Discovery and Risk Prediction for Respiratory
and Circulatory Functions and Diseases.” medRxiv. https://doi.org/10.1101/2023.04.28.23289285.
Yun, Taedong, Helen Li, Pi-Chuan Chang, Michael F. Lin, Andrew Carroll,
and Cory Y. McLean. 2021. “Accurate, Scalable Cohort Variant Calls
Using DeepVariant and GLnexus.”
Bioinformatics 36 (24): 5582–89. https://doi.org/10.1093/bioinformatics/btaa1081.
Zanger, Ulrich M., and Matthias Schwab. 2013. “Cytochrome
P450 Enzymes in Drug Metabolism: Regulation of
Gene Expression, Enzyme Activities, and Impact of Genetic
Variation.” Pharmacology & Therapeutics 138 (1):
103–41. https://doi.org/10.1016/j.pharmthera.2012.12.007.
Zeng, Tony, and Yang I. Li. 2022. “Predicting RNA
Splicing from DNA Sequence Using
Pangolin.” Genome Biology 23 (1): 103. https://doi.org/10.1186/s13059-022-02664-4.
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol
Vinyals. 2021. “Understanding Deep Learning (Still) Requires
Rethinking Generalization.” Commun. ACM 64 (3): 107–15.
https://doi.org/10.1145/3446776.
Zhang, Qiang, Keyang Ding, Tianwen Lyv, Xinda Wang, Qingyu Yin, Yiwen
Zhang, Jing Yu, et al. 2024. “Scientific Large
Language Models: A
Survey on Biological &
Chemical Domains.” arXiv. https://doi.org/10.48550/arXiv.2401.14656.
Zhang, Yu, Rachel Patton McCord, Yu-Jui Ho, Brian R. Laber, Diana S.
Aber, Jungha Kim, Xiaowen Zhang, and Tom Misteli. 2012. “Spatial
Organization of the Mouse Genome and Its Role in Recurrent Chromosomal
Translocations.” Cell 148 (5): 908–21. https://doi.org/10.1016/j.cell.2012.02.002.
Zhao, Yanlong, Yixiao Chen, Jiawen Du, Jun Wen, Quan Sun, Ren Wang, and
Can Chen. 2025. “Dual-Route Embedding-Aware Graph Neural Networks
for Drug Repositioning.” Briefings in Bioinformatics 26
(5): bbaf555. https://doi.org/10.1093/bib/bbaf555.
Zheng, Rongbin, Changxin Wan, Shenglin Mei, Qian Qin, Qiu Wu, Hanfei
Sun, Chen-Hao Chen, et al. 2019. “Cistrome Data
Browser: Expanded Datasets and New Tools for Gene
Regulatory Analysis.” Nucleic Acids Research 47 (D1):
D729–35. https://doi.org/10.1093/nar/gky1094.
Zheng, Zhenxian, Shumin Li, Junhao Su, Amy Wing-Sze Leung, Tak-Wah Lam,
and Ruibang Luo. 2022. “Symphonizing Pileup and Full-Alignment for
Deep Learning-Based Long-Read Variant Calling.” Nature
Computational Science 2 (12): 797–803. https://doi.org/10.1038/s43588-022-00387-x.
Zhou, Jian. 2022. “Sequence-Based Modeling of Three-Dimensional
Genome Architecture from Kilobase to Chromosome Scale.”
Nature Genetics 54 (5): 725–34. https://doi.org/10.1038/s41588-022-01065-4.
Zhou, Jian, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K.
Wong, and Olga G. Troyanskaya. 2018. “[Expecto]
Deep Learning Sequence-Based Ab Initio Prediction of
Variant Effects on Expression and Disease Risk.” Nature
Genetics 50 (8): 1171–79. https://doi.org/10.1038/s41588-018-0160-6.
Zhou, Jian, and Olga G. Troyanskaya. 2015. “[DeepSEA]
Predicting Effects of Noncoding Variants with Deep
Learning–Based Sequence Model.” Nature Methods 12 (10):
931–34. https://doi.org/10.1038/nmeth.3547.
Zhou, Zhihan, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and
Han Liu. 2024. “DNABERT-2: Efficient
Foundation Model and Benchmark
For Multi-Species
Genome.” arXiv. https://doi.org/10.48550/arXiv.2306.15006.
Zhou, Zhihan, Weimin Wu, Harrison Ho, Jiayi Wang, Lizhen Shi, Ramana V.
Davuluri, Zhong Wang, and Han Liu. 2025.
“DNABERT-S: Pioneering Species
Differentiation with Species-Aware DNA Embeddings.”
Bioinformatics 41 (Supplement_1): i255–64. https://doi.org/10.1093/bioinformatics/btaf188.
Zhu, Ligeng, Zhijian Liu, and Song Han. 2019. “Deep
Leakage from Gradients.” arXiv. https://doi.org/10.48550/arXiv.1906.08935.
Zhu, Xiao, Chenchen Qin, Fang Wang, Fan Yang, Bing He, Yu Zhao, and
Jianhua Yao. 2024. “CD-GPT:
A Biological Foundation
Model Bridging the Gap Between
Molecular Sequences Through
Central Dogma.” bioRxiv. https://doi.org/10.1101/2024.06.24.600337.
Zitnik, Marinka, Monica Agrawal, and Jure Leskovec. 2018.
“Modeling Polypharmacy Side Effects with Graph Convolutional
Networks.” Bioinformatics 34 (13): i457–66. https://doi.org/10.1093/bioinformatics/bty294.
Zook, Justin M., Jennifer McDaniel, Nathan D. Olson, Justin Wagner,
Hemang Parikh, Haynes Heaton, Sean A. Irvine, et al. 2019. “An
Open Resource for Accurately Benchmarking Small Variant and Reference
Calls.” Nature Biotechnology 37 (5): 561–66. https://doi.org/10.1038/s41587-019-0074-6.
Zvyagin, Maxim, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang,
Cindy Orozco Bohorquez, Austin Clyde, et al. 2022.
“GenSLMs: Genome-Scale Language Models
Reveal SARS-CoV-2 Evolutionary
Dynamics.” bioRxiv. https://doi.org/10.1101/2022.10.10.511571.