References

Abràmoff, Michael D., Philip T. Lavin, Michele Birch, Nilay Shah, and James C. Folk. 2018. “Pivotal Trial of an Autonomous AI-Based Diagnostic System for Detection of Diabetic Retinopathy in Primary Care Offices.” Npj Digital Medicine 1 (1): 39. https://doi.org/10.1038/s41746-018-0040-6.
Abramson, Josh, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, et al. 2024. “[AlphaFold3] Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3.” Nature 630 (8016): 493–500. https://doi.org/10.1038/s41586-024-07487-w.
Adamson, Britt, Thomas M. Norman, Marco Jost, Min Y. Cho, James K. Nuñez, Yuwen Chen, Jacqueline E. Villalta, et al. 2016. “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response.” Cell 167 (7): 1867–1882.e21. https://doi.org/10.1016/j.cell.2016.11.048.
Adzhubei, Ivan A., Steffen Schmidt, Leonid Peshkin, Vasily E. Ramensky, Anna Gerasimova, Peer Bork, Alexey S. Kondrashov, and Shamil R. Sunyaev. 2010. “A Method and Server for Predicting Damaging Missense Mutations.” Nature Methods 7 (4): 248–49. https://doi.org/10.1038/nmeth0410-248.
Agarwal, Vikram, and Jay Shendure. 2020. “Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks.” Cell Reports 31 (7): 107663. https://doi.org/10.1016/j.celrep.2020.107663.
Ahdritz, Gustaf, Nazim Bouatta, Christina Floristean, Sachin Kadyan, Qinghui Xia, William Gerecke, Timothy J. O’Donnell, et al. 2024. “OpenFold: Retraining AlphaFold2 Yields New Insights into Its Learning Mechanisms and Capacity for Generalization.” Nature Methods 21 (8): 1514–24. https://doi.org/10.1038/s41592-024-02272-z.
Ahlqvist, Emma, Petter Storm, Annemari Käräjämäki, Mats Martinell, Mozhgan Dorkhan, Annelie Carlsson, Petter Vikman, et al. 2018. “Novel Subgroups of Adult-Onset Diabetes and Their Association with Outcomes: A Data-Driven Cluster Analysis of Six Variables.” The Lancet Diabetes & Endocrinology 6 (5): 361–69. https://doi.org/10.1016/S2213-8587(18)30051-2.
Aibar, Sara, Carmen Bravo González-Blas, Thomas Moerman, Vân Anh Huynh-Thu, Hana Imrichova, Gert Hulselmans, Florian Rambow, et al. 2017. “SCENIC: Single-Cell Regulatory Network Inference and Clustering.” Nature Methods 14 (11): 1083–86. https://doi.org/10.1038/nmeth.4463.
All of Us Research Program Investigators, The. 2019. “The All of Us Research Program.” New England Journal of Medicine 381 (7): 668–76. https://doi.org/10.1056/NEJMsr1809937.
Amariuta, Tiffany, Kazuyoshi Ishigaki, Hiroki Sugishita, Tazro Ohta, Masaru Koido, Kushal K. Dey, Koichi Matsuda, et al. 2020. “Improving the Trans-Ancestry Portability of Polygenic Risk Scores by Prioritizing Variants in Predicted Cell-Type-Specific Regulatory Elements.” Nature Genetics 52 (12): 1346–54. https://doi.org/10.1038/s41588-020-00740-8.
Amberger, Joanna S., Carol A. Bocchini, François Schiettecatte, Alan F. Scott, and Ada Hamosh. 2015. “OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an Online Catalog of Human Genes and Genetic Disorders.” Nucleic Acids Research 43 (D1): D789–98. https://doi.org/10.1093/nar/gku1205.
André, Fabrice, Eva Ciruelos, Gabor Rubovszky, Mario Campone, Sibylle Loibl, Hope S. Rugo, Hiroji Iwata, et al. 2019. “Alpelisib for PIK3CA-Mutated, Hormone Receptor–Positive Advanced Breast Cancer.” New England Journal of Medicine 380 (20): 1929–40. https://doi.org/10.1056/NEJMoa1813904.
Angelopoulos, Anastasios N., and Stephen Bates. 2023. “Conformal Prediction: A Gentle Introduction.” Foundations and Trends® in Machine Learning 16 (4): 494–591. https://doi.org/10.1561/2200000101.
Argelaguet, Ricard, Britta Velten, Damien Arnol, Sascha Dietrich, Thorsten Zenz, John C. Marioni, Florian Buettner, Wolfgang Huber, and Oliver Stegle. 2018. “Multi‐Omics Factor Analysis—a Framework for Unsupervised Integration of Multi‐omics Data Sets.” Molecular Systems Biology 14 (6): MSB178124. https://doi.org/10.15252/msb.20178124.
Arnold, Lord Justice, Lady Justice Laing, and Lord Justice Birss. 2021. “Thaler v Comptroller General of Patents Trade Marks And Designs [2021] EWCA Civ 1374.”
Ashuach, Tal, Mariano I. Gabitto, Michael I. Koodber, Valentine Svensson, Michael I. Jordan, and Nir Yosef. 2023. “MultiVI: Deep Generative Model for the Integration of Multimodal Data.” Nature Methods 20 (8): 1232–40. https://doi.org/10.1038/s41592-023-01909-9.
Auton, Adam, Gonçalo R. Abecasis, David M. Altshuler, Richard M. Durbin, Gonçalo R. Abecasis, David R. Bentley, Aravinda Chakravarti, et al. 2015. “A Global Reference for Human Genetic Variation.” Nature 526 (7571): 68–74. https://doi.org/10.1038/nature15393.
Avsec, Žiga, Vikram Agarwal, D. Visentin, J. Ledsam, A. Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, J. Jumper, Pushmeet Kohli, and David R. Kelley. 2021. “[Enformer] Effective Gene Expression Prediction from Sequence by Integrating Long-Range Interactions.” Nature Methods 18 (10): 1196–1203. https://doi.org/10.1038/s41592-021-01252-x.
Avsec, Žiga, Natasha Latysheva, and Jun Cheng. 2025. “AlphaGenome: AI for Better Understanding the Genome.”
Bach, Sebastian, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. “On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.” PLoS ONE 10 (7): e0130140. https://doi.org/10.1371/journal.pone.0130140.
Baek, Minkyung, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, et al. 2021. “Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network.” Science 373 (6557): 871–76. https://doi.org/10.1126/science.abj8754.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. “Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–54. https://doi.org/10.1073/pnas.1903070116.
Ben-David, Shai, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. “A Theory of Learning from Different Domains.” Machine Learning 79 (1): 151–75. https://doi.org/10.1007/s10994-009-5152-4.
Benegas, Gonzalo, Carlos Albors, Alan J. Aw, Chengzhong Ye, and Yun S. Song. 2024. “GPN-MSA: An Alignment-Based DNA Language Model for Genome-Wide Variant Effect Prediction.” bioRxiv, April, 2023.10.10.561776. https://doi.org/10.1101/2023.10.10.561776.
Benegas, Gonzalo, Sanjit Singh Batra, and Yun S. Song. 2023. “[GPN] DNA Language Models Are Powerful Predictors of Genome-Wide Variant Effects.” Proceedings of the National Academy of Sciences 120 (44): e2311219120. https://doi.org/10.1073/pnas.2311219120.
Benegas, Gonzalo, Gökcen Eraslan, and Yun S. Song. 2025. “[TraitGym] Benchmarking DNA Sequence Models for Causal Regulatory Variant Prediction in Human Genetics.” bioRxiv. https://doi.org/10.1101/2025.02.11.637758.
Bengs, Viktor, Eyke Hüllermeier, and Willem Waegeman. 2022. “Pitfalls of Epistemic Uncertainty Quantification Through Loss Minimisation.” In Advances in Neural Information Processing Systems, 35:29205–16.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57 (1): 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Benner, Christian, Chris C. A. Spencer, Aki S. Havulinna, Veikko Salomaa, Samuli Ripatti, and Matti Pirinen. 2016. “FINEMAP: Efficient Variable Selection Using Summary Data from Genome-Wide Association Studies.” Bioinformatics 32 (10): 1493–1501. https://doi.org/10.1093/bioinformatics/btw018.
Bergquist, Timothy, Sarah L. Stenton, Emily A. W. Nadeau, Alicia B. Byrne, Marc S. Greenblatt, Steven M. Harrison, Sean V. Tavtigian, et al. 2025. “Calibration of Additional Computational Tools Expands ClinGen Recommendation Options for Variant Classification with PP3/BP4 Criteria.” Genetics in Medicine 27 (6): 101402. https://doi.org/10.1016/j.gim.2025.101402.
Berman, Helen M., John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. 2000. “The Protein Data Bank.” Nucleic Acids Research 28 (1): 235–42. https://doi.org/10.1093/nar/28.1.235.
Birman-Deych, Elena, Amy D. Waterman, Yan Yan, David S. Nilasena, Martha J. Radford, and Brian F. Gage. 2005. “Accuracy of ICD-9-CM Codes for Identifying Cardiovascular and Stroke Risk Factors.” Medical Care 43 (5): 480. https://doi.org/10.1097/01.mlr.0000160417.39497.a9.
Boer, Carl G. de, Eeshit Dhaval Vaishnav, Ronen Sadeh, Esteban Luis Abeyta, Nir Friedman, and Aviv Regev. 2019. “Deciphering Eukaryotic Gene-Regulatory Logic with 100 Million Random Promoters.” Nature Biotechnology 38 (1): 56–65. https://doi.org/10.1038/s41587-019-0315-8.
Bommasani, Rishi, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, et al. 2022. “On the Opportunities and Risks of Foundation Models.” arXiv. https://doi.org/10.48550/arXiv.2108.07258.
Boshar, Sam, Benjamin Evans, Ziqi Tang, Armand Picard, Yanis Adel, Franziska K. Lorbeer, Chandana Rajesh, et al. n.d. “A Foundational Model for Joint Sequence-Function Multi-Species Modeling at Scale for Long-Range Genomic Prediction.”
Bowden, Jack, George Davey Smith, and Stephen Burgess. 2015. “Mendelian Randomization with Invalid Instruments: Effect Estimation and Bias Detection Through Egger Regression.” International Journal of Epidemiology 44 (2): 512–25. https://doi.org/10.1093/ije/dyv080.
Brandes, Nadav, Grant Goldman, Charlotte H. Wang, Chun Jimmie Ye, and Vasilis Ntranos. 2023. “Genome-Wide Prediction of Disease Variant Effects with a Deep Protein Language Model.” Nature Genetics 55 (9): 1512–22. https://doi.org/10.1038/s41588-023-01465-0.
Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures.” Statistical Science 16 (3): 199–231.
Brixi, Garyk, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg Brockman, Daniel Chang, Gabriel A. Gonzalez, et al. 2025. “[Evo 2] Genome Modeling and Design Across All Domains of Life with Evo 2.” bioRxiv. https://doi.org/10.1101/2025.02.18.638918.
Brnich, Sarah E., Ahmad N. Abou Tayoun, Fergus J. Couch, Garry R. Cutting, Marc S. Greenblatt, Christopher D. Heinen, Dona M. Kanavy, et al. 2019. “Recommendations for Application of the Functional Evidence PS3/BS3 Criterion Using the ACMG/AMP Sequence Variant Interpretation Framework.” Genome Medicine 12 (1): 3. https://doi.org/10.1186/s13073-019-0690-2.
Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems 33 (December): 1877–1901.
Browning, Brian L., Xiaowen Tian, Ying Zhou, and Sharon R. Browning. 2021. “Fast Two-Stage Phasing of Large-Scale Sequence Data.” American Journal of Human Genetics 108 (10): 1880–90. https://doi.org/10.1016/j.ajhg.2021.08.005.
Brunak, Søren, Hannah Carter, and John Moult. 2023. “CAGI 6: Critical Assessment of Genome Interpretation, Sixth Edition.” Human Genetics, October.
Buniello, Annalisa, Daniel Suveges, Carlos Cruz-Castillo, Manuel Bernal Llinares, Helena Cornu, Irene Lopez, Kirill Tsukanov, et al. 2025. “Open Targets Platform: Facilitating Therapeutic Hypotheses Building in Drug Discovery.” Nucleic Acids Research 53 (D1): D1467–75. https://doi.org/10.1093/nar/gkae1128.
Bycroft, Clare, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T. Elliott, Kevin Sharp, Allan Motyer, et al. 2018. “The UK Biobank Resource with Deep Phenotyping and Genomic Data.” Nature 562 (7726): 203–9. https://doi.org/10.1038/s41586-018-0579-z.
Camillo, Lucas Paulo de Lima, Raghav Sehgal, Jenel Armstrong, Albert T. Higgins-Chen, Steve Horvath, and Bo Wang. 2024. “CpGPT: A Foundation Model for DNA Methylation.” bioRxiv. https://doi.org/10.1101/2024.10.24.619766.
Candès, Emmanuel, Yingying Fan, Lucas Janson, and Jinchi Lv. 2018. “Panning for Gold: Model-X Knockoffs for High Dimensional Controlled Variable Selection.” Journal of the Royal Statistical Society Series B: Statistical Methodology 80 (3): 551–77. https://doi.org/10.1111/rssb.12265.
Cao, Zhi-Jie, and Ge Gao. 2022. “[GLUE] Multi-Omics Single-Cell Data Integration and Regulatory Inference with Graph-Linked Embedding.” Nature Biotechnology 40 (10): 1458–66. https://doi.org/10.1038/s41587-022-01284-4.
Castro-Mondragon, Jaime A., Rafael Riudavets-Puig, Ieva Rauluseviciute, Roza Berhanu Lemma, Laura Turchi, Romain Blanc-Mathieu, Jeremy Lucas, et al. 2022. “JASPAR 2022: The 9th Release of the Open-Access Database of Transcription Factor Binding Profiles.” Nucleic Acids Research 50 (D1): D198–207. https://doi.org/10.1093/nar/gkab1113.
Centers for Disease Control and Prevention. 2022. “ACCE Model Process for Evaluating Genetic Tests.”
Chandak, Payal, Kexin Huang, and Marinka Zitnik. 2023. “[PrimeKG] Building a Knowledge Graph to Enable Precision Medicine.” Scientific Data 10 (1): 67. https://doi.org/10.1038/s41597-023-01960-3.
Chapman, Paul B., Axel Hauschild, Caroline Robert, John B. Haanen, Paolo Ascierto, James Larkin, Reinhard Dummer, et al. 2011. “Improved Survival with Vemurafenib in Melanoma with BRAF V600E Mutation.” New England Journal of Medicine 364 (26): 2507–16. https://doi.org/10.1056/NEJMoa1103782.
Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. “SMOTE: Synthetic Minority Over-Sampling Technique.” Journal of Artificial Intelligence Research 16 (1): 321–57.
Chen, Elaine, Flavia M. Facio, Kerry W. Aradhya, Susan Rojahn, Kathryn E. Hatchell, Sienna Aguilar, Karen Ouyang, et al. 2023. “Rates and Classification of Variants of Uncertain Significance in Hereditary Disease Genetic Testing.” JAMA Network Open 6 (10): e2339571. https://doi.org/10.1001/jamanetworkopen.2023.39571.
Chen, Jiayang, Zhihang Hu, Siqi Sun, Qingxiong Tan, Yixuan Wang, Qinze Yu, Licheng Zong, et al. 2022. “[RNA-FM] Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions.” arXiv. https://doi.org/10.48550/arXiv.2204.00300.
Chen, Kathleen M., Aaron K. Wong, Olga G. Troyanskaya, and Jian Zhou. 2022. “[DeepSEA Sei] A Sequence-Based Global Map of Regulatory Activity for Deciphering Human Genetics.” Nature Genetics 54 (7): 940–49. https://doi.org/10.1038/s41588-022-01102-2.
Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. “A Simple Framework for Contrastive Learning of Visual Representations.” In Proceedings of the 37th International Conference on Machine Learning, 1597–607. PMLR.
Cheng, Jun, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, et al. 2023. “[AlphaMissense] Accurate Proteome-Wide Missense Variant Effect Prediction with AlphaMissense.” Science 381 (6664): eadg7492. https://doi.org/10.1126/science.adg7492.
Cheng, Wenduo, Zhenqiao Song, Yang Zhang, Shike Wang, Danqing Wang, Muyu Yang, Lei Li, and Jian Ma. 2024. “DNALONGBENCH: A Benchmark Suite for Long-Range DNA Prediction Tasks,” October.
Cho, Kyunghyun, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” arXiv. https://doi.org/10.48550/arXiv.1409.1259.
Choi, Shing Wan, Timothy Shin-Heng Mak, and Paul F. O’Reilly. 2020. “[PRS] Tutorial: A Guide to Performing Polygenic Risk Score Analyses.” Nature Protocols 15 (9): 2759–72. https://doi.org/10.1038/s41596-020-0353-1.
Choromanski, Krzysztof, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, et al. 2022. “Rethinking Attention with Performers.” arXiv. https://doi.org/10.48550/arXiv.2009.14794.
Chowdhery, Aakanksha, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, et al. 2022. “PaLM: Scaling Language Modeling with Pathways.” arXiv. https://doi.org/10.48550/arXiv.2204.02311.
Chung, Wen-Hung, Shuen-Iu Hung, Hong-Shang Hong, Mo-Song Hsih, Li-Cheng Yang, Hsin-Chun Ho, Jer-Yuarn Wu, and Yuan-Tsong Chen. 2004. “A Marker for Stevens–Johnson Syndrome.” Nature 428 (6982): 486. https://doi.org/10.1038/428486a.
Cirulli, Elizabeth T., Simon White, Robert W. Read, Gai Elhanan, William J. Metcalf, Francisco Tanudjaja, Donna M. Fath, et al. 2020. “Genome-Wide Rare Variant Analysis for Thousands of Phenotypes in over 70,000 Exomes from Two Cohorts.” Nature Communications 11 (1): 542. https://doi.org/10.1038/s41467-020-14288-y.
Clarke, Brian, Eva Holtkamp, Hakime Öztürk, Marcel Mück, Magnus Wahlberg, Kayla Meyer, Felix Munzlinger, et al. 2024. “[DeepRVAT] Integration of Variant Annotations Using Deep Set Networks Boosts Rare Variant Association Testing.” Nature Genetics 56 (10): 2271–80. https://doi.org/10.1038/s41588-024-01919-z.
Collins, Gary S., Johannes B. Reitsma, Douglas G. Altman, and Karel G. M. Moons. 2015. “Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement.” BMJ 350: g7594. https://doi.org/10.1136/bmj.g7594.
Consens, Micaela E., Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, and Bo Wang. 2025. “Transformers and Genome Language Models.” Nature Machine Intelligence 7 (3): 346–62. https://doi.org/10.1038/s42256-025-01007-9.
Cornman, Andre, Jacob West-Roberts, Antonio Pedro Camargo, Simon Roux, Martin Beracochea, Milot Mirdita, Sergey Ovchinnikov, and Yunha Hwang. 2024. “The OMG Dataset: An Open MetaGenomic Corpus for Mixed-Modality Genomic Language Modeling.” bioRxiv. https://doi.org/10.1101/2024.08.14.607850.
Corso, Gabriele, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola. 2022. “DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking.” arXiv.
Cui, Haotian, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan Duan, and Bo Wang. 2024. “scGPT: Toward Building a Foundation Model for Single-Cell Multi-Omics Using Generative AI.” Nature Methods 21 (8): 1470–80. https://doi.org/10.1038/s41592-024-02201-0.
Cui, Yin, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. 2019. “Class-Balanced Loss Based on Effective Number of Samples.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9260–69. https://doi.org/10.1109/CVPR.2019.00949.
Dabernig-Heinz, Johanna, Mara Lohde, Martin Hölzer, Adriana Cabal, Rick Conzemius, Christian Brandt, Matthias Kohl, et al. 2024. “A Multicenter Study on Accuracy and Reproducibility of Nanopore Sequencing-Based Genotyping of Bacterial Pathogens.” Journal of Clinical Microbiology 62 (9): e00628–24. https://doi.org/10.1128/jcm.00628-24.
Dallago, Christian, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, and Kevin K. Yang. 2022. “FLIP: Benchmark Tasks in Fitness Landscape Inference for Proteins.” bioRxiv. https://doi.org/10.1101/2021.11.09.467890.
Dalla-Torre, Hugo, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, et al. 2023. “Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics.” Nature Methods 22 (2): 287–97. https://doi.org/10.1038/s41592-024-02523-z.
Dang, Tien, Viet Thanh Duy Nguyen, Minh Tuan Le, and Truong-Son Hy. 2025. “BioMedKG: Multimodal Contrastive Representation Learning in Augmented BioMedical Knowledge Graphs.” Frontiers in Systems Biology 5 (December). https://doi.org/10.3389/fsysb.2025.1651930.
Dao, Tri, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. 2022. “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.” Advances in Neural Information Processing Systems 35 (December): 16344–59.
Dauparas, J., I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, et al. 2022. “Robust Deep Learning–Based Protein Sequence Design Using ProteinMPNN.” Science 378 (6615): 49–56. https://doi.org/10.1126/science.add2187.
Davey Smith, George, and Shah Ebrahim. 2003. “‘Mendelian Randomization’: Can Genetic Epidemiology Contribute to Understanding Environmental Determinants of Disease?” International Journal of Epidemiology 32 (1): 1–22. https://doi.org/10.1093/ije/dyg070.
Davydov, Eugene V., David L. Goode, Marina Sirota, Gregory M. Cooper, Arend Sidow, and Serafim Batzoglou. 2010. “Identifying a High Fraction of the Human Genome to Be Under Selective Constraint Using GERP++.” PLOS Computational Biology 6 (12): e1001025. https://doi.org/10.1371/journal.pcbi.1001025.
DeLong, Elizabeth R., David M. DeLong, and Daniel L. Clarke-Pearson. 1988. “Comparing the Areas Under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach.” Biometrics 44 (3): 837–45. https://doi.org/10.2307/2531595.
Denny, Joshua C., Marylyn D. Ritchie, Melissa A. Basford, Jill M. Pulley, Lisa Bastarache, Kristin Brown-Gentry, Deede Wang, Dan R. Masys, Dan M. Roden, and Dana C. Crawford. 2010. “PheWAS: Demonstrating the Feasibility of a Phenome-Wide Scan to Discover Gene–Disease Associations.” Bioinformatics 26 (9): 1205–10. https://doi.org/10.1093/bioinformatics/btq126.
DePristo, Mark A., Eric Banks, Ryan Poplin, Kiran V. Garimella, Jared R. Maguire, Christopher Hartl, Anthony A. Philippakis, et al. 2011. “A Framework for Variation Discovery and Genotyping Using Next-Generation DNA Sequencing Data.” Nature Genetics 43 (5): 491–98. https://doi.org/10.1038/ng.806.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” arXiv. https://doi.org/10.48550/arXiv.1810.04805.
Dey, Kushal K., Bryce van de Geijn, Samuel Sungil Kim, Farhad Hormozdiari, David R. Kelley, and Alkes L. Price. 2020. “Evaluating the Informativeness of Deep Learning Annotations for Human Complex Diseases.” Nature Communications 11 (1): 4703. https://doi.org/10.1038/s41467-020-18515-4.
Dibaeinia, Payam, Chris German, Suyash Shringarpure, Adam Auton, and Aly A. Khan. 2025. “PRSformer: Disease Prediction from Million-Scale Individual Genotypes.” bioRxiv. https://doi.org/10.1101/2025.10.26.684578.
Dixit, Atray, Oren Parnas, Biyu Li, Jenny Chen, Charles P. Fulco, Livnat Jerby-Arnon, Nemanja D. Marjanovic, et al. 2016. “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens.” Cell 167 (7): 1853–1866.e17. https://doi.org/10.1016/j.cell.2016.11.038.
Dixon, Jesse R., Siddarth Selvaraj, Feng Yue, Audrey Kim, Yan Li, Yin Shen, Ming Hu, Jun S. Liu, and Bing Ren. 2012. “Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions.” Nature 485 (7398): 376–80. https://doi.org/10.1038/nature11082.
Dockès, Jérôme, Gaël Varoquaux, and Jean-Baptiste Poline. 2021. “Preventing Dataset Shift from Breaking Machine-Learning Biomarkers.” GigaScience 10 (9): giab055. https://doi.org/10.1093/gigascience/giab055.
Duncan, L., H. Shen, B. Gelaye, J. Meijsen, K. Ressler, M. Feldman, R. Peterson, and B. Domingue. 2019. “Analysis of Polygenic Risk Score Usage and Performance in Diverse Human Populations.” Nature Communications 10 (1): 3328. https://doi.org/10.1038/s41467-019-11112-0.
Dwivedi, Vijay Prakash, and Xavier Bresson. 2021. “A Generalization of Transformer Networks to Graphs.” arXiv. https://doi.org/10.48550/arXiv.2012.09699.
Edgar, Ron, Michael Domrachev, and Alex E. Lash. 2002. “Gene Expression Omnibus: NCBI Gene Expression and Hybridization Array Data Repository.” Nucleic Acids Research 30 (1): 207–10. https://doi.org/10.1093/nar/30.1.207.
Elgart, Michael, Genevieve Lyons, Santiago Romero-Brufau, Nuzulul Kurniansyah, Jennifer A. Brody, Xiuqing Guo, Henry J. Lin, et al. 2022. “Non-Linear Machine Learning Models Incorporating SNPs and PRS Improve Polygenic Prediction in Diverse Human Populations.” Communications Biology 5 (1): 856. https://doi.org/10.1038/s42003-022-03812-z.
Elks, Cathy E., Marcel Den Hoed, Jing Hua Zhao, Stephen J. Sharp, Nicholas J. Wareham, Ruth J. F. Loos, and Ken K. Ong. 2012. “Variability in the Heritability of Body Mass Index: A Systematic Review and Meta-Regression.” Frontiers in Endocrinology 3 (February). https://doi.org/10.3389/fendo.2012.00029.
Elnaggar, Ahmed, Michael Heinzinger, Christian Dallago, Ghalia Rihawi, Yu Wang, Llion Jones, Tom Gibbs, et al. 2021. “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing.” arXiv. https://doi.org/10.48550/arXiv.2007.06225.
Erlich, Yaniv, and Arvind Narayanan. 2014. “Routes for Breaching and Protecting Genetic Privacy.” Nature Reviews Genetics 15 (6): 409–21. https://doi.org/10.1038/nrg3723.
Esposito, Daniel, Jochen Weile, Jay Shendure, Lea M. Starita, Anthony T. Papenfuss, Frederick P. Roth, Douglas M. Fowler, and Alan F. Rubin. 2019. “MaveDB: An Open-Source Platform to Distribute and Interpret Data from Multiplexed Assays of Variant Effect.” Genome Biology 20 (1): 223. https://doi.org/10.1186/s13059-019-1845-6.
European Parliament. 2016. “Regulation on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data.”
———. 2017. “Regulation on Medical Devices.”
———. 2024. “Regulation Laying down Harmonised Rules on Artificial Intelligence.”
Fang, Yitian, Yi Jiang, Leyi Wei, Qin Ma, Zhixiang Ren, Qianmu Yuan, and Dong-Qing Wei. 2023. “DeepProSite: Structure-Aware Protein Binding Site Prediction Using ESMFold and Pretrained Language Model.” Bioinformatics 39 (12): btad718. https://doi.org/10.1093/bioinformatics/btad718.
Farh, Kyle Kai-How, Alexander Marson, Jiang Zhu, Markus Kleinewietfeld, William J. Housley, Samantha Beik, Noam Shoresh, et al. 2015. “Genetic and Epigenetic Fine Mapping of Causal Autoimmune Disease Variants.” Nature 518 (7539): 337–43. https://doi.org/10.1038/nature13835.
Fedus, William, Barret Zoph, and Noam Shazeer. 2022. “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.” Journal of Machine Learning Research 23 (120): 1–39.
Ferruz, Noelia, Steffen Schmidt, and Birte Höcker. 2022. “ProtGPT2 Is a Deep Unsupervised Language Model for Protein Design.” Nature Communications 13 (1): 4348. https://doi.org/10.1038/s41467-022-32007-7.
Findlay, Gregory M., Riza M. Daza, Beth Martin, Melissa D. Zhang, Anh P. Leith, Molly Gasperini, Joseph D. Janizek, Xingfan Huang, Lea M. Starita, and Jay Shendure. 2018. “Accurate Classification of BRCA1 Variants with Saturation Genome Editing.” Nature 562 (7726): 217–22. https://doi.org/10.1038/s41586-018-0461-z.
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. 2017. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.” In Proceedings of the 34th International Conference on Machine Learning, 1126–35. PMLR.
Fokkema, Ivo F. A. C., Peter E. M. Taschner, Gerard C. P. Schaafsma, J. Celli, Jeroen F. J. Laros, and Johan T. den Dunnen. 2011. “LOVD v.2.0: The Next Generation in Gene Variant Databases.” Human Mutation 32 (5): 557–63. https://doi.org/10.1002/humu.21438.
Food and Drug Administration. 2023. “Using Artificial Intelligence and Machine Learning in the Development of Drug and Biological Products; Availability.”
———. 2024. “Laboratory Developed Tests: Small Entity Compliance Guide; Guidance for Laboratory Manufacturers and Food and Drug Administration Staff; Availability.”
———. 2025. “Artificial Intelligence-Enabled Medical Devices.”
Fowler, Douglas M., and Stanley Fields. 2014. “Deep Mutational Scanning: A New Style of Protein Science.” Nature Methods 11 (8): 801–7. https://doi.org/10.1038/nmeth.3027.
Frankish, Adam, Mark Diekhans, Anne-Maud Ferreira, Rory Johnson, Irwin Jungreis, Jane Loveland, Jonathan M. Mudge, et al. 2019. “GENCODE Reference Annotation for the Human and Mouse Genomes.” Nucleic Acids Research 47 (D1): D766–73. https://doi.org/10.1093/nar/gky955.
Frazer, Jonathan, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K. Min, Kelly Brock, Yarin Gal, and Debora S. Marks. 2021. “[EVE] Disease Variant Prediction with Deep Generative Models of Evolutionary Data.” Nature 599 (7883): 91–95. https://doi.org/10.1038/s41586-021-04043-8.
Friedman, Dan, and Adji Bousso Dieng. 2023. “The Vendi Score: A Diversity Evaluation Metric for Machine Learning.” Transactions on Machine Learning Research. https://openreview.net/forum?id=S7hJSmMM5l.
Fudenberg, Geoff, David R. Kelley, and Katherine S. Pollard. 2020. “[Akita] Predicting 3D Genome Folding from DNA Sequence with Akita.” Nature Methods 17 (11): 1111–17. https://doi.org/10.1038/s41592-020-0958-x.
Gaedigk, Andrea, Magnus Ingelman-Sundberg, Neil A. Miller, J. Steven Leeder, Michelle Whirl-Carrillo, Teri E. Klein, and the PharmVar Steering Committee. 2017. “The Pharmacogene Variation (PharmVar) Consortium: Incorporation of the Human Cytochrome P450 (CYP) Allele Nomenclature Database.” Clinical Pharmacology & Therapeutics 103 (3): 399–401. https://doi.org/10.1002/cpt.910.
Gal, Yarin, and Zoubin Ghahramani. 2016. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of The 33rd International Conference on Machine Learning, 1050–59. PMLR.
Gamazon, Eric R., Heather E. Wheeler, Kaanan P. Shah, Sahar V. Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, et al. 2015. “A Gene-Based Association Method for Mapping Traits Using Reference Transcriptome Data.” Nature Genetics 47 (9): 1091–98. https://doi.org/10.1038/ng.3367.
Ganin, Yaroslav, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. “Domain-Adversarial Training of Neural Networks.” Journal of Machine Learning Research 17 (59): 1–35.
Gao, Hong, Tobias Hamp, Jeffrey Ede, Joshua G. Schraiber, Jeremy McRae, Moriel Singer-Berk, Yanshen Yang, et al. 2023. “The Landscape of Tolerated Genetic Variation in Humans and Primates.” Science 380 (6648): eabn8197. https://doi.org/10.1126/science.abn8197.
Garrison, Erik, Jouni Sirén, Adam M. Novak, Glenn Hickey, Jordan M. Eizenga, Eric T. Dawson, William Jones, et al. 2018. “Variation Graph Toolkit Improves Read Mapping by Representing Genetic Variation in the Reference.” Nature Biotechnology 36 (9): 875–79. https://doi.org/10.1038/nbt.4227.
Gasperini, Molly, Andrew J. Hill, José L. McFaline-Figueroa, Beth Martin, Seungsoo Kim, Melissa D. Zhang, Dana Jackson, et al. 2019. “A Genome-Wide Framework for Mapping Gene Regulation via Cellular Genetic Screens.” Cell 176 (1): 377–390.e19. https://doi.org/10.1016/j.cell.2018.11.029.
Gayoso, Adam, Zoë Steier, Romain Lopez, Jeffrey Regier, Kristopher L. Nazor, Aaron Streets, and Nir Yosef. 2021. “Joint Probabilistic Modeling of Single-Cell Multi-Omic Data with totalVI.” Nature Methods 18 (3): 272–82. https://doi.org/10.1038/s41592-020-01050-x.
Ge, Tian, Chia-Yen Chen, Yang Ni, Yen-Chen Anne Feng, and Jordan W. Smoller. 2019. “Polygenic Prediction via Bayesian Regression and Continuous Shrinkage Priors.” Nature Communications 10 (1): 1776. https://doi.org/10.1038/s41467-019-09718-5.
Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. “Datasheets for Datasets.” Communications of the ACM 64 (12): 86–92. https://doi.org/10.1145/3458723.
Georgantas, Costa, Zoltán Kutalik, and Jonas Richiardi. 2024. “Delphi: A Deep-Learning Method for Polygenic Risk Prediction.” medRxiv. https://doi.org/10.1101/2024.04.19.24306079.
Giambartolomei, Claudia, Damjan Vukcevic, Eric E. Schadt, Lude Franke, Aroon D. Hingorani, Chris Wallace, and Vincent Plagnol. 2014. “Bayesian Test for Colocalisation Between Pairs of Genetic Association Studies Using Summary Statistics.” PLOS Genetics 10 (5): e1004383. https://doi.org/10.1371/journal.pgen.1004383.
Gong, Li, Clarissa J. Klein, Kelly E. Caudle, Ann M. Moyer, Stuart A. Scott, Michelle Whirl-Carrillo, Teri E. Klein, and the ClinGen Pharmacogenomics Working Group (PGxWG). 2025. “Integrating Pharmacogenomics into the Broader Construct of Genomic Medicine: Efforts by the ClinGen Pharmacogenomics Working Group (PGxWG).” Clinical Chemistry 71 (1): 36–44. https://doi.org/10.1093/clinchem/hvae181.
Goodwin, Sara, John D. McPherson, and W. Richard McCombie. 2016. “Coming of Age: Ten Years of Next-Generation Sequencing Technologies.” Nature Reviews Genetics 17 (6): 333–51. https://doi.org/10.1038/nrg.2016.49.
Gordon, Derek, Stephen J. Finch, and Wonkuk Kim. 2020. Heterogeneity in Statistical Genetics: How to Assess, Address, and Account for Mixtures in Association Studies. Cham: Springer. https://doi.org/10.1007/978-3-030-61121-7.
Granger, C. W. J. 1969. “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods.” Econometrica 37 (3).
Grantham, R. 1974. “Amino Acid Difference Formula to Help Explain Protein Evolution.” Science 185 (4154): 862–64. https://doi.org/10.1126/science.185.4154.862.
Grešová, Katarína, Vlastimil Martinek, David Čechák, Petr Šimeček, and Panagiotis Alexiou. 2023. “Genomic Benchmarks: A Collection of Datasets for Genomic Sequence Classification.” BMC Genomic Data 24 (1): 25. https://doi.org/10.1186/s12863-023-01123-8.
Grimm, Dominik G., Chloé-Agathe Azencott, Fabian Aicheler, Udo Gieraths, Daniel G. MacArthur, Kaitlin E. Samocha, David N. Cooper, et al. 2015. “The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity.” Human Mutation 36 (5): 513–23. https://doi.org/10.1002/humu.22768.
GTEx Consortium, The. 2020. “The GTEx Consortium Atlas of Genetic Regulatory Effects Across Human Tissues.” Science 369 (6509): 1318–30. https://doi.org/10.1126/science.aaz1776.
Gu, Albert, and Tri Dao. 2024. “Mamba: Linear-Time Sequence Modeling with Selective State Spaces.” In.
Gu, Albert, Karan Goel, and Christopher Ré. 2022. “Efficiently Modeling Long Sequences with Structured State Spaces.” arXiv. https://doi.org/10.48550/arXiv.2111.00396.
Gu, Yu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. 2021. “Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing.” ACM Trans. Comput. Healthcare 3 (1): 2:1–23. https://doi.org/10.1145/3458754.
Gudbjartsson, Daniel F., Patrick Sulem, Hannes Helgason, Arnaldur Gylfason, Sigurjon A. Gudjonsson, Florian Zink, Asmundur Oddson, et al. 2015. “Sequence Variants from Whole Genome Sequencing a Large Group of Icelanders.” Scientific Data 2 (1): 150011. https://doi.org/10.1038/sdata.2015.11.
Guo, Chuan, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017. “On Calibration of Modern Neural Networks.” In Proceedings of the 34th International Conference on Machine Learning, 1321–30. PMLR.
Gusev, Alexander, Arthur Ko, Huwenbo Shi, Gaurav Bhatia, Wonil Chung, Brenda W. J. H. Penninx, Rick Jansen, et al. 2016. “Integrative Approaches for Large-Scale Transcriptome-Wide Association Studies.” Nature Genetics 48 (3): 245–52. https://doi.org/10.1038/ng.3506.
Gymrek, Melissa, Amy L. McGuire, David Golan, Eran Halperin, and Yaniv Erlich. 2013. “Identifying Personal Genomes by Surname Inference.” Science 339 (6117): 321–24. https://doi.org/10.1126/science.1229566.
Hamilton, William L., Rex Ying, and Jure Leskovec. 2017. “[GraphSAGE] Inductive Representation Learning on Large Graphs.” arXiv.org.
Hansen, Ben B. 2008. “The Prognostic Analogue of the Propensity Score.” Biometrika 95 (2): 481–88. https://doi.org/10.1093/biomet/asn004.
Hao, Minsheng, Jing Gong, Xin Zeng, Chiming Liu, Yucheng Guo, Xingyi Cheng, Taifeng Wang, Jianzhu Ma, Xuegong Zhang, and Le Song. 2024. “Large-Scale Foundation Model on Single-Cell Transcriptomics.” Nature Methods 21 (8): 1481–91. https://doi.org/10.1038/s41592-024-02305-7.
Hart, G. Traver, Arun K. Ramani, and Edward M. Marcotte. 2006. “How Complete Are Current Yeast and Human Protein-Interaction Networks?” Genome Biology 7 (11): 120. https://doi.org/10.1186/gb-2006-7-11-120.
Hartwig, Fernando Pires, George Davey Smith, and Jack Bowden. 2017. “Robust Inference in Summary Data Mendelian Randomization via the Zero Modal Pleiotropy Assumption.” International Journal of Epidemiology 46 (6): 1985–98. https://doi.org/10.1093/ije/dyx102.
Hayes, Thomas, Roshan Rao, Halil Akin, Nicholas J. Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, et al. 2025. “[ESM-3] Simulating 500 Million Years of Evolution with a Language Model.” Science 387 (6736): 850–58. https://doi.org/10.1126/science.ads0018.
Health and Human Services. 2018. “Federal Policy for the Protection of Human Subjects.”
Henikoff, S, and J G Henikoff. 1992. “Amino Acid Substitution Matrices from Protein Blocks.” Proceedings of the National Academy of Sciences 89 (22): 10915–19. https://doi.org/10.1073/pnas.89.22.10915.
Hie, Brian L., Varun R. Shanker, Duo Xu, Theodora U. J. Bruun, Payton A. Weidenbacher, Shaogeng Tang, Wesley Wu, John E. Pak, and Peter S. Kim. 2023. “Efficient Evolution of Human Antibodies from General Protein Language Models.” Nature Biotechnology 42 (2): 275–83. https://doi.org/10.1038/s41587-023-01763-2.
Hilker, Rikke, Dorte Helenius, Birgitte Fagerlund, Axel Skytthe, Kaare Christensen, Thomas M. Werge, Merete Nordentoft, and Birte Glenthøj. 2018. “Heritability of Schizophrenia and Schizophrenia Spectrum Based on the Nationwide Danish Twin Register.” Biological Psychiatry, Novel Mechanisms in Schizophrenia Pathophysiology, 83 (6): 492–98. https://doi.org/10.1016/j.biopsych.2017.08.017.
Himmelstein, Daniel Scott, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, and Sergio E Baranzini. 2017. “Systematic Integration of Biomedical Knowledge Prioritizes Drugs for Repurposing.” Edited by Alfonso Valencia. eLife 6 (September): e26726. https://doi.org/10.7554/eLife.26726.
Hoang, Minh, and Mona Singh. 2025. “Locality-Aware Pooling Enhances Protein Language Model Performance Across Varied Applications.” Bioinformatics 41 (Supplement_1): i217–26. https://doi.org/10.1093/bioinformatics/btaf178.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.
Hoffmann, Jordan, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, et al. 2022. “Training Compute-Optimal Large Language Models.” arXiv. https://doi.org/10.48550/arXiv.2203.15556.
Homer, Nils, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V. Pearson, Dietrich A. Stephan, Stanley F. Nelson, and David W. Craig. 2008. “Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays.” PLOS Genetics 4 (8): e1000167. https://doi.org/10.1371/journal.pgen.1000167.
Hormozdiari, Farhad, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc, and Eleazar Eskin. 2014. “Identifying Causal Variants at Loci with Multiple Signals of Association.” In Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 610–11. BCB ’14. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2649387.2660800.
Horvath, Steve. 2013. “DNA Methylation Age of Human Tissues and Cell Types.” Genome Biology 14 (10): 3156. https://doi.org/10.1186/gb-2013-14-10-r115.
Howard, Jeremy, and Sebastian Ruder. 2018. “Universal Language Model Fine-Tuning for Text Classification.” arXiv. https://doi.org/10.48550/arXiv.1801.06146.
Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Oliver J. Rando, Robert Tjian, and Xavier Darzacq. 2020. “Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding.” Molecular Cell 78 (3): 539–553.e8. https://doi.org/10.1016/j.molcel.2020.03.002.
Hsu, Chloe, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, and Alexander Rives. 2022. “Learning Inverse Folding from Millions of Predicted Structures.” In Proceedings of the 39th International Conference on Machine Learning, 8946–70. PMLR.
Hu, Edward J., Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, and Weizhu Chen. 2021. “LoRA: Low-Rank Adaptation of Large Language Models.” arXiv. https://doi.org/10.48550/arXiv.2106.09685.
Huang, Po-Ssu, Scott E. Boyken, and David Baker. 2016. “The Coming of Age of de Novo Protein Design.” Nature 537 (7620): 320–27. https://doi.org/10.1038/nature19946.
Hubisz, Melissa J, and Katherine S Pollard. 2014. “Exploring the Genesis and Functions of Human Accelerated Regions Sheds Light on Their Role in Human Evolution.” Current Opinion in Genetics & Development, Genetics of human evolution, 29 (December): 15–21. https://doi.org/10.1016/j.gde.2014.07.005.
Huynh-Thu, Vân Anh, Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts. 2010. “Inferring Regulatory Networks from Expression Data Using Tree-Based Methods.” PLOS ONE 5 (9): e12776. https://doi.org/10.1371/journal.pone.0012776.
Hwang, Yunha, Andre L. Cornman, Elizabeth H. Kellogg, Sergey Ovchinnikov, and Peter R. Girguis. 2024. “Genomic Language Model Predicts Protein Co-Regulation and Function.” Nature Communications 15 (1): 2880. https://doi.org/10.1038/s41467-024-46947-9.
Ingelman-Sundberg, M. 2004. “Genetic Polymorphisms of Cytochrome P450 2D6 (CYP2D6): Clinical Consequences, Evolutionary Aspects and Functional Diversity.” The Pharmacogenomics Journal 5 (1): 6–13. https://doi.org/10.1038/sj.tpj.6500285.
Ingraham, John B., Max Baranov, Zak Costello, Karl W. Barber, Wujie Wang, Ahmed Ismail, Vincent Frappier, et al. 2023. “Illuminating Protein Space with a Programmable Generative Model.” Nature 623 (7989): 1070–78. https://doi.org/10.1038/s41586-023-06728-8.
International Medical Device Regulators Forum. 2014. “Software as a Medical Device: Possible Framework for Risk Categorization and Corresponding Considerations.”
———. 2017. “Software as a Medical Device (SaMD): Clinical Evaluation.”
Ioannidis, Nilah M., Joseph H. Rothstein, Vikas Pejaver, Sumit Middha, Shannon K. McDonnell, Saurabh Baheti, Anthony Musolf, et al. 2016. “REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants.” The American Journal of Human Genetics 99 (4): 877–85. https://doi.org/10.1016/j.ajhg.2016.08.016.
Ionita-Laza, Iuliana, Kenneth McCallum, Bin Xu, and Joseph D. Buxbaum. 2016. “A Spectral Approach Integrating Functional Genomic Annotations for Coding and Noncoding Variants.” Nature Genetics 48 (2): 214–20. https://doi.org/10.1038/ng.3477.
Jagadeesh, Karthik A., Aaron M. Wenger, Mark J. Berger, Harendra Guturu, Peter D. Stenson, David N. Cooper, Jonathan A. Bernstein, and Gill Bejerano. 2016. “M-CAP Eliminates a Majority of Variants of Uncertain Significance in Clinical Exomes at High Sensitivity.” Nature Genetics 48 (12): 1581–86. https://doi.org/10.1038/ng.3703.
Jaganathan, Kishore, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, et al. 2019. “[SpliceAI] Predicting Splicing from Primary Sequence with Deep Learning.” Cell 176 (3): 535–548.e24. https://doi.org/10.1016/j.cell.2018.12.015.
Jagota, Milind, Chengzhong Ye, Carlos Albors, Ruchir Rastogi, Antoine Koehl, Nilah Ioannidis, and Yun S. Song. 2023. “Cross-Protein Transfer Learning Substantially Improves Disease Variant Prediction.” Genome Biology 24 (1): 182. https://doi.org/10.1186/s13059-023-03024-6.
Jain, Sarthak, and Byron C. Wallace. 2019. “Attention Is Not Explanation.” arXiv. https://doi.org/10.48550/arXiv.1902.10186.
Jawahar, Ganesh, Benoît Sagot, and Djamé Seddah. 2019. “What Does BERT Learn about the Structure of Language?” In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy.
Ji, Yanrong, Zhihan Zhou, Han Liu, and Ramana V Davuluri. 2021. “DNABERT: Pre-Trained Bidirectional Encoder Representations from Transformers Model for DNA-Language in Genome.” Bioinformatics 37 (15): 2112–20. https://doi.org/10.1093/bioinformatics/btab083.
Jiang, Tao, Yongzhuang Liu, Yue Jiang, Junyi Li, Yan Gao, Zhe Cui, Yadong Liu, Bo Liu, and Yadong Wang. 2020. “Long-Read-Based Human Genomic Structural Variation Detection with cuteSV.” Genome Biology 21 (1): 189. https://doi.org/10.1186/s13059-020-02107-y.
Truong, Timothy F., Jr., and Tristan Bepler. 2023. “PoET: A Generative Model of Protein Families as Sequences-of-Sequences.” arXiv. https://doi.org/10.48550/arXiv.2306.06156.
Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “[AlphaFold2] Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89. https://doi.org/10.1038/s41586-021-03819-2.
Jurenaite, Neringa, Daniel León-Periñán, Veronika Donath, Sunna Torge, and René Jäkel. 2024. “SetQuence & SetOmic: Deep Set Transformers for Whole Genome and Exome Tumour Analysis.” BioSystems 235 (January): 105095. https://doi.org/10.1016/j.biosystems.2023.105095.
Kagda, Meenakshi S., Bonita Lam, Casey Litton, Corinn Small, Cricket A. Sloan, Emma Spragins, Forrest Tanaka, et al. 2025. “Data Navigation on the ENCODE Portal.” Nature Communications 16 (1): 9592. https://doi.org/10.1038/s41467-025-64343-9.
Kalvari, Ioanna, Eric P Nawrocki, Nancy Ontiveros-Palacios, Joanna Argasinska, Kevin Lamkiewicz, Manja Marz, Sam Griffiths-Jones, et al. 2021. “Rfam 14: Expanded Coverage of Metagenomic, Viral and microRNA Families.” Nucleic Acids Research 49 (D1): D192–200. https://doi.org/10.1093/nar/gkaa1047.
Kaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. “Scaling Laws for Neural Language Models.” arXiv. https://doi.org/10.48550/arXiv.2001.08361.
Karczewski, Konrad J., Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, et al. 2020. “The Mutational Constraint Spectrum Quantified from Variation in 141,456 Humans.” Nature 581 (7809): 434–43. https://doi.org/10.1038/s41586-020-2308-7.
Karollus, Alexander, Thomas Mauermeier, and Julien Gagneur. 2023. “Current Sequence-Based Models Capture Gene Expression Determinants in Promoters but Mostly Ignore Distal Enhancers.” Genome Biology 24 (1): 56. https://doi.org/10.1186/s13059-023-02899-9.
Katzman, Jared L., Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. 2018. “DeepSurv: Personalized Treatment Recommender System Using a Cox Proportional Hazards Deep Neural Network.” BMC Medical Research Methodology 18 (1): 24. https://doi.org/10.1186/s12874-018-0482-1.
Kaye, Jane, Edgar A. Whitley, David Lund, Michael Morrison, Harriet Teare, and Karen Melham. 2014. “Dynamic Consent: A Patient Interface for Twenty-First Century Research Networks.” European Journal of Human Genetics 23 (2): 141–46. https://doi.org/10.1038/ejhg.2014.71.
Kelley, David R. 2020. “[Basenji2] Cross-Species Regulatory Sequence Activity Prediction.” PLOS Computational Biology 16 (7): e1008050. https://doi.org/10.1371/journal.pcbi.1008050.
Kelley, David R., Yakir A. Reshef, Maxwell Bileschi, David Belanger, Cory Y. McLean, and Jasper Snoek. 2018. “[Basenji] Sequential Regulatory Activity Prediction Across Chromosomes with Convolutional Neural Networks.” Genome Research 28 (5): 739–50. https://doi.org/10.1101/gr.227819.117.
Kelley, David R., Jasper Snoek, and John L. Rinn. 2016. “Basset: Learning the Regulatory Code of the Accessible Genome with Deep Convolutional Neural Networks.” Genome Research 26 (7): 990–99. https://doi.org/10.1101/gr.200535.115.
Khera, Amit V., and Sekar Kathiresan. 2017. “Genetics of Coronary Artery Disease: Discovery, Biology and Clinical Translation.” Nature Reviews Genetics 18 (6): 331–44. https://doi.org/10.1038/nrg.2016.160.
Kichaev, Gleb, Megan Roytman, Ruth Johnson, Eleazar Eskin, Sara Lindström, Peter Kraft, and Bogdan Pasaniuc. 2017. “Improved Methods for Multi-Trait Fine Mapping of Pleiotropic Risk Loci.” Bioinformatics 33 (2): 248–55. https://doi.org/10.1093/bioinformatics/btw615.
Kipf, Thomas N., and Max Welling. 2017. “Semi-Supervised Classification with Graph Convolutional Networks.” arXiv. https://doi.org/10.48550/arXiv.1609.02907.
Kircher, Martin, Daniela M. Witten, Preti Jain, Brian J. O’Roak, Gregory M. Cooper, and Jay Shendure. 2014. “A General Framework for Estimating the Relative Pathogenicity of Human Genetic Variants.” Nature Genetics 46 (3): 310–15. https://doi.org/10.1038/ng.2892.
Kıcıman, Emre, Robert Ness, Amit Sharma, and Chenhao Tan. 2024. “Causal Reasoning and Large Language Models: Opening a New Frontier for Causality.” arXiv. https://doi.org/10.48550/arXiv.2305.00050.
Kong, Augustine, Michael L. Frigge, Gisli Masson, Soren Besenbacher, Patrick Sulem, Gisli Magnusson, Sigurjon A. Gudjonsson, et al. 2012. “Rate of de Novo Mutations and the Importance of Father’s Age to Disease Risk.” Nature 488 (7412): 471–75. https://doi.org/10.1038/nature11396.
Krusche, Peter, Len Trigg, Paul C. Boutros, Christopher E. Mason, Francisco M. De La Vega, Benjamin L. Moore, Mar Gonzalez-Porta, et al. 2019. “Best Practices for Benchmarking Germline Small Variant Calls in Human Genomes.” Nature Biotechnology 37 (5): 555–60. https://doi.org/10.1038/s41587-019-0054-x.
Kryshtafovych, Andriy, Torsten Schwede, Maya Topf, Krzysztof Fidelis, and John Moult. 2021. “Critical Assessment of Methods of Protein Structure Prediction (CASP)—Round XIV.” Proteins: Structure, Function, and Bioinformatics 89 (12): 1607–17. https://doi.org/10.1002/prot.26237.
Kuchenbaecker, Karoline B., John L. Hopper, Daniel R. Barnes, Kelly-Anne Phillips, Thea M. Mooij, Marie-José Roos-Blom, Sarah Jervis, et al. 2017. “Risks of Breast, Ovarian, and Contralateral Breast Cancer for BRCA1 and BRCA2 Mutation Carriers.” JAMA 317 (23): 2402–16. https://doi.org/10.1001/jama.2017.7112.
Kulakovskiy, Ivan V., Ilya E. Vorontsov, Ivan S. Yevshin, Ruslan N. Sharipov, Alla D. Fedorova, Eugene I. Rumynskiy, Yulia A. Medvedeva, et al. 2018. “HOCOMOCO: Towards a Complete Collection of Transcription Factor Binding Models for Human and Mouse via Large-Scale ChIP-Seq Analysis.” Nucleic Acids Research 46 (D1): D252–59. https://doi.org/10.1093/nar/gkx1106.
Kulmanov, Maxat, Francisco J. Guzmán-Vega, Paula Duek Roggli, Lydie Lane, Stefan T. Arold, and Robert Hoehndorf. 2024. “Protein Function Prediction as Approximate Semantic Entailment.” Nature Machine Intelligence 6 (2): 220–28. https://doi.org/10.1038/s42256-024-00795-w.
Kundaje, Anshul, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, et al. 2015. “Integrative Analysis of 111 Reference Human Epigenomes.” Nature 518 (7539): 317–30. https://doi.org/10.1038/nature14248.
Kurki, Mitja I., Juha Karjalainen, Priit Palta, Timo P. Sipilä, Kati Kristiansson, Kati M. Donner, Mary P. Reeve, et al. 2023. “FinnGen Provides Genetic Insights from a Well-Phenotyped Isolated Population.” Nature 613 (7944): 508–18. https://doi.org/10.1038/s41586-022-05473-8.
La Manno, Gioele, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, et al. 2018. “RNA Velocity of Single Cells.” Nature 560 (7719): 494–98. https://doi.org/10.1038/s41586-018-0414-6.
Laird, Nan M., and Christoph Lange. 2011. The Fundamentals of Modern Statistical Genetics. New York: Springer. https://doi.org/10.1007/978-1-4419-7338-2.
Lambert, Samuel A, Gad Abraham, and Michael Inouye. 2019. “Towards Clinical Utility of Polygenic Risk Scores.” Human Molecular Genetics 28 (R2): R133–42. https://doi.org/10.1093/hmg/ddz187.
Lambert, Samuel A., Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, et al. 2021. “The Polygenic Score Catalog as an Open Database for Reproducibility and Systematic Evaluation.” Nature Genetics 53 (4): 420–25. https://doi.org/10.1038/s41588-021-00783-5.
Landrum, Melissa J, Jennifer M Lee, Mark Benson, Garth R Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, et al. 2018. “ClinVar: Improving Access to Variant Interpretations and Supporting Evidence.” Nucleic Acids Research 46 (D1): D1062–67. https://doi.org/10.1093/nar/gkx1153.
Larson, Adam G., Daniel Elnatan, Madeline M. Keenen, Michael J. Trnka, Jonathan B. Johnston, Alma L. Burlingame, David A. Agard, Sy Redding, and Geeta J. Narlikar. 2017. “Liquid Droplet Formation by HP1α Suggests a Role for Phase Separation in Heterochromatin.” Nature 547 (7662): 236–40. https://doi.org/10.1038/nature22822.
Lawlor, Debbie A., Roger M. Harbord, Jonathan A. C. Sterne, Nic Timpson, and George Davey Smith. 2008. “Mendelian Randomization: Using Genes as Instruments for Making Causal Inferences in Epidemiology.” Statistics in Medicine 27 (8): 1133–63. https://doi.org/10.1002/sim.3034.
Leacy, Finbarr P., and Elizabeth A. Stuart. 2013. “On the Joint Use of Propensity and Prognostic Scores in Estimation of the Average Treatment Effect on the Treated: A Simulation Study.” Statistics in Medicine 33 (20): 3488–508. https://doi.org/10.1002/sim.6030.
Lee, Ingoo, Zachary S. Wallace, Yuqi Wang, Sungjoon Park, Hojung Nam, Amit R. Majithia, and Trey Ideker. 2025. “[G2PT] A Genotype-Phenotype Transformer to Assess and Explain Polygenic Risk.” bioRxiv. https://doi.org/10.1101/2024.10.23.619940.
Lee, Jinhyuk, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019. “BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining.” Bioinformatics 36 (4): 1234–40. https://doi.org/10.1093/bioinformatics/btz682.
Li, Heng. 2013. “Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM.” arXiv. https://doi.org/10.48550/arXiv.1303.3997.
———. 2014. “Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples.” Bioinformatics 30 (20): 2843–51. https://doi.org/10.1093/bioinformatics/btu356.
———. 2018. “Minimap2: Pairwise Alignment for Nucleotide Sequences.” Bioinformatics 34 (18): 3094–3100. https://doi.org/10.1093/bioinformatics/bty191.
Li, Qimai, Zhichao Han, and Xiao-Ming Wu. 2018. “Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning.” In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 3538–45. AAAI’18/IAAI’18/EAAI’18. New Orleans, Louisiana, USA: AAAI Press.
Li, Sizhen, Saeed Moayedpour, Ruijiang Li, Michael Bailey, Saleh Riahi, Milad Miladi, Jacob Miner, et al. 2023. “CodonBERT: Large Language Models for mRNA Design and Optimization.” bioRxiv. https://doi.org/10.1101/2023.09.09.556981.
Li, Weizhong, and Adam Godzik. 2006. “Cd-Hit: A Fast Program for Clustering and Comparing Large Sets of Protein or Nucleotide Sequences.” Bioinformatics 22 (13): 1658–59. https://doi.org/10.1093/bioinformatics/btl158.
Li, Xiao, Jie Ma, Ling Leng, Mingfei Han, Mansheng Li, Fuchu He, and Yunping Zhu. 2022. “MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis.” Frontiers in Genetics 13 (February). https://doi.org/10.3389/fgene.2022.806842.
Li, Zehui, Vallijah Subasri, Yifei Shen, Dongsheng Li, Yiren Zhao, Guy-Bart Stan, and Caihua Shan. 2025. “Omni-DNA: A Unified Genomic Foundation Model for Cross-Modal and Multi-Task Learning.” arXiv. https://doi.org/10.48550/arXiv.2502.03499.
Liao, Wen-Wei, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, et al. 2023. “A Draft Human Pangenome Reference.” Nature 617 (7960): 312–24. https://doi.org/10.1038/s41586-023-05896-x.
Lieberman-Aiden, Erez, Nynke L. van Berkum, Louise Williams, Noam Kaplan, Peter J. Sabo, Michael O. Dorschner, Job Dekker, et al. 2009. “Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome.” Science 326 (5950): 289–93. https://doi.org/10.1126/science.1181369.
Lin, Tsung-Yi, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2020. “Focal Loss for Dense Object Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2): 318–27. https://doi.org/10.1109/TPAMI.2018.2858826.
Lin, Weining, David Miller, Zhonghui Gu, and Christine Orengo. 2025. “GOBeacon: An Ensemble Model for Protein Function Prediction Enhanced by Contrastive Learning.” Protein Science 34 (7): e70182. https://doi.org/10.1002/pro.70182.
Lin, Zeming, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, et al. 2022. “[ESM-2] Language Models of Protein Sequences at the Scale of Evolution Enable Accurate Structure Prediction.” bioRxiv. https://doi.org/10.1101/2022.07.20.500902.
Lin, Zeming, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, et al. 2023. “Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model.” Science 379 (6637): 1123–30. https://doi.org/10.1126/science.ade2574.
Linder, Johannes, Divyanshi Srivastava, Han Yuan, Vikram Agarwal, and David R. Kelley. 2025. “[Borzoi] Predicting RNA-Seq Coverage from DNA Sequence as a Unifying Model of Gene Regulation.” Nature Genetics 57 (4): 949–61. https://doi.org/10.1038/s41588-024-02053-6.
Lipsitch, Marc, Eric Tchetgen Tchetgen, and Ted Cohen. 2010. “Negative Controls: A Tool for Detecting Confounding and Bias in Observational Studies.” Epidemiology 21 (3): 383. https://doi.org/10.1097/EDE.0b013e3181d61eeb.
Liu, Zicheng, Siyuan Li, Zhiyuan Chen, Fang Wu, Chang Yu, Qirong Yang, Yucheng Guo, Yujie Yang, Xiaoming Zhang, and Stan Z. Li. 2025. “Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification.” arXiv. https://doi.org/10.48550/arXiv.2502.07299.
Logsdon, Glennis A., Mitchell R. Vollger, and Evan E. Eichler. 2020. “Long-Read Human Genome Sequencing and Its Applications.” Nature Reviews Genetics 21 (10): 597–614. https://doi.org/10.1038/s41576-020-0236-x.
Loh, Po-Ru, Petr Danecek, Pier Francesco Palamara, Christian Fuchsberger, Yakir A Reshef, Hilary K Finucane, Sebastian Schoenherr, et al. 2016. “Reference-Based Phasing Using the Haplotype Reference Consortium Panel.” Nature Genetics 48 (11): 1443–48. https://doi.org/10.1038/ng.3679.
Loshchilov, Ilya, and Frank Hutter. 2019. “Decoupled Weight Decay Regularization.” arXiv. https://doi.org/10.48550/arXiv.1711.05101.
Lupiáñez, Darío G., Katerina Kraft, Verena Heinrich, Peter Krawitz, Francesco Brancati, Eva Klopocki, Denise Horn, et al. 2015. “Disruptions of Topological Chromatin Domains Cause Pathogenic Rewiring of Gene-Enhancer Interactions.” Cell 161 (5): 1012–25. https://doi.org/10.1016/j.cell.2015.04.004.
Lynch, Thomas J., Daphne W. Bell, Raffaella Sordella, Sarada Gurubhagavatula, Ross A. Okimoto, Brian W. Brannigan, Patricia L. Harris, et al. 2004. “Activating Mutations in the Epidermal Growth Factor Receptor Underlying Responsiveness of Non–Small-Cell Lung Cancer to Gefitinib.” New England Journal of Medicine 350 (21): 2129–39. https://doi.org/10.1056/NEJMoa040938.
Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos, et al. 2023. “Large Language Models Generate Functional Protein Sequences Across Diverse Families.” Nature Biotechnology 41 (8): 1099–1106. https://doi.org/10.1038/s41587-022-01618-2.
Madhu, Hiren, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath, Smita Krishnaswamy, and Rex Ying. 2025. “HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data.” arXiv. https://doi.org/10.48550/arXiv.2506.11152.
Mallal, Simon, Elizabeth Phillips, Giampiero Carosi, Jean-Michel Molina, Cassy Workman, Janez Tomažič, Eva Jägel-Guedes, et al. 2008. “HLA-B*5701 Screening for Hypersensitivity to Abacavir.” New England Journal of Medicine 358 (6): 568–79. https://doi.org/10.1056/NEJMoa0706135.
Maller, Julian B., Gilean McVean, Jake Byrnes, Damjan Vukcevic, Kimmo Palin, Zhan Su, Joanna M. M. Howson, et al. 2012. “Bayesian Refinement of Association Signals for 14 Loci in 3 Common Diseases.” Nature Genetics 44 (12): 1294–1301. https://doi.org/10.1038/ng.2435.
Manolio, Teri A., Francis S. Collins, Nancy J. Cox, David B. Goldstein, Lucia A. Hindorff, David J. Hunter, Mark I. McCarthy, et al. 2009. “Finding the Missing Heritability of Complex Diseases.” Nature 461 (7265): 747–53. https://doi.org/10.1038/nature08494.
Manzo, Gaetano, Kathryn Borkowski, and Ivan Ovcharenko. 2025. “Comparative Analysis of Deep Learning Models for Predicting Causative Regulatory Variants.” bioRxiv: The Preprint Server for Biology, June, 2025.05.19.654920. https://doi.org/10.1101/2025.05.19.654920.
Marees, Andries T., Hilde de Kluiver, Sven Stringer, Florence Vorspan, Emmanuel Curis, Cynthia Marie-Claire, and Eske M. Derks. 2018. “[GWAS] A Tutorial on Conducting Genome-Wide Association Studies: Quality Control and Statistical Analysis.” International Journal of Methods in Psychiatric Research 27 (2): e1608. https://doi.org/10.1002/mpr.1608.
Marin, Frederikke Isa, Felix Teufel, Marc Horlacher, Dennis Madsen, Dennis Pultz, Ole Winther, and Wouter Boomsma. 2024. “BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks.” arXiv. https://doi.org/10.48550/arXiv.2311.12570.
Márquez-Luna, Carla, Po-Ru Loh, South Asian Type 2 Diabetes (SAT2D) Consortium, The SIGMA Type 2 Diabetes Consortium, and Alkes L. Price. 2017. “Multiethnic Polygenic Risk Scores Improve Risk Prediction in Diverse Populations.” Genetic Epidemiology 41 (8): 811–23. https://doi.org/10.1002/gepi.22083.
Martin, Alicia R., Masahiro Kanai, Yoichiro Kamatani, Yukinori Okada, Benjamin M. Neale, and Mark J. Daly. 2019. “Clinical Use of Current Polygenic Risk Scores May Exacerbate Health Disparities.” Nature Genetics 51 (4): 584–91. https://doi.org/10.1038/s41588-019-0379-x.
Mastropietro, Andrea, Gianluca De Carlo, and Aris Anagnostopoulos. 2023. “XGDAG: Explainable Gene–Disease Associations via Graph Neural Networks.” Bioinformatics 39 (8): btad482. https://doi.org/10.1093/bioinformatics/btad482.
Maurano, Matthew T., Richard Humbert, Eric Rynes, Robert E. Thurman, Eric Haugen, Hao Wang, Alex P. Reynolds, et al. 2012. “Systematic Localization of Common Disease-Associated Variation in Regulatory DNA.” Science 337 (6099): 1190–95. https://doi.org/10.1126/science.1222794.
Mavaddat, Nasim, Kyriaki Michailidou, Joe Dennis, Michael Lush, Laura Fachal, Andrew Lee, Jonathan P. Tyrer, et al. 2019. “Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes.” The American Journal of Human Genetics 104 (1): 21–34. https://doi.org/10.1016/j.ajhg.2018.11.002.
McCloskey, Michael, and Neal Cohen. 1989. “Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem.” Psychology of Learning and Motivation 24 (January): 109–65. https://doi.org/10.1016/S0079-7421(08)60536-8.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. Chapman & Hall/CRC.
Medvedev, Aleksandr, Karthik Viswanathan, Praveenkumar Kanithi, Kirill Vishniakov, Prateek Munjal, Clément Christophe, Marco AF Pimentel, Ronnie Rajan, and Shadab Khan. 2025. “BioToken and BioFM: Biologically-Informed Tokenization Enables Accurate and Efficient Genomic Foundation Models.” bioRxiv. https://doi.org/10.1101/2025.03.27.645711.
Meier, Joshua, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alexander Rives. 2021. “[ESM-1v] Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv. https://doi.org/10.1101/2021.07.09.450648.
Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. “Model Cards for Model Reporting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–29. FAT* ’19. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3287560.3287596.
Morales, Joannella, Shashikant Pujar, Jane E. Loveland, Alex Astashyn, Ruth Bennett, Andrew Berry, Eric Cox, et al. 2022. “A Joint NCBI and EMBL-EBI Transcript Set for Clinical Genomics and Research.” Nature 604 (7905): 310–15. https://doi.org/10.1038/s41586-022-04558-8.
Morcos, Faruck, Andrea Pagnani, Bryan Lunt, Arianna Bertolino, Debora S. Marks, Chris Sander, Riccardo Zecchina, José N. Onuchic, Terence Hwa, and Martin Weigt. 2011. “Direct-Coupling Analysis of Residue Coevolution Captures Native Contacts Across Many Protein Families.” Proceedings of the National Academy of Sciences 108 (49): E1293–1301. https://doi.org/10.1073/pnas.1111471108.
Morris, Christopher, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. 2019. “Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks.” In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 33:4602–9. AAAI’19/IAAI’19/EAAI’19. Honolulu, Hawaii, USA: AAAI Press. https://doi.org/10.1609/aaai.v33i01.33014602.
Mountjoy, Edward, Ellen M. Schmidt, Miguel Carmona, Jeremy Schwartzentruber, Gareth Peat, Alfredo Miranda, Luca Fumis, et al. 2021. “An Open Approach to Systematically Prioritize Causal Variants and Genes at All Published Human GWAS Trait-Associated Loci.” Nature Genetics 53 (11): 1527–33. https://doi.org/10.1038/s41588-021-00945-5.
Mukherjee, Sumit, Zachary R. McCaw, Jingwen Pei, Anna Merkoulovitch, Tom Soare, Raghav Tandon, David Amar, et al. 2024. “EmbedGEM: A Framework to Evaluate the Utility of Embeddings for Genetic Discovery.” Bioinformatics Advances 4 (1). https://doi.org/10.1093/bioadv/vbae135.
NaderiAlizadeh, Navid, and Rohit Singh. 2025. “Aggregating Residue-Level Protein Language Model Embeddings with Optimal Transport.” Bioinformatics Advances 5 (1): vbaf060. https://doi.org/10.1093/bioadv/vbaf060.
Naghipourfar, Mohsen, Siyu Chen, Mathew K. Howard, Christian B. Macdonald, Ali Saberi, Timo Hagen, Mohammad R. K. Mofrad, Willow Coyote-Maestas, and Hani Goodarzi. 2024. “[cdsFM - EnCodon/DeCodon] A Suite of Foundation Models Captures the Contextual Interplay Between Codons.” bioRxiv. https://doi.org/10.1101/2024.10.10.617568.
Nagpal, Chirag, Xinyu Li, and Artur Dubrawski. 2021. “Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data With Competing Risks.” IEEE Journal of Biomedical and Health Informatics 25 (8): 3163–75. https://doi.org/10.1109/JBHI.2021.3052441.
Nelson, Matthew R., Hannah Tipney, Jeffery L. Painter, Judong Shen, Paola Nicoletti, Yufeng Shen, Aris Floratos, et al. 2015. “The Support of Human Genetic Evidence for Approved Drug Indications.” Nature Genetics 47 (8): 856–60. https://doi.org/10.1038/ng.3314.
Ng, Pauline C., and Steven Henikoff. 2003. “SIFT: Predicting Amino Acid Changes That Affect Protein Function.” Nucleic Acids Research 31 (13): 3812–14. https://doi.org/10.1093/nar/gkg509.
Nguengang Wakap, Stéphanie, Deborah M. Lambert, Annie Olry, Charlotte Rodwell, Charlotte Gueydan, Valérie Lanneau, Daniel Murphy, Yann Le Cam, and Ana Rath. 2019. “Estimating Cumulative Point Prevalence of Rare Diseases: Analysis of the Orphanet Database.” European Journal of Human Genetics 28 (2): 165–73. https://doi.org/10.1038/s41431-019-0508-0.
Nguyen, Eric, Michael Poli, Matthew G. Durrant, Brian Kang, Dhruva Katrekar, David B. Li, Liam J. Bartie, et al. 2024. “Sequence Modeling and Design from Molecular to Genome Scale with Evo.” Science 386 (6723): eado9336. https://doi.org/10.1126/science.ado9336.
Nguyen, Eric, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, et al. 2023. “HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution.” arXiv. https://doi.org/10.48550/arXiv.2306.15794.
Nielsen, Rasmus, Joshua S. Paul, Anders Albrechtsen, and Yun S. Song. 2011. “Genotype and SNP Calling from Next-Generation Sequencing Data.” Nature Reviews Genetics 12 (6): 443–51. https://doi.org/10.1038/nrg2986.
Nijkamp, Erik, Jeffrey A. Ruffolo, Eli N. Weinstein, Nikhil Naik, and Ali Madani. 2023. “ProGen2: Exploring the Boundaries of Protein Language Models.” Cell Systems 14 (11): 968–978.e3. https://doi.org/10.1016/j.cels.2023.10.002.
Nofziger, Charity, Amy J. Turner, Katrin Sangkuhl, Michelle Whirl-Carrillo, José A. G. Agúndez, John L. Black, Henry M. Dunnenberger, et al. 2019. “PharmVar GeneFocus: CYP2D6.” Clinical Pharmacology & Therapeutics 107 (1): 154–70. https://doi.org/10.1002/cpt.1643.
Notin, Pascal, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan Gomez, Debora S. Marks, and Yarin Gal. 2022. “Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-Time Retrieval.” arXiv. https://doi.org/10.48550/arXiv.2205.13760.
Notin, Pascal, Aaron Kollasch, Daniel Ritter, Lood van Niekerk, Steffanie Paul, Han Spinner, Nathan Rollins, et al. 2023. “ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design.” Advances in Neural Information Processing Systems 36 (December): 64331–79.
Nurk, Sergey, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, et al. 2022. “The Complete Sequence of a Human Genome.” Science 376 (6588): 44–53. https://doi.org/10.1126/science.abj6987.
O’Connell, Jared, Deepti Gurdasani, Olivier Delaneau, Nicola Pirastu, Sheila Ulivi, Massimiliano Cocca, Michela Traglia, et al. 2014. “A General Approach for Haplotype Phasing Across the Full Spectrum of Relatedness.” PLOS Genetics 10 (4): e1004234. https://doi.org/10.1371/journal.pgen.1004234.
O’Leary, Nuala A., Mathew W. Wright, J. Rodney Brister, Stacy Ciufo, Diana Haddad, Rich McVeigh, Bhanu Rajput, et al. 2016. “Reference Sequence (RefSeq) Database at NCBI: Current Status, Taxonomic Expansion, and Functional Annotation.” Nucleic Acids Research 44 (D1): D733–45. https://doi.org/10.1093/nar/gkv1189.
Ochoa, David, Andrew Hercules, Miguel Carmona, Daniel Suveges, James Baker, Cinzia Malangone, Irene Lopez, et al. 2023. “The Next-Generation Open Targets Platform: Reimagined, Redesigned, Rebuilt.” Nucleic Acids Research 51 (D1): D1353–59. https://doi.org/10.1093/nar/gkac1046.
Oono, Kenta, and Taiji Suzuki. 2020. “Graph Neural Networks Exponentially Lose Expressive Power for Node Classification.” In.
Oord, Aaron van den, Yazhe Li, and Oriol Vinyals. 2019. “Representation Learning with Contrastive Predictive Coding.” arXiv. https://doi.org/10.48550/arXiv.1807.03748.
Orchard, Sandra, Mais Ammari, Bruno Aranda, Lionel Breuza, Leonardo Briganti, Fiona Broackes-Carter, Nancy H. Campbell, et al. 2014. “The MIntAct Project—IntAct as a Common Curation Platform for 11 Molecular Interaction Databases.” Nucleic Acids Research 42 (D1): D358–63. https://doi.org/10.1093/nar/gkt1115.
Orenbuch, Rose, Courtney A. Shearer, Aaron W. Kollasch, Aviv D. Spinner, Thomas Hopf, Lood van Niekerk, Dinko Franceschi, Mafalda Dias, Jonathan Frazer, and Debora S. Marks. 2025. “[popEVE] Proteome-Wide Model for Human Disease Genetics.” Nature Genetics, November, 1–10. https://doi.org/10.1038/s41588-025-02400-1.
Oughtred, Rose, Jennifer Rust, Christie Chang, Bobby-Joe Breitkreutz, Chris Stark, Andrew Willems, Lorrie Boucher, et al. 2020. “The BioGRID Database: A Comprehensive Biomedical Resource of Curated Protein, Genetic, and Chemical Interactions.” Protein Science 30 (1): 187–200. https://doi.org/10.1002/pro.3978.
Outeiral, Carlos, and Charlotte M. Deane. 2024. “Codon Language Embeddings Provide Strong Signals for Use in Protein Engineering.” Nature Machine Intelligence 6 (2): 170–79. https://doi.org/10.1038/s42256-024-00791-0.
Paass, Gerhard, and Sven Giesselbach. 2023. Foundation Models for Natural Language Processing: Pre-Trained Language Models Integrating Media. Cham: Springer. https://doi.org/10.1007/978-3-031-23190-2.
“PacificBiosciences/Pbsv.” 2025. PacBio.
Parasuraman, Raja, and Dietrich H. Manzey. 2010. “Complacency and Bias in Human Use of Automation: An Attentional Integration.” Human Factors 52 (3): 381–410. https://doi.org/10.1177/0018720810376055.
Patterson, Nick, Alkes L. Price, and David Reich. 2006. “Population Structure and Eigenanalysis.” PLOS Genetics 2 (12): e190. https://doi.org/10.1371/journal.pgen.0020190.
Pe’er, Itsik, Roman Yelensky, David Altshuler, and Mark J. Daly. 2008. “Estimation of the Multiple Testing Burden for Genomewide Association Studies of Nearly All Common Variants.” Genetic Epidemiology 32 (4): 381–85. https://doi.org/10.1002/gepi.20303.
Pearce, James D., Sara E. Simmonds, Gita Mahmoudabadi, Lakshmi Krishnan, Giovanni Palla, Ana-Maria Istrate, Alexander Tarashansky, et al. 2025. “[TranscriptFormer] Cross-Species Generative Cell Atlas Across 1.5 Billion Years of Evolution: The TranscriptFormer Single-Cell Model.” bioRxiv. https://doi.org/10.1101/2025.04.25.650731.
Pearl, Judea. 2009. Causality. Cambridge University Press.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why. Hachette Book Group.
Pejaver, Vikas, Alicia B. Byrne, Bing-Jian Feng, Kymberleigh A. Pagel, Sean D. Mooney, Rachel Karchin, Anne O’Donnell-Luria, et al. 2022. “Calibration of Computational Tools for Missense Variant Pathogenicity Classification and ClinGen Recommendations for PP3/BP4 Criteria.” American Journal of Human Genetics 109 (12): 2163–77. https://doi.org/10.1016/j.ajhg.2022.10.013.
Peters, Matthew E., Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. “Deep Contextualized Word Representations.” In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), edited by Marilyn Walker, Heng Ji, and Amanda Stent, 2227–37. New Orleans, Louisiana: Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202.
Piñero, Janet, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch, Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I Furlong. 2020. “The DisGeNET Knowledge Platform for Disease Genomics: 2019 Update.” Nucleic Acids Research 48 (D1): D845–55. https://doi.org/10.1093/nar/gkz1021.
Platt, John. 1999. “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.” Advances in Large Margin Classifiers, March.
Poli, Michael, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher Re. 2023. “Hyena Hierarchy: Towards Larger Convolutional Language Models.” In Proceedings of the 40th International Conference on Machine Learning, 28043–78. PMLR.
Pollard, Katherine S., Melissa J. Hubisz, Kate R. Rosenbloom, and Adam Siepel. 2009. “Detection of Nonneutral Substitution Rates on Mammalian Phylogenies.” Genome Research 20 (1): 110–21. https://doi.org/10.1101/gr.097857.109.
Poplin, Ryan, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, et al. 2018. “[DeepVariant] A Universal SNP and Small-Indel Variant Caller Using Deep Neural Networks.” Nature Biotechnology 36 (10): 983–87. https://doi.org/10.1038/nbt.4235.
Press, Ofir, Noah A. Smith, and Mike Lewis. 2022. “Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.” arXiv. https://doi.org/10.48550/arXiv.2108.12409.
Price, Alkes L., Nick J. Patterson, Robert M. Plenge, Michael E. Weinblatt, Nancy A. Shadick, and David Reich. 2006. “Principal Components Analysis Corrects for Stratification in Genome-Wide Association Studies.” Nature Genetics 38 (8): 904–9. https://doi.org/10.1038/ng1847.
Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel A. R. Ferreira, David Bender, Julian Maller, et al. 2007. “PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses.” The American Journal of Human Genetics 81 (3): 559–75. https://doi.org/10.1086/519795.
Quan, Hude, Vijaya Sundararajan, Patricia Halfon, Andrew Fong, Bernard Burnand, Jean-Christophe Luthi, L. Duncan Saunders, Cynthia A. Beck, Thomas E. Feasby, and William A. Ghali. 2005. “Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data.” Medical Care 43 (11): 1130. https://doi.org/10.1097/01.mlr.0000182534.19832.83.
Quang, Daniel, Yifei Chen, and Xiaohui Xie. 2015. “DANN: A Deep Learning Approach for Annotating the Pathogenicity of Genetic Variants.” Bioinformatics 31 (5): 761–63. https://doi.org/10.1093/bioinformatics/btu703.
Quang, Daniel, and Xiaohui Xie. 2016. “DanQ: A Hybrid Convolutional and Recurrent Deep Neural Network for Quantifying the Function of DNA Sequences.” Nucleic Acids Research 44 (11): e107. https://doi.org/10.1093/nar/gkw226.
Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, et al. 2021. “Learning Transferable Visual Models From Natural Language Supervision.” In Proceedings of the 38th International Conference on Machine Learning, 8748–63.
Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2023. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” arXiv. https://doi.org/10.48550/arXiv.1910.10683.
Rakowski, Alexander, and Christoph Lippert. 2025. “[MIFM] Multiple Instance Fine-Mapping: Predicting Causal Regulatory Variants with a Deep Sequence Model.” medRxiv. https://doi.org/10.1101/2025.06.13.25329551.
Rao, Roshan, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, and Yun S. Song. 2019. “Evaluating Protein Transfer Learning with TAPE.” arXiv. https://doi.org/10.48550/arXiv.1906.08230.
Rao, Roshan, Joshua Meier, Tom Sercu, Sergey Ovchinnikov, and Alexander Rives. 2020. “Transformer Protein Language Models Are Unsupervised Structure Learners.” bioRxiv. https://doi.org/10.1101/2020.12.15.422761.
Rao, Suhas S. P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M. Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn, et al. 2017. “Cohesin Loss Eliminates All Loop Domains.” Cell 171 (2): 305–320.e24. https://doi.org/10.1016/j.cell.2017.09.026.
Rao, Suhas S. P., Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, et al. 2014. “A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping.” Cell 159 (7): 1665–80. https://doi.org/10.1016/j.cell.2014.11.021.
“RealTimeGenomics/Rtg-Core.” 2025. Real Time Genomics.
Regev, Aviv, Sarah A Teichmann, Eric S Lander, Ido Amit, Christophe Benoist, Ewan Birney, Bernd Bodenmiller, et al. 2017. “The Human Cell Atlas.” Edited by Thomas R Gingeras. eLife 6 (December): e27041. https://doi.org/10.7554/eLife.27041.
Rehm, Heidi L., Jonathan S. Berg, Lisa D. Brooks, Carlos D. Bustamante, James P. Evans, Melissa J. Landrum, David H. Ledbetter, et al. 2015. “ClinGen – The Clinical Genome Resource.” New England Journal of Medicine 372 (23): 2235–42. https://doi.org/10.1056/NEJMsr1406261.
Relling, Mary V., Teri E. Klein, Roseann S. Gammal, Michelle Whirl-Carrillo, James M. Hoffman, and Kelly E. Caudle. 2019. “The Clinical Pharmacogenetics Implementation Consortium: 10 Years Later.” Clinical Pharmacology & Therapeutics 107 (1): 171–75. https://doi.org/10.1002/cpt.1651.
Rentzsch, Philipp, Max Schubach, Jay Shendure, and Martin Kircher. 2021. “CADD-Splice—Improving Genome-Wide Variant Effect Prediction Using Deep Learning-Derived Splice Scores.” Genome Medicine 13 (1): 31. https://doi.org/10.1186/s13073-021-00835-9.
Rentzsch, Philipp, Daniela Witten, Gregory M Cooper, Jay Shendure, and Martin Kircher. 2019. “CADD: Predicting the Deleteriousness of Variants Throughout the Human Genome.” Nucleic Acids Research 47 (D1): D886–94. https://doi.org/10.1093/nar/gky1016.
Richards, Sue, Nazneen Aziz, Sherri Bale, David Bick, Soma Das, Julie Gastier-Foster, Wayne W. Grody, et al. 2015. “Standards and Guidelines for the Interpretation of Sequence Variants: A Joint Consensus Recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.” Genetics in Medicine 17 (5): 405–24. https://doi.org/10.1038/gim.2015.30.
Richardson, Peter, Ivan Griffin, Catherine Tucker, Dan Smith, Olly Oechsle, Anne Phelan, Michael Rawling, Edward Savory, and Justin Stebbing. 2020. “Baricitinib as Potential Treatment for 2019-nCoV Acute Respiratory Disease.” The Lancet 395 (10223): e30–31. https://doi.org/10.1016/S0140-6736(20)30304-4.
Rieke, Nicola, Jonny Hancox, Wenqi Li, Fausto Milletarì, Holger R. Roth, Shadi Albarqouni, Spyridon Bakas, et al. 2020. “The Future of Digital Health with Federated Learning.” Npj Digital Medicine 3 (1): 119. https://doi.org/10.1038/s41746-020-00323-1.
Risch, Neil, and Kathleen Merikangas. 1996. “The Future of Genetic Studies of Complex Human Diseases.” Science 273 (5281): 1516–17. https://doi.org/10.1126/science.273.5281.1516.
Rives, Alexander, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, et al. 2021. “[ESM-1b] Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences.” Proceedings of the National Academy of Sciences of the United States of America 118 (15): e2016239118. https://doi.org/10.1073/pnas.2016239118.
Robinson, James, Dominic J Barker, Xenia Georgiou, Michael A Cooper, Paul Flicek, and Steven G E Marsh. 2020. “IPD-IMGT/HLA Database.” Nucleic Acids Research 48 (D1): D948–55. https://doi.org/10.1093/nar/gkz950.
Rogers, Anna, Olga Kovaleva, and Anna Rumshisky. 2021. “A Primer in BERTology: What We Know About How BERT Works.” Transactions of the Association for Computational Linguistics 8 (January): 842–66. https://doi.org/10.1162/tacl_a_00349.
Rost, Burkhard. 1999. “Twilight Zone of Protein Sequence Alignments.” Protein Engineering 12 (2): 85–94. https://doi.org/10.1093/protein/12.2.85.
Ruan, Yunfeng, Yen-Feng Lin, Yen-Chen Anne Feng, Chia-Yen Chen, Max Lam, Zhenglin Guo, Lin He, et al. 2022. “Improving Polygenic Prediction in Ancestrally Diverse Populations.” Nature Genetics 54 (5): 573–80. https://doi.org/10.1038/s41588-022-01054-7.
Rubin, Alan F., Hannah Gelman, Nathan Lucas, Sandra M. Bajjalieh, Anthony T. Papenfuss, Terence P. Speed, and Douglas M. Fowler. 2017. “A Statistical Framework for Analyzing Deep Mutational Scanning Data.” Genome Biology 18 (1): 150. https://doi.org/10.1186/s13059-017-1272-5.
Rubinacci, Simone, Diogo M. Ribeiro, Robin J. Hofmeister, and Olivier Delaneau. 2021. “Efficient Phasing and Imputation of Low-Coverage Sequencing Data Using Large Reference Panels.” Nature Genetics 53 (1): 120–26. https://doi.org/10.1038/s41588-020-00756-0.
Saadat, Ali, and Jacques Fellay. 2024. “DNA Language Model and Interpretable Graph Neural Network Identify Genes and Pathways Involved in Rare Diseases.” In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), 103–15. https://doi.org/10.18653/v1/2024.langmol-1.13.
Sainz, Oscar, Jon Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, and Eneko Agirre. 2023. “NLP Evaluation in Trouble: On the Need to Measure LLM Data Contamination for Each Benchmark.” In Findings of the Association for Computational Linguistics: EMNLP 2023, edited by Houda Bouamor, Juan Pino, and Kalika Bali, 10776–87. Singapore: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.722.
Sakaue, Saori, Saisriram Gurajala, Michelle Curtis, Yang Luo, Wanson Choi, Kazuyoshi Ishigaki, Joyce B. Kang, et al. 2023. “Tutorial: A Statistical Genetics Guide to Identifying HLA Alleles Driving Complex Disease.” Nature Protocols 18 (9): 2625–41. https://doi.org/10.1038/s41596-023-00853-4.
Samek, Wojciech, Gregoire Montavon, Andrea Vedaldi, Lars Kai Hansen, and Klaus-Robert Müller. 2019. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Vol. 11700. LNAI. Cham: Springer. https://doi.org/10.1007/978-3-030-28954-6.
Sample, Paul J., Ban Wang, David W. Reid, Vlad Presnyak, Iain J. McFadyen, David R. Morris, and Georg Seelig. 2019. “Human 5′ UTR Design and Variant Effect Prediction from a Massively Parallel Translation Assay.” Nature Biotechnology 37 (7): 803–9. https://doi.org/10.1038/s41587-019-0164-5.
Sanabria, Melissa, Jonas Hirsch, Pierre M. Joubert, and Anna R. Poetsch. 2024. “[GROVER] DNA Language Model GROVER Learns Sequence Context in the Human Genome.” Nature Machine Intelligence 6 (8): 911–23. https://doi.org/10.1038/s42256-024-00872-0.
Sanborn, Adrian L., Suhas S. P. Rao, Su-Chen Huang, Neva C. Durand, Miriam H. Huntley, Andrew I. Jewett, Ivan D. Bochkov, et al. 2015. “Chromatin Extrusion Explains Key Features of Loop and Domain Formation in Wild-Type and Engineered Genomes.” Proceedings of the National Academy of Sciences 112 (47): E6456–65. https://doi.org/10.1073/pnas.1518552112.
Sanderson, Theo, Maxwell L Bileschi, David Belanger, and Lucy J Colwell. 2023. “ProteInfer, Deep Neural Networks for Protein Functional Inference.” Edited by Volker Dötsch and Max V Staller. eLife 12 (February): e80942. https://doi.org/10.7554/eLife.80942.
Sangkuhl, Katrin, Michelle Whirl-Carrillo, Ryan M. Whaley, Mark Woon, Adam Lavertu, Russ B. Altman, Lester Carter, Anurag Verma, Marylyn D. Ritchie, and Teri E. Klein. 2019. “Pharmacogenomics Clinical Annotation Tool (PharmCAT).” Clinical Pharmacology & Therapeutics 107 (1): 203–10. https://doi.org/10.1002/cpt.1568.
Sarkisyan, Karen S., Dmitry A. Bolotin, Margarita V. Meer, Dinara R. Usmanova, Alexander S. Mishin, George V. Sharonov, Dmitry N. Ivankov, et al. 2016. “Local Fitness Landscape of the Green Fluorescent Protein.” Nature 533 (7603): 397–401. https://doi.org/10.1038/nature17995.
Schiff, Yair, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, and Volodymyr Kuleshov. 2024. “Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling.” arXiv. https://doi.org/10.48550/arXiv.2403.03234.
Schmidt, Amand F., Chris Finan, Maria Gordillo-Marañón, Folkert W. Asselbergs, Daniel F. Freitag, Riyaz S. Patel, Benoît Tyl, et al. 2020. “Genetic Drug Target Validation Using Mendelian Randomisation.” Nature Communications 11 (1): 3255. https://doi.org/10.1038/s41467-020-16969-0.
Schubach, Max, Thorben Maass, Lusiné Nazaretyan, Sebastian Röner, and Martin Kircher. 2024. “CADD V1.7: Using Protein Language Models, Regulatory CNNs and Other Nucleotide-Level Scores to Improve Genome-Wide Variant Predictions.” Nucleic Acids Research 52 (D1): D1143–54. https://doi.org/10.1093/nar/gkad989.
Shafin, Kishwar, Trevor Pesout, Pi-Chuan Chang, Maria Nattestad, Alexey Kolesnikov, Sidharth Goel, Gunjan Baid, et al. 2021. “Haplotype-Aware Variant Calling with PEPPER-Margin-DeepVariant Enables High Accuracy in Nanopore Long-Reads.” Nature Methods 18 (11): 1322–32. https://doi.org/10.1038/s41592-021-01299-w.
Shalem, Ophir, Neville E. Sanjana, Ella Hartenian, Xi Shi, David A. Scott, Tarjei S. Mikkelsen, Dirk Heckl, et al. 2014. “Genome-Scale CRISPR-Cas9 Knockout Screening in Human Cells.” Science 343 (6166): 84–87. https://doi.org/10.1126/science.1247005.
Sherry, S. T., M.-H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin. 2001. “dbSNP: The NCBI Database of Genetic Variation.” Nucleic Acids Research 29 (1): 308–11. https://doi.org/10.1093/nar/29.1.308.
Shevlane, Toby. 2022. “Structured Access: An Emerging Paradigm for Safe AI Deployment.” arXiv. https://doi.org/10.48550/arXiv.2201.05159.
Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. 2017. “Learning Important Features Through Propagating Activation Differences.” In Proceedings of the 34th International Conference on Machine Learning, 3145–53. PMLR.
Shrikumar, Avanti, Katherine Tian, Žiga Avsec, Anna Shcherbina, Abhimanyu Banerjee, Mahfuza Sharmin, Surag Nair, and Anshul Kundaje. 2018. “Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) Version 0.5.6.5.” arXiv. https://doi.org/10.48550/arXiv.1811.00416.
Siepel, Adam, Gill Bejerano, Jakob S. Pedersen, Angie S. Hinrichs, Minmei Hou, Kate Rosenbloom, Hiram Clawson, et al. 2005. “[PhastCons] Evolutionarily Conserved Elements in Vertebrate, Insect, Worm, and Yeast Genomes.” Genome Research 15 (8): 1034–50. https://doi.org/10.1101/gr.3715005.
Singh, Jaswinder, Jack Hanson, Kuldip Paliwal, and Yaoqi Zhou. 2019. “RNA Secondary Structure Prediction Using an Ensemble of Two-Dimensional Deep Neural Networks and Transfer Learning.” Nature Communications 10 (1): 5407. https://doi.org/10.1038/s41467-019-13395-9.
Sirugo, Giorgio, Scott M. Williams, and Sarah A. Tishkoff. 2019. “The Missing Diversity in Human Genetic Studies.” Cell 177 (1): 26–31. https://doi.org/10.1016/j.cell.2019.02.048.
Smolka, Moritz, Luis F. Paulin, Christopher M. Grochowski, Dominic W. Horner, Medhat Mahmoud, Sairam Behera, Ester Kalef-Ezra, et al. 2024. “Detection of Mosaic and Population-Level Structural Variants with Sniffles2.” Nature Biotechnology 42 (10): 1571–80. https://doi.org/10.1038/s41587-023-02024-y.
Snell, Jake, Kevin Swersky, and Richard Zemel. 2017. “Prototypical Networks for Few-Shot Learning.” In Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc.
Sohail, Mashaal, María J. Palma-Martínez, Amanda Y. Chong, Consuelo D. Quinto-Cortés, Carmina Barberena-Jonas, Santiago G. Medina-Muñoz, Aaron Ragsdale, et al. 2023. “Mexican Biobank Advances Population and Medical Genomics of Diverse Ancestries.” Nature 622 (7984): 775–83. https://doi.org/10.1038/s41586-023-06560-0.
Soice, Emily H., Rafael Rocha, Kimberlee Cordova, Michael Specter, and Kevin M. Esvelt. 2023. “Can Large Language Models Democratize Access to Dual-Use Biotechnology?” arXiv. https://doi.org/10.48550/arXiv.2306.03809.
Sollis, Elliot, Abayomi Mosaku, Ala Abid, Annalisa Buniello, Maria Cerezo, Laurent Gil, Tudor Groza, et al. 2023. “The NHGRI-EBI GWAS Catalog: Knowledgebase and Deposition Resource.” Nucleic Acids Research 51 (D1): D977–85. https://doi.org/10.1093/nar/gkac1010.
Somani, Ayush, Alexander Horsch, and Dilip K. Prasad. 2023. Interpretability in Deep Learning. Cham: Springer. https://doi.org/10.1007/978-3-031-20639-9.
Song, Li, Gali Bai, X. Shirley Liu, Bo Li, and Heng Li. 2022. “T1K: Efficient and Accurate KIR and HLA Genotyping with Next-Generation Sequencing Data.” bioRxiv. https://doi.org/10.1101/2022.10.26.513955.
Spitale, Robert C., Ryan A. Flynn, Qiangfeng Cliff Zhang, Pete Crisalli, Byron Lee, Jong-Wha Jung, Hannes Y. Kuchelmeister, et al. 2015. “Structural Imprints in Vivo Decode RNA Regulatory Mechanisms.” Nature 519 (7544): 486–90. https://doi.org/10.1038/nature14263.
Stebbing, Justin, Venkatesh Krishnan, Stephanie de Bono, Silvia Ottaviani, Giacomo Casalini, Peter J. Richardson, Vanessa Monteil, et al. 2020. “Mechanism of Baricitinib Supports Artificial Intelligence‐predicted Testing in COVID‐19 Patients.” EMBO Molecular Medicine 12 (8): EMMM202012697. https://doi.org/10.15252/emmm.202012697.
Steinegger, Martin, Milot Mirdita, and Johannes Söding. 2019. “Protein-Level Assembly Increases Protein Sequence Recovery from Metagenomic Samples Manyfold.” Nature Methods 16 (7): 603–6. https://doi.org/10.1038/s41592-019-0437-4.
Steinegger, Martin, and Johannes Söding. 2017. “MMseqs2 Enables Sensitive Protein Sequence Searching for the Analysis of Massive Data Sets.” Nature Biotechnology 35 (11): 1026–28. https://doi.org/10.1038/nbt.3988.
Stenson, Peter D., Matthew Mort, Edward V. Ball, Katy Evans, Matthew Hayden, Sally Heywood, Michelle Hussain, Andrew D. Phillips, and David N. Cooper. 2017. “The Human Gene Mutation Database: Towards a Comprehensive Repository of Inherited Mutation Data for Medical Research, Genetic Diagnosis and Next-Generation Sequencing Studies.” Human Genetics 136 (6): 665–77. https://doi.org/10.1007/s00439-017-1779-6.
Steyerberg, Ewout W. 2019. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd ed. Cham: Springer. https://doi.org/10.1007/978-3-030-16399-0.
Su, Chang, Zichun Xu, Xinning Shan, Biao Cai, Hongyu Zhao, and Jingfei Zhang. 2023. “Cell-Type-Specific Co-Expression Inference from Single Cell RNA-Sequencing Data.” Nature Communications 14 (1): 4846. https://doi.org/10.1038/s41467-023-40503-7.
Su, Jianlin, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. 2024. “RoFormer: Enhanced Transformer with Rotary Position Embedding.” Neurocomputing 568 (February): 127063. https://doi.org/10.1016/j.neucom.2023.127063.
Sudlow, Cathie, John Gallacher, Naomi Allen, Valerie Beral, Paul Burton, John Danesh, Paul Downey, et al. 2015. “UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age.” PLOS Medicine 12 (3): e1001779. https://doi.org/10.1371/journal.pmed.1001779.
Sullivan, Patrick F., Jennifer R. S. Meadows, Steven Gazal, BaDoi N. Phan, Xue Li, Diane P. Genereux, Michael X. Dong, et al. 2023. “Leveraging Base-Pair Mammalian Constraint to Understand Genetic Variation and Human Disease.” Science 380 (6643): eabn2937. https://doi.org/10.1126/science.abn2937.
Sundaram, Laksshman, Hong Gao, Samskruthi Reddy Padigepati, Jeremy F. McRae, Yanjun Li, Jack A. Kosmicki, Nondas Fritzilas, et al. 2018. “Predicting the Clinical Impact of Human Mutation with Deep Neural Networks.” Nature Genetics 50 (8): 1161–70. https://doi.org/10.1038/s41588-018-0167-z.
Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. 2017. “Axiomatic Attribution for Deep Networks.” In Proceedings of the 34th International Conference on Machine Learning, 3319–28. PMLR.
Supreme Court of the United States. 2013. “Assoc. for Molecular Pathology v. Myriad Genetics, Inc., 569 U.S. 576 (2013).”
Suzek, Baris E., Hongzhan Huang, Peter McGarvey, Raja Mazumder, and Cathy H. Wu. 2007. “UniRef: Comprehensive and Non-Redundant UniProt Reference Clusters.” Bioinformatics 23 (10): 1282–88. https://doi.org/10.1093/bioinformatics/btm098.
Svensson, Valentine. 2020. “Droplet scRNA-Seq Is Not Zero-Inflated.” Nature Biotechnology 38 (2): 147–50. https://doi.org/10.1038/s41587-019-0379-5.
Swanson, Kyle, Howard Chang, and James Zou. 2022. “Predicting Immune Escape with Pretrained Protein Language Model Embeddings.” In Proceedings of the 17th Machine Learning in Computational Biology Meeting, 110–30. PMLR.
Swartout, William R., and Johanna D. Moore. 1993. “Explanation in Second Generation Expert Systems.” In Second Generation Expert Systems, 543–85. Springer.
Szklarczyk, Damian, Rebecca Kirsch, Mikaela Koutrouli, Katerina Nastou, Farrokh Mehryary, Radja Hachilif, Annika L Gable, et al. 2023. “The STRING Database in 2023: Protein–Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest.” Nucleic Acids Research 51 (D1): D638–46. https://doi.org/10.1093/nar/gkac1000.
Tabula Sapiens Consortium, The. 2022. “The Tabula Sapiens: A Multiple-Organ, Single-Cell Transcriptomic Atlas of Humans.” Science 376 (6594): eabl4896. https://doi.org/10.1126/science.abl4896.
Taliun, Daniel, Daniel N. Harris, Michael D. Kessler, Jedidiah Carlson, Zachary A. Szpiech, Raul Torres, Sarah A. Gagliano Taliun, et al. 2021. “Sequencing of 53,831 Diverse Genomes from the NHLBI TOPMed Program.” Nature 590 (7845): 290–99. https://doi.org/10.1038/s41586-021-03205-y.
Tan, Jimin, Nina Shenker-Tauris, Javier Rodriguez-Hernaez, Eric Wang, Theodore Sakellaropoulos, Francesco Boccalatte, Palaniraja Thandapani, et al. 2023. “Cell-Type-Specific Prediction of 3D Chromatin Organization Enables High-Throughput in Silico Genetic Screening.” Nature Biotechnology 41 (8): 1140–50. https://doi.org/10.1038/s41587-022-01612-8.
Tang, Fuchou, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence Lee, Nanlan Xu, Xiaohui Wang, et al. 2009. “mRNA-Seq Whole-Transcriptome Analysis of a Single Cell.” Nature Methods 6 (5): 377–82. https://doi.org/10.1038/nmeth.1315.
Tanigawa, Yosuke, Junyang Qian, Guhan Venkataraman, Johanne Marie Justesen, Ruilin Li, Robert Tibshirani, Trevor Hastie, and Manuel A. Rivas. 2022. “Significant Sparse Polygenic Risk Scores Across 813 Traits in UK Biobank.” PLOS Genetics 18 (3): e1010105. https://doi.org/10.1371/journal.pgen.1010105.
Tate, John G, Sally Bamford, Harry C Jubb, Zbyslaw Sondka, David M Beare, Nidhi Bindal, Harry Boutselakis, et al. 2019. “COSMIC: The Catalogue Of Somatic Mutations In Cancer.” Nucleic Acids Research 47 (D1): D941–47. https://doi.org/10.1093/nar/gky1015.
Tavtigian, Sean V., Marc S. Greenblatt, Steven M. Harrison, Robert L. Nussbaum, Snehit A. Prabhu, Kenneth M. Boucher, and Leslie G. Biesecker. 2018. “Modeling the ACMG/AMP Variant Classification Guidelines as a Bayesian Classification Framework.” Genetics in Medicine 20 (9): 1054–60. https://doi.org/10.1038/gim.2017.210.
The UniProt Consortium. 2023. “UniProt: The Universal Protein Knowledgebase in 2023.” Nucleic Acids Research 51 (D1): D523–31. https://doi.org/10.1093/nar/gkac1052.
Theodoris, Christina V., Ling Xiao, Anant Chopra, Mark D. Chaffin, Zeina R. Al Sayed, Matthew C. Hill, Helene Mantineo, et al. 2023. “[Geneformer] Transfer Learning Enables Predictions in Network Biology.” Nature 618 (7965): 616–24. https://doi.org/10.1038/s41586-023-06139-9.
Tipirneni, Sindhu, and Chandan K. Reddy. 2022. “Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate Clinical Time-Series.” ACM Trans. Knowl. Discov. Data 16 (6): 105:1–17. https://doi.org/10.1145/3516367.
Torkamani, Ali, Nathan E. Wineinger, and Eric J. Topol. 2018. “The Personal and Clinical Utility of Polygenic Risk Scores.” Nature Reviews Genetics 19 (9): 581–90. https://doi.org/10.1038/s41576-018-0018-x.
Trop, Evan, Yair Schiff, Edgar Mariano Marroquin, Chia Hsiang Kao, Aaron Gokaslan, McKinley Polen, Mingyi Shao, et al. 2024. “The Genomics Long-Range Benchmark: Advancing DNA Language Models,” October.
U.S. Food and Drug Administration. 2021. “Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan.”
US Congress. 2008. “Genetic Information Nondiscrimination Act of 2008.”
Van der Auwera, Geraldine A., Mauricio O. Carneiro, Christopher Hartl, Ryan Poplin, Guillermo del Angel, Ami Levy-Moonshine, Tadeusz Jordan, et al. 2018. “From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline.” Current Protocols in Bioinformatics 43 (1): 11.10.1–33. https://doi.org/10.1002/0471250953.bi1110s43.
Vapnik, Vladimir. 1998. Statistical Learning Theory. Wiley.
Varadi, Mihaly, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, et al. 2022. “AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models.” Nucleic Acids Research 50 (D1): D439–44. https://doi.org/10.1093/nar/gkab1061.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. “Attention Is All You Need.” arXiv. https://doi.org/10.48550/arXiv.1706.03762.
Veličković, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. “Graph Attention Networks.” arXiv. https://doi.org/10.48550/arXiv.1710.10903.
Venkatesan, Kavitha, Jean-François Rual, Alexei Vazquez, Ulrich Stelzl, Irma Lemmens, Tomoko Hirozane-Kishikawa, Tong Hao, et al. 2008. “An Empirical Framework for Binary Interactome Mapping.” Nature Methods 6 (1): 83–90. https://doi.org/10.1038/nmeth.1280.
Vickers, Andrew J., and Elena B. Elkin. 2006. “Decision Curve Analysis: A Novel Method for Evaluating Prediction Models.” Medical Decision Making 26 (6): 565–74. https://doi.org/10.1177/0272989X06295361.
Vilhjálmsson, Bjarni J., Jian Yang, Hilary K. Finucane, Alexander Gusev, Sara Lindström, Stephan Ripke, Giulio Genovese, et al. 2015. “Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores.” American Journal of Human Genetics 97 (4): 576–92. https://doi.org/10.1016/j.ajhg.2015.09.001.
Vishniakov, Kirill, Boulbaba Ben Amor, Engin Tekin, Nancy A. ElNaker, Karthik Viswanathan, Aleksandr Medvedev, Aahan Singh, et al. 2025. “Gene42: Long-Range Genomic Foundation Model With Dense Attention.” arXiv. https://doi.org/10.48550/arXiv.2503.16565.
Visscher, Peter M., William G. Hill, and Naomi R. Wray. 2008. “Heritability in the Genomics Era — Concepts and Misconceptions.” Nature Reviews Genetics 9 (4): 255–66. https://doi.org/10.1038/nrg2322.
Võsa, Urmo, Annique Claringbould, Harm-Jan Westra, Marc Jan Bonder, Patrick Deelen, Biao Zeng, Holger Kirsten, et al. 2021. “Large-Scale Cis- and Trans-eQTL Analyses Identify Thousands of Genetic Loci and Polygenic Scores That Regulate Blood Gene Expression.” Nature Genetics 53 (9): 1300–1310. https://doi.org/10.1038/s41588-021-00913-z.
Wang, Dequan, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. 2021. “Tent: Fully Test-Time Adaptation by Entropy Minimization.” arXiv. https://doi.org/10.48550/arXiv.2006.10726.
Wang, Gao, Abhishek Sarkar, Peter Carbonetto, and Matthew Stephens. 2020. “A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (5): 1273–1300. https://doi.org/10.1111/rssb.12388.
Wang, Sinong, Belinda Z. Li, Madian Khabsa, Han Fang, and Hao Ma. 2020. “Linformer: Self-Attention with Linear Complexity.” arXiv. https://doi.org/10.48550/arXiv.2006.04768.
Wang, Yihui, Zhiyuan Cai, Qian Zeng, Yihang Gao, Jiarui Ouyang, Yingxue Xu, Shu Yang, et al. 2025. “Genomic Touchstone: Benchmarking Genomic Language Models in the Context of the Central Dogma.” bioRxiv. https://doi.org/10.1101/2025.06.25.661622.
Wang, Zirui, Zihang Dai, Barnabas Poczos, and Jaime Carbonell. 2018. “Characterizing and Avoiding Negative Transfer.” In, 11293–302.
Watson, Joseph L., David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, et al. 2023. “De Novo Design of Protein Structure and Function with RFdiffusion.” Nature 620 (7976): 1089–1100. https://doi.org/10.1038/s41586-023-06415-8.
Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, et al. 2022. “Emergent Abilities of Large Language Models.” arXiv. https://doi.org/10.48550/arXiv.2206.07682.
Weissbrod, Omer, Farhad Hormozdiari, Christian Benner, Ran Cui, Jacob Ulirsch, Steven Gazal, Armin P. Schoech, et al. 2020. “Functionally Informed Fine-Mapping and Polygenic Localization of Complex Trait Heritability.” Nature Genetics 52 (12): 1355–63. https://doi.org/10.1038/s41588-020-00735-5.
Wenger, Aaron M., Paul Peluso, William J. Rowell, Pi-Chuan Chang, Richard J. Hall, Gregory T. Concepcion, Jana Ebler, et al. 2019. “Accurate Circular Consensus Long-Read Sequencing Improves Variant Detection and Assembly of a Human Genome.” Nature Biotechnology 37 (10): 1155–62. https://doi.org/10.1038/s41587-019-0217-9.
Whirl-Carrillo, M, E M McDonagh, J M Hebert, L Gong, K Sangkuhl, C F Thorn, R B Altman, and T E Klein. 2012. “Pharmacogenomics Knowledge for Personalized Medicine.” Clinical Pharmacology & Therapeutics 92 (4): 414–17. https://doi.org/10.1038/clpt.2012.96.
Wu, Yang, Zhili Zheng, Loic Thibaut, Michael E. Goddard, Naomi R. Wray, Peter M. Visscher, and Jian Zeng. 2024. “Genome-Wide Fine-Mapping Improves Identification of Causal Variants.” Research Square, August, rs.3.rs-4759390. https://doi.org/10.21203/rs.3.rs-4759390/v1.
Xiong, Ruibin, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tieyan Liu. 2020. “On Layer Normalization in the Transformer Architecture.” In Proceedings of the 37th International Conference on Machine Learning, 10524–33. PMLR.
Xu, Keyulu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. “How Powerful Are Graph Neural Networks?” arXiv. https://doi.org/10.48550/arXiv.1810.00826.
Yan, Lulu, Dongyan Zhang, and Xiaoqiang Sun. 2026. “Decoding Cell State Transitions Driven by Dynamic Cell–Cell Communication in Spatial Transcriptomics.” Nature Computational Science, January, 1–15. https://doi.org/10.1038/s43588-025-00934-2.
Yang, Fan, Wenchuan Wang, Fang Wang, Yuan Fang, Duyu Tang, Junzhou Huang, Hui Lu, and Jianhua Yao. 2022. “scBERT as a Large-Scale Pretrained Deep Language Model for Cell Type Annotation of Single-Cell RNA-Seq Data.” Nature Machine Intelligence 4 (10): 852–66. https://doi.org/10.1038/s42256-022-00534-z.
Yang, Jian, Beben Benyamin, Brian P. McEvoy, Scott Gordon, Anjali K. Henders, Dale R. Nyholt, Pamela A. Madden, et al. 2010. “Common SNPs Explain a Large Proportion of the Heritability for Human Height.” Nature Genetics 42 (7): 565–69. https://doi.org/10.1038/ng.608.
Yang, Zhilin, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2020. “XLNet: Generalized Autoregressive Pretraining for Language Understanding.” arXiv. https://doi.org/10.48550/arXiv.1906.08237.
Yengo, Loïc, Sailaja Vedantam, Eirini Marouli, Julia Sidorenko, Eric Bartell, Saori Sakaue, Marielisa Graff, et al. 2022. “A Saturated Map of Common Genetic Variants Associated with Human Height.” Nature 610 (7933): 704–12. https://doi.org/10.1038/s41586-022-05275-y.
Yeo, Gene, and Christopher B. Burge. 2004. “Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals.” Journal of Computational Biology 11 (2–3): 377–94. https://doi.org/10.1089/1066527041410418.
Ying, Chengxuan, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. 2021. “Do Transformers Really Perform Bad for Graph Representation?” arXiv. https://doi.org/10.48550/arXiv.2106.05234.
Yu, Ying, Yuanbang Mai, Yuanting Zheng, and Leming Shi. 2024. “Assessing and Mitigating Batch Effects in Large-Scale Omics Studies.” Genome Biology 25 (1): 254. https://doi.org/10.1186/s13059-024-03401-9.
Yun, Taedong, Justin Cosentino, Babak Behsaz, Zachary R. McCaw, Davin Hill, Robert Luben, Dongbing Lai, et al. 2023. “[REGLE] Unsupervised Representation Learning Improves Genomic Discovery and Risk Prediction for Respiratory and Circulatory Functions and Diseases.” medRxiv. https://doi.org/10.1101/2023.04.28.23289285.
Yun, Taedong, Helen Li, Pi-Chuan Chang, Michael F Lin, Andrew Carroll, and Cory Y McLean. 2021. “Accurate, Scalable Cohort Variant Calls Using DeepVariant and GLnexus.” Bioinformatics 36 (24): 5582–89. https://doi.org/10.1093/bioinformatics/btaa1081.
Zanger, Ulrich M., and Matthias Schwab. 2013. “Cytochrome P450 Enzymes in Drug Metabolism: Regulation of Gene Expression, Enzyme Activities, and Impact of Genetic Variation.” Pharmacology & Therapeutics 138 (1): 103–41. https://doi.org/10.1016/j.pharmthera.2012.12.007.
Zeng, Tony, and Yang I. Li. 2022. “Predicting RNA Splicing from DNA Sequence Using Pangolin.” Genome Biology 23 (1): 103. https://doi.org/10.1186/s13059-022-02664-4.
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2021. “Understanding Deep Learning (Still) Requires Rethinking Generalization.” Commun. ACM 64 (3): 107–15. https://doi.org/10.1145/3446776.
Zhang, Qiang, Keyang Ding, Tianwen Lyv, Xinda Wang, Qingyu Yin, Yiwen Zhang, Jing Yu, et al. 2024. “Scientific Large Language Models: A Survey on Biological & Chemical Domains.” arXiv. https://doi.org/10.48550/arXiv.2401.14656.
Zhang, Yu, Rachel Patton McCord, Yu-Jui Ho, Brian R. Laber, Diana S. Aber, Jungha Kim, Xiaowen Zhang, and Tom Misteli. 2012. “Spatial Organization of the Mouse Genome and Its Role in Recurrent Chromosomal Translocations.” Cell 148 (5): 908–21. https://doi.org/10.1016/j.cell.2012.02.002.
Zhao, Yanlong, Yixiao Chen, Jiawen Du, Jun Wen, Quan Sun, Ren Wang, and Can Chen. 2025. “Dual-Route Embedding-Aware Graph Neural Networks for Drug Repositioning.” Briefings in Bioinformatics 26 (5): bbaf555. https://doi.org/10.1093/bib/bbaf555.
Zheng, Rongbin, Changxin Wan, Shenglin Mei, Qian Qin, Qiu Wu, Hanfei Sun, Chen-Hao Chen, et al. 2019. “Cistrome Data Browser: Expanded Datasets and New Tools for Gene Regulatory Analysis.” Nucleic Acids Research 47 (D1): D729–35. https://doi.org/10.1093/nar/gky1094.
Zheng, Zhenxian, Shumin Li, Junhao Su, Amy Wing-Sze Leung, Tak-Wah Lam, and Ruibang Luo. 2022. “Symphonizing Pileup and Full-Alignment for Deep Learning-Based Long-Read Variant Calling.” Nature Computational Science 2 (12): 797–803. https://doi.org/10.1038/s43588-022-00387-x.
Zhou, Jian. 2022. “Sequence-Based Modeling of Three-Dimensional Genome Architecture from Kilobase to Chromosome Scale.” Nature Genetics 54 (5): 725–34. https://doi.org/10.1038/s41588-022-01065-4.
Zhou, Jian, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong, and Olga G. Troyanskaya. 2018. “[Expecto] Deep Learning Sequence-Based Ab Initio Prediction of Variant Effects on Expression and Disease Risk.” Nature Genetics 50 (8): 1171–79. https://doi.org/10.1038/s41588-018-0160-6.
Zhou, Jian, and Olga G. Troyanskaya. 2015. “[DeepSEA] Predicting Effects of Noncoding Variants with Deep Learning–Based Sequence Model.” Nature Methods 12 (10): 931–34. https://doi.org/10.1038/nmeth.3547.
Zhou, Zhihan, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu. 2024. “DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome.” arXiv. https://doi.org/10.48550/arXiv.2306.15006.
Zhou, Zhihan, Weimin Wu, Harrison Ho, Jiayi Wang, Lizhen Shi, Ramana V Davuluri, Zhong Wang, and Han Liu. 2025. “DNABERT-S: Pioneering Species Differentiation with Species-Aware DNA Embeddings.” Bioinformatics 41 (Supplement_1): i255–64. https://doi.org/10.1093/bioinformatics/btaf188.
Zhu, Ligeng, Zhijian Liu, and Song Han. 2019. “Deep Leakage from Gradients.” arXiv. https://doi.org/10.48550/arXiv.1906.08935.
Zhu, Xiao, Chenchen Qin, Fang Wang, Fan Yang, Bing He, Yu Zhao, and Jianhua Yao. 2024. “CD-GPT: A Biological Foundation Model Bridging the Gap Between Molecular Sequences Through Central Dogma.” bioRxiv. https://doi.org/10.1101/2024.06.24.600337.
Zitnik, Marinka, Monica Agrawal, and Jure Leskovec. 2018. “Modeling Polypharmacy Side Effects with Graph Convolutional Networks.” Bioinformatics 34 (13): i457–66. https://doi.org/10.1093/bioinformatics/bty294.
Zook, Justin M., Jennifer McDaniel, Nathan D. Olson, Justin Wagner, Hemang Parikh, Haynes Heaton, Sean A. Irvine, et al. 2019. “An Open Resource for Accurately Benchmarking Small Variant and Reference Calls.” Nature Biotechnology 37 (5): 561–66. https://doi.org/10.1038/s41587-019-0074-6.
Zvyagin, Maxim, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, et al. 2022. “GenSLMs: Genome-Scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics.” bioRxiv. https://doi.org/10.1101/2022.10.10.511571.