29 Rare Disease Diagnosis
Twenty-five thousand variants. One diagnosis. Where do you start?
Prerequisites: This chapter assumes familiarity with variant effect prediction (Chapter 18), uncertainty quantification (Section 24.3), and basic concepts of Mendelian inheritance. Readers should understand how foundation models like AlphaMissense and Enformer generate variant scores.
Learning Objectives: After completing this chapter, you will be able to:
- Describe the variant prioritization funnel and identify where foundation models contribute most
- Explain how computational predictions map to ACMG-AMP evidence categories and strengths
- Analyze how family structure (trios, segregation, phasing) enhances variant interpretation
- Distinguish germline from somatic variant interpretation contexts
- Evaluate when functional validation is needed to resolve variants of uncertain significance
Estimated reading time: 45 minutes
A four-year-old presents with developmental delay, hypotonia, and seizures that began at eighteen months. Standard metabolic testing reveals nothing. A gene panel for epilepsy returns negative. The neurologist orders whole-exome sequencing, which identifies 23,847 single nucleotide variants and 1,203 small insertions or deletions relative to the reference genome. Somewhere in this list of approximately 25,000 variants may lie the molecular explanation for this child’s condition. Finding it is like searching for a single typo in a 3-million-page document when you do not know which word was misspelled, or even which language the document is written in. The clinical team must reduce this number to a handful of candidates for expert review, ideally to a single variant or gene that explains the phenotype and guides management. This is the diagnostic odyssey: the gap between sequencing a genome and understanding what it means for a patient.
This scenario plays out thousands of times daily across clinical laboratories worldwide. Rare diseases collectively affect approximately 300 million people globally, yet individual conditions range from tens of thousands of patients to fewer than a dozen known cases worldwide (Nguengang Wakap et al. 2019). Over 7,000 rare diseases have been characterized, the majority following Mendelian inheritance patterns where single genes exert large effects (Amberger et al. 2015). For these patients, identifying the causal variant can end years of uncertainty, enable accurate genetic counseling for families, and increasingly guide targeted therapies. The technical capacity to sequence genomes has advanced enormously; the interpretive bottleneck has not kept pace. Variant interpretation remains largely manual, relying on clinical geneticists and laboratory directors who cannot scale to meet demand.
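For ML readers: Diagnostic yield is the percentage of tested patients who receive a molecular diagnosis:
$$\text{Diagnostic yield} = \frac{\text{patients receiving a molecular diagnosis}}{\text{patients tested}} \times 100\%$$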
Typical yields by test type:
| Test Type | Diagnostic Yield | Best For |
|---|---|---|
| Single-gene testing | Variable | Known familial variant |
| Gene panel (targeted) | 15-30% | Well-defined phenotypes |
| Whole-exome sequencing | 25-40% | Undiagnosed, broad differential |
| Whole-genome sequencing | 30-50% | WES-negative cases, suspected noncoding |
| Trio WES/WGS | +10-15% over singleton | Sporadic cases, de novo |
Why yield matters:
- Defines the gap between sequencing capability and interpretation
- Higher yield = more patients ending their “diagnostic odyssey”
- Foundation models aim to increase yield by improving variant prioritization
What limits yield:
- Technical: Variants in regions hard to sequence (repeats, GC-rich)
- Biological: Novel genes not yet associated with disease
- Interpretive: True causal variants classified as VUS
- Phenotypic: Atypical presentations not matching known gene-disease associations
The opportunity for ML: Current yields mean 50-70% of patients leave without answers. Improved computational prioritization, better noncoding variant interpretation, and phenotype-matching algorithms can all increase diagnostic yield.
Foundation models offer new tools for this interpretive challenge. As detailed in Chapter 18, models like AlphaMissense provide proteome-wide estimates of missense pathogenicity, while regulatory models like Enformer predict variant effects on gene expression across tissues. These computational predictions become one line of evidence within structured interpretation frameworks. Foundation model outputs integrate into clinical variant interpretation workflows: from initial prioritization that reduces 25,000 variants to dozens, through ACMG-AMP evidence classification that structures expert review, to family-based analysis that leverages inheritance patterns, and laboratory validation that confirms computational predictions. The goal is not prediction for its own sake but actionable clinical insight: which variant explains this patient’s disease, and what should we do about it?
29.1 Variant Prioritization Funnel
Clinical variant interpretation operates through progressive filtering, narrowing tens of thousands of candidates to a manageable set for expert review. Each filtering step applies different types of evidence, and foundation models contribute at multiple stages.
Before reading on, consider: if you had 25,000 variants to evaluate and could only deeply analyze 10, what types of filters would you apply first? What information would help you eliminate the largest number of candidates quickly while minimizing the risk of discarding the causal variant?
29.1.1 Quality and Technical Filters
The first filter removes variants that are likely technical artifacts rather than true biological variation. Sequencing depth below 20x, strand bias exceeding established thresholds, and clustering of variants in repetitive regions all raise suspicion of false positives. The 20x depth threshold exists because variant calling requires enough reads to distinguish true heterozygous variants (expected variant allele fraction ~50%) from sequencing errors (typically <1% per base); below this depth, random sampling of reads from the two alleles makes genotyping increasingly unreliable, because a true heterozygote may be supported by only a handful of alternate reads. Strand bias indicates that a variant appears predominantly on reads from one DNA strand, suggesting it arose from damage or amplification artifacts during library preparation rather than existing in the original genomic DNA. Variant clustering in repetitive regions reflects the fundamental challenge of short-read alignment: when a read could map to multiple genomic locations, misalignment creates apparent variants that do not exist in the sample. Variant calling pipelines like GATK and DeepVariant (see Section 1.8) produce quality scores that guide this initial triage. As discussed in Section 24.3, these confidence estimates require careful calibration; systematic miscalibration in specific genomic contexts propagates directly into interpretation, creating blind spots where uncertain calls masquerade as confident ones or vice versa. Variants failing quality thresholds are removed before any biological interpretation begins.
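To make the depth argument concrete, the sketch below treats each read at a true heterozygous site as an independent coin flip between the two alleles and asks how often a caller would see too few alternate reads. The "at least 5 alternate reads" rule and the function name are hypothetical stand-ins, not how GATK or DeepVariant actually decide; real callers model base qualities and sequencing error jointly.

```python
from math import comb


def prob_missed_het(depth: int, min_alt_reads: int, vaf: float = 0.5) -> float:
    """P(fewer than `min_alt_reads` alternate reads) at a true heterozygous site.

    Assumes binomial sampling of reads from the two alleles and ignores
    sequencing error, mapping bias, and the caller's real statistical model.
    """
    return sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )


# Hypothetical calling rule: require at least 5 alternate reads.
for depth in (8, 12, 20, 30):
    print(f"{depth:>3}x  P(missed het) = {prob_missed_het(depth, 5):.4f}")
```

Under these assumptions the chance of under-supporting a real heterozygote falls from well over half at 8x to well under 1% at 20x, which is the intuition behind a 20x floor.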
For trio analysis (proband plus both parents), Mendelian inheritance consistency provides an additional quality check. A variant called heterozygous in the child should appear in at least one parent unless it arose de novo. Widespread Mendelian inconsistencies indicate sample swaps, contamination, or systematic calling errors that must be resolved before interpretation proceeds.
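A minimal sketch of the trio consistency check, assuming biallelic autosomal sites with genotypes encoded as alternate-allele counts (0 = hom-ref, 1 = het, 2 = hom-alt). The function names and the three status labels are illustrative; production tools also weigh genotype quality before flagging a site.

```python
from itertools import product


def transmissible(genotype: int) -> set[int]:
    """Alleles (0 = ref, 1 = alt) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def trio_status(child: int, mother: int, father: int) -> str:
    """Classify one autosomal biallelic site; genotypes are alt-allele counts 0/1/2."""
    possible = {m + f for m, f in product(transmissible(mother), transmissible(father))}
    if child in possible:
        return "consistent"
    if child == 1 and mother == 0 and father == 0:
        return "de_novo_candidate"  # still needs confirmation before interpretation
    return "mendelian_error"        # e.g. sample swap, contamination, or miscall


# Hypothetical calls at four sites: (child, mother, father)
calls = [(1, 0, 0), (1, 1, 0), (2, 1, 1), (0, 2, 0)]
statuses = [trio_status(*site) for site in calls]
error_rate = statuses.count("mendelian_error") / len(statuses)
print(statuses, error_rate)
```

Across a whole exome, it is a Mendelian error rate far above what genotyping error alone would produce, rather than any single inconsistent site, that signals a sample-level problem.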
For ML/computational readers: Trio analysis sequences a patient (proband) plus both parents:
Key terminology:
- Proband: The affected individual being evaluated
- Trio: Proband + mother + father
- De novo variant: Present in proband but absent in both parents (arose as new mutation)
- Segregation: Whether a variant “travels with” disease in a family
Why trios dramatically improve interpretation:
| Scenario | Without Parents | With Trio |
|---|---|---|
| De novo identification | Cannot detect | Immediately identified |
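| Compound heterozygosity | Must infer | Resolved by the parental origin of each allele |
| Phasing | Requires population inference or long-read sequencing | Inferred from parental genotypes (except at sites where both parents are heterozygous or the variant is de novo) |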
| Quality control | Limited | Mendelian consistency checks |
De novo mutations are particularly informative:
- Nearly all of a person’s variants are inherited; only about one coding variant per exome arises de novo (on the order of 50 to 100 across the whole genome)
- For dominant conditions, de novo variants in relevant genes are strong evidence for causality
- De novo variants have not been subjected to selection in previous generations
Compound heterozygosity: In recessive disease, affected individuals have two pathogenic alleles. Trio analysis reveals whether two variants are:
- In trans (one from each parent) → both copies disrupted → consistent with a recessive cause if both variants are damaging
- In cis (both from same parent) → one functional copy remains → unlikely to cause recessive disease
29.1.2 Population Frequency Filters
Variants common in the general population are unlikely to cause rare, severe disease. If a variant appears in 1% of gnomAD individuals, it cannot plausibly explain a condition affecting one in 100,000 people under a dominant model. Frequency thresholds depend on inheritance mode and disease prevalence: dominant conditions with complete penetrance require extremely rare variants (often absent from population databases), while recessive conditions can tolerate higher carrier frequencies.
A variant with 0.1% population frequency might be far too common for a dominant condition affecting 1 in 100,000 individuals, yet entirely plausible as a carrier allele for a recessive condition affecting 1 in 10,000. The “rare” threshold is not absolute but depends on the disease model, expected penetrance, and population prevalence. Always calculate expected allele frequency from disease prevalence rather than applying uniform cutoffs.
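Calculating the expected allele frequency from prevalence can be sketched with simple Hardy-Weinberg reasoning. The function names and the `allelic_fraction` parameter (the largest share of cases any single allele could explain) are assumptions of this sketch, which ignores homozygotes in the dominant case and any allelic heterogeneity beyond that single parameter.

```python
from math import sqrt


def max_credible_af_dominant(prevalence, penetrance=1.0, allelic_fraction=1.0):
    """Upper bound on one causal allele's frequency for a dominant disorder.

    Each affected person carries one causal allele out of two copies, so the
    allele frequency is about prevalence / 2, inflated for incomplete
    penetrance and scaled by the share of cases this allele could explain.
    """
    return prevalence * allelic_fraction / (2 * penetrance)


def max_credible_af_recessive(prevalence, penetrance=1.0, allelic_fraction=1.0):
    """Hardy-Weinberg bound: total disease-allele frequency q satisfies q**2 ~ prevalence / penetrance."""
    return sqrt(prevalence / penetrance) * allelic_fraction


observed_af = 0.001                                  # the 0.1% variant from the example above
dom_limit = max_credible_af_dominant(1 / 100_000)    # 5e-06
rec_limit = max_credible_af_recessive(1 / 10_000)    # 0.01
print(observed_af <= dom_limit, observed_af <= rec_limit)  # False True
```

The same 0.1% variant fails the dominant bound by more than two orders of magnitude yet sits comfortably under the recessive one, which is the point of deriving cutoffs from the disease model.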
The Genome Aggregation Database (gnomAD) provides allele frequencies across over 800,000 individuals from diverse ancestries in its version 4 release (see Section 2.2.3) (Karczewski et al. 2020). As a first-pass default, applying a frequency threshold of 0.01% for dominant conditions and 1% for recessive carriers typically removes 95% or more of variants from consideration. Ancestry-matched frequencies matter: a variant rare in European populations may be common in African or East Asian populations, so the highest frequency observed in any well-sampled ancestry group is usually more informative than the global frequency alone.
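A sketch of ancestry-aware filtering, assuming each variant carries allele frequencies keyed by ancestry group. The group labels mirror gnomAD's codes, but the records and values are hypothetical. The filter compares the highest group frequency, not the global one, against the disease-model cutoff; in practice small groups give noisy estimates, so confidence-bounded frequencies are preferable.

```python
# Hypothetical records: allele frequencies keyed by ancestry group.
variants = {
    "var_A": {"global": 0.00004, "afr": 0.00000, "eas": 0.00000, "nfe": 0.00007},
    "var_B": {"global": 0.00008, "afr": 0.00120, "eas": 0.00000, "nfe": 0.00001},
}

DOMINANT_MAX_AF = 0.0001  # the 0.01% first-pass cutoff from the text


def passes_frequency_filter(afs: dict[str, float], max_af: float) -> bool:
    """Keep a variant only if no single ancestry group exceeds the cutoff."""
    group_afs = [af for group, af in afs.items() if group != "global"]
    return max(group_afs) <= max_af


kept = [name for name, afs in variants.items() if passes_frequency_filter(afs, DOMINANT_MAX_AF)]
print(kept)  # ['var_A']: var_B passes on global AF but is too common in one group
```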
29.1.3 Consequence and Gene Filters
Predicted functional consequence shapes prioritization. Loss-of-function variants (frameshift, nonsense, canonical splice site) in haploinsufficient genes, where losing one functional copy is enough to cause disease, receive immediate attention. Missense variants require additional assessment, as most are benign. Intronic and intergenic variants have historically been deprioritized, though foundation models are beginning to identify functional noncoding variants with greater precision (see Section 17.2 for regulatory models and Section 17.2.3 for variant effect prediction in noncoding regions).
Gene-level filters incorporate prior knowledge. Curated gene panels for specific phenotypes (such as the PanelApp epilepsy panel or cardiomyopathy panel) restrict analysis to genes with established disease associations. For undiagnosed cases without clear phenotype match, broader approaches may include all OMIM disease genes or genes with high constraint (low observed/expected loss-of-function ratios in gnomAD).
Gene-disease validity scores from ClinGen and Gene2Phenotype are aggregated in the Open Targets Platform, enabling systematic filtering of candidate genes by strength of prior evidence (Ochoa et al. 2023). When combined with foundation model variant scores, these validity assessments help prioritize variants in genes with established disease mechanisms over variants in genes where the disease association remains uncertain.
For rare disease diagnosis, restricting analysis to genes with “Definitive” or “Strong” ClinGen validity can dramatically reduce the candidate list while focusing on genes most likely to be clinically actionable. Conversely, variants in genes with “Limited” or “Disputed” validity require more cautious interpretation, even when foundation model pathogenicity scores are high.
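The consequence and gene filters can be combined into a simple triage function. The consequence labels are Sequence Ontology terms of the kind emitted by annotation tools such as VEP; the gene names, validity calls, LOEUF values, and the 0.35 constraint cutoff are hypothetical choices for illustration.

```python
LOF_CONSEQUENCES = {
    "frameshift_variant",
    "stop_gained",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
HIGH_VALIDITY = {"Definitive", "Strong"}  # ClinGen gene-disease validity levels


def gene_tier(gene, validity, loeuf, loeuf_cutoff=0.35):
    """'established' for high-validity genes, 'constrained' for LoF-intolerant genes (low LOEUF)."""
    if validity.get(gene) in HIGH_VALIDITY:
        return "established"
    if loeuf.get(gene, float("inf")) < loeuf_cutoff:
        return "constrained"
    return "other"


def triage(variant, validity, loeuf):
    tier = gene_tier(variant["gene"], validity, loeuf)
    consequence = variant["consequence"]
    if tier == "other":
        return "deprioritize"
    if consequence in LOF_CONSEQUENCES:
        return "priority"
    if consequence == "missense_variant":
        return "score_missense"   # hand off to missense effect scoring
    return "deprioritize"         # noncoding: left to a separate regulatory-model pass


# Hypothetical annotations for three genes.
validity = {"GENE_A": "Definitive", "GENE_B": "Limited", "GENE_C": "Limited"}
loeuf = {"GENE_A": 0.30, "GENE_B": 0.20, "GENE_C": 0.90}

print(triage({"gene": "GENE_B", "consequence": "missense_variant"}, validity, loeuf))
```

Note that GENE_B reaches missense scoring through constraint alone despite "Limited" validity, which is exactly the situation the text flags for cautious interpretation.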
29.1.4 Foundation Model Scoring
After quality, frequency, and consequence filters, foundation model predictions provide quantitative effect estimates for remaining candidates. For missense variants, AlphaMissense scores offer proteome-wide pathogenicity estimates derived from protein structure and evolutionary conservation (Cheng et al. 2023). For splice-region variants, SpliceAI predictions quantify the probability and magnitude of splicing disruption (Jaganathan et al. 2019). For regulatory variants, Enformer and related models estimate effects on chromatin accessibility and gene expression in relevant tissues (Section 17.2; Section 17.2.3) (Avsec et al. 2021).
You have three candidate variants after filtering: (1) a missense variant with AlphaMissense score 0.95, (2) a splice-region variant with SpliceAI score 0.80, and (3) an intergenic variant 50 kb from the nearest gene with low Enformer-predicted impact. All three are rare in gnomAD. Which would you prioritize for expert review, and why? What additional information would help you decide?
These scores do not directly translate to pathogenicity classifications. A high AlphaMissense score indicates that the protein change is likely functionally disruptive, not that it causes a specific disease. The clinical relevance of any functional disruption depends on the gene’s role in the patient’s phenotype, the inheritance pattern, and whether disruption of that gene produces the observed clinical features. Foundation model scores become one input to a structured evidence framework, not a standalone answer.
| Variant Type | Primary FM Tool | What It Predicts | Clinical Question Answered | Key Limitation |
|---|---|---|---|---|
| Missense | AlphaMissense | Protein functional disruption | Is amino acid change damaging? | Does not indicate disease specificity |
| Splice region | SpliceAI | Splice site creation/disruption | Will splicing be altered? | Not tissue-specific; cell-type-specific effects may be missed |
| Regulatory | Enformer | Gene expression change | Will expression be affected? | Limited to trained tissues/cell types |
| Protein structural context | AlphaFold | Predicted 3D structure of the reference protein | Does the variant fall in a folded domain or interface? | Predicts a static reference structure; not designed to score single-variant effects |
You learned about population frequency filtering earlier (Section 29.1.2). Now that you understand foundation model scoring, consider: Why must frequency filtering happen before FM scoring rather than after? What would happen if you scored all 25,000 variants with AlphaMissense and then filtered by frequency?
Frequency filtering must happen first for both computational efficiency and biological logic. Scoring all 25,000 variants would spend resources on common variants that cannot explain a rare disease regardless of their predicted effect. More importantly, a high FM score on a common variant means that either the functional change is tolerated in the population or the prediction is wrong; in either case, the population frequency evidence overrides the functional prediction, and leaving such variants in the list would push genuinely rare candidates down the ranking. The prioritization funnel applies filters in order of both efficiency (removing the most variants earliest) and biological logic (ruling out impossibilities before evaluating functional impact).
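The ordering argument can be sketched with synthetic data: cheap predicates run first, and the placeholder `fm_score` (a hypothetical stand-in for a foundation model lookup or inference call) is invoked only on variants that survive. The field names and filter rules are invented for illustration.

```python
# Synthetic stand-ins for one exome's variant list (hypothetical fields).
variants = [
    {"id": i, "qual_ok": i % 7 != 0, "af": (i % 50) / 1000, "coding": i % 3 == 0}
    for i in range(25_000)
]

score_calls = 0


def fm_score(variant):
    """Placeholder for an expensive foundation-model call; counts invocations."""
    global score_calls
    score_calls += 1
    return (variant["id"] * 37 % 101) / 100


# Cheap filters first, in the order described in the text.
filters = [
    ("quality", lambda v: v["qual_ok"]),
    ("frequency", lambda v: v["af"] <= 0.0001),
    ("consequence", lambda v: v["coding"]),
]


def run_funnel(candidates, stages, score):
    remaining = list(candidates)
    for name, keep in stages:
        remaining = [v for v in remaining if keep(v)]
        print(f"after {name:<11}: {len(remaining):>6} variants")
    return sorted(remaining, key=score, reverse=True)  # score only the survivors


ranked = run_funnel(variants, filters, fm_score)
print(f"foundation-model calls: {score_calls}")
```

With these synthetic rules the list shrinks to a few hundred variants before the consequence filter, and only the final survivors ever reach the scoring step.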
29.2 ACMG-AMP Criteria and Computational Evidence
The American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG-AMP) framework provides the dominant structure for clinical variant classification (Richards et al. 2015). Published in 2015 and subsequently refined through ClinGen expert panels, this framework assigns variants to five categories: pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign, and benign. Classification emerges from combining multiple evidence types, each with a direction (pathogenic or benign) and a strength level: pathogenic evidence is graded very strong, strong, moderate, or supporting, while benign evidence is graded stand-alone, strong, or supporting.
29.2.1 Evidence Categories
ACMG-AMP evidence spans several domains. Population data includes allele frequency in controls (BA1, BS1, BS2 for benign; PM2, moderate pathogenic evidence, when the variant is absent from or extremely rare in controls). Computational predictions include in silico tools predicting deleterious effects (PP3 for pathogenic support) or benign effects (BP4 for benign support). Functional data includes well-established functional assays demonstrating deleterious (PS3) or no (BS3) effect. Segregation data addresses co-segregation with disease in multiple affected family members (PP1) or lack of segregation (BS4). De novo status assigns strong (PS2) or moderate (PM6) evidence when parental samples are available and the variant is absent in both parents. Clinical information incorporates specific phenotype match (PP4) and prevalence considerations.
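The framework combines these evidence types through defined rules. Pathogenic classification requires, for example, one very strong criterion plus at least one strong criterion, two or more strong criteria, or one strong criterion plus three moderate criteria; other specified combinations of moderate and supporting evidence also qualify. Likely pathogenic requires less: for example, one strong plus one or two moderate criteria, or three moderate criteria. Most variants end up as VUS because available evidence is insufficient for confident classification in either direction.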
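The counting logic of the 2015 combining rules (Richards et al. 2015) can be sketched as follows. The input format, the strength labels, and the blanket treatment of conflicting evidence as VUS are simplifying assumptions of this sketch; expert panels layer gene-specific rules on top of these counts.

```python
from collections import Counter

# Strength labels used in this sketch (hypothetical encoding):
#   pathogenic codes (prefix "P"): very_strong, strong, moderate, supporting
#   benign codes (prefix "B"):     stand_alone, strong, supporting
# A criterion applied at a modified strength (e.g. PP3 upgraded to moderate)
# simply carries the modified label.


def classify(evidence: dict[str, str]) -> str:
    path = Counter(strength for code, strength in evidence.items() if code.startswith("P"))
    ben = Counter(strength for code, strength in evidence.items() if code.startswith("B"))
    vs, s, m, p = (path[k] for k in ("very_strong", "strong", "moderate", "supporting"))

    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and (m >= 1 or p >= 2))
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    benign = ben["stand_alone"] >= 1 or ben["strong"] >= 2
    likely_benign = (ben["strong"] >= 1 and ben["supporting"] >= 1) or ben["supporting"] >= 2

    # Simplification: any pathogenic/benign conflict is reported as VUS.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "VUS"


print(classify({"PM2": "moderate", "PP3": "supporting", "PP4": "supporting"}))  # VUS
print(classify({"PM2": "moderate", "PP3": "moderate"}))                         # VUS
print(classify({"PS2": "strong", "PM2": "moderate", "PP3": "moderate"}))        # Likely pathogenic
```

The three calls correspond to the evidence combinations in the exercises of this section, which is a quick way to check that reasoning by hand.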
A missense variant is absent from gnomAD (PM2), has a high AlphaMissense score (PP3), and occurs in a gene associated with the patient’s phenotype (PP4). Under ACMG-AMP rules, what would be the maximum classification? Why can this evidence alone not achieve “Pathogenic”?
Hint: Consider what evidence strengths these criteria provide and what combinations are required for pathogenic classification.
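The maximum classification would be “Variant of Uncertain Significance.” These criteria provide one moderate (PM2) and two supporting (PP3, PP4) pieces of evidence. Pathogenic classification requires far more, such as two strong criteria or one very strong plus one strong criterion. Even likely pathogenic requires combinations such as three moderate criteria, two moderate plus two supporting, one moderate plus four supporting, or one strong plus at least one moderate; one moderate plus two supporting meets none of these. The computational evidence (PP3) provides only supporting-level weight unless upgraded by gene-specific calibration.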
29.2.2 PP3 and BP4: Computational Evidence
Computational predictions enter the ACMG-AMP framework primarily through PP3 (pathogenic supporting evidence from computational predictions) and BP4 (benign supporting evidence). These criteria apply when multiple in silico tools agree that a variant is deleterious (PP3) or benign (BP4).
The original 2015 guidelines assigned these criteria only “supporting” strength, reflecting appropriate caution about computational predictions available at the time. Tools like SIFT, PolyPhen-2, and CADD had limited accuracy and concerning circularity issues (Section 4.5). The evaluation challenges these tools face, including benchmark contamination and label leakage, are examined in Section 13.5. ClinGen sequence variant interpretation working groups have subsequently refined how computational evidence is weighted, in some cases upgrading to moderate strength for well-calibrated predictors in specific genes.
The mapping from continuous model scores to discrete evidence strengths requires careful statistical calibration. This section involves likelihood ratios and odds of pathogenicity, concepts that many readers find initially counterintuitive. Take time with the relationship between prediction accuracy and evidence strength; it is foundational for clinical implementation.
Foundation models raise new questions about computational evidence strength. AlphaMissense achieves substantially higher accuracy than traditional tools on held-out ClinVar variants and deep mutational scanning data. Should predictions from these models receive greater evidentiary weight? The answer is not straightforward. Higher accuracy on aggregate benchmarks does not guarantee reliability for any individual prediction. Gene-specific calibration matters: a model may perform well across all genes but poorly for genes with unusual structure or function. And the fundamental limitation remains that computational predictions estimate functional impact, not clinical pathogenicity.
Responsible application of foundation model predictions in ACMG-AMP classification requires gene-specific and variant-type-specific calibration whenever possible, explicit acknowledgment that PP3/BP4 evidence is supporting unless upgraded by expert panel guidance, use of multiple orthogonal predictors rather than reliance on any single model, and clear documentation of which tools were applied and how predictions were interpreted.
29.2.3 Calibrating Predictions to Evidence Strength
Mapping continuous foundation model scores to discrete ACMG evidence strengths requires calibration against the odds of pathogenicity defined in the ClinGen Bayesian adaptation of the ACMG-AMP framework (Tavtigian et al. 2018). The calibration framework, detailed in Section 18.5 and Section 24.4, defines thresholds where supporting evidence requires odds of pathogenicity of ~2.1:1, moderate ~4.3:1, strong ~18.7:1, and very strong ~350:1; each level is the square of the one below it. For computational predictors to warrant evidence strength upgrades, their predictions must demonstrably achieve these odds in validation datasets.
| Evidence Strength | Odds of Pathogenicity (Pathogenic Evidence) | Odds of Pathogenicity (Benign Evidence) | Illustrative AlphaMissense Score Range (Not a Validated Threshold) |
|---|---|---|---|
| Supporting | ~2.1:1 | ~1:2.1 | 0.5-0.8 |
| Moderate | ~4.3:1 | ~1:4.3 | 0.8-0.9 |
| Strong | ~18.7:1 | ~1:18.7 | >0.9 (gene-specific validation needed) |
| Very Strong | ~350:1 | ~1:350 | Not achieved by current predictors alone |
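As a rough illustration of what "demonstrably achieve these odds" means, the sketch below computes a positive likelihood ratio for a single score threshold from labeled validation counts and maps it onto the strength levels in the table. The counts are hypothetical. Published calibrations (Pejaver et al. 2022) instead estimate local posterior probabilities across score intervals and account for the prior probability of pathogenicity, which this sketch ignores; the validation variants must also be independent of the model's training data.

```python
# Minimum odds of pathogenicity per strength level (Tavtigian et al. 2018),
# ordered strongest first so the first match wins. Pathogenic direction only.
ODDS_THRESHOLDS = [
    ("very_strong", 350.0),
    ("strong", 18.7),
    ("moderate", 4.33),
    ("supporting", 2.08),
]


def positive_likelihood_ratio(path_above, path_total, benign_above, benign_total):
    """P(score > t | pathogenic) / P(score > t | benign) from labeled counts."""
    if benign_above == 0:
        raise ValueError("no benign variants above threshold; ratio not estimable")
    sensitivity = path_above / path_total
    false_positive_rate = benign_above / benign_total
    return sensitivity / false_positive_rate


def evidence_strength(lr):
    for level, min_odds in ODDS_THRESHOLDS:
        if lr >= min_odds:
            return level
    return "insufficient"


# Hypothetical held-out set: 300 of 400 pathogenic and 20 of 2,000 benign
# variants score above 0.9.
lr = positive_likelihood_ratio(300, 400, 20, 2000)
print(round(lr, 1), evidence_strength(lr))  # 75.0 strong
```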
For AlphaMissense and similar foundation models, published validation (Bergquist et al. 2025), extending the calibration approach that ClinGen first applied to earlier predictors (Pejaver et al. 2022), shows that the highest-scoring variants (above 0.9) reach odds of pathogenicity exceeding the strong evidence threshold in some gene contexts. ClinGen expert panels have begun incorporating these calibrations for specific genes, allowing upgraded evidence strength when predictions meet defined criteria. Clinicians should follow gene-specific expert panel recommendations when available rather than applying uniform thresholds across all genes.
A missense variant has an AlphaMissense score of 0.92. The gene-specific ClinGen recommendation allows upgrading PP3 to moderate strength for scores above 0.85 in this gene. The variant is also absent from gnomAD (PM2). Can this variant be classified as pathogenic? What additional evidence would be needed?
No, this cannot yet be classified as pathogenic, or even likely pathogenic. The evidence totals one moderate (PM2 for absence in population databases) plus one moderate (upgraded PP3). Two moderate criteria alone fall short of likely pathogenic, which requires combinations such as three moderate criteria or two moderate plus two supporting, so the variant remains a VUS. Additional evidence needed might include: de novo status (PS2, strong), which would bring the combination to likely pathogenic; strong functional data (PS3); or co-segregation with disease in multiple families (PP1, upgradable to strong). Reaching pathogenic would require, for example, two strong criteria, or one strong plus three moderate.
29.3 Family-Based Analysis
Rare disease interpretation rarely relies on proband sequence alone. Family structure provides substantial additional information through inheritance pattern constraints, de novo status determination, and segregation analysis.
A single variant in a proband might be classified as VUS due to insufficient evidence. That same variant, when shown to have arisen de novo in a child with severe early-onset disease, gains strong pathogenic evidence (PS2). Family data transforms interpretation not by changing the variant itself but by providing context that constrains biological possibility. Always consider what family samples, if obtained, might resolve uncertain classifications.
29.3.1 De Novo Variants
De novo variants arise newly in the proband and are absent in both parents. For severe, early-onset dominant conditions, de novo mutations are expected: affected individuals rarely reproduce, so the disease-causing allele must arise fresh each generation. Observing a damaging variant as de novo provides strong evidence for pathogenicity under ACMG-AMP (PS2), often sufficient to push a candidate toward likely pathogenic or pathogenic classification.
The informativeness of de novo status depends on the mutation rate at that site and the expected de novo rate for the variant class. The human germline mutation rate is approximately 1 to 1.5 new mutations per 100 million base pairs per generation (Kong et al. 2012). Because each child inherits two copies of the protein-coding exome (approximately 30 million base pairs per copy), this rate implies roughly 0.6 to 0.9 new coding variants per individual, or about one on average. Finding a damaging de novo variant in a candidate gene is therefore much more suspicious than finding an inherited variant of similar predicted effect.
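The arithmetic can be checked with a short sketch. It assumes a uniform rate of 1.2 × 10^-8 per base per generation (within the range quoted above) and a hypothetical 3 kb coding sequence; published per-gene models adjust for sequence context and are more accurate than this uniform approximation.

```python
from math import exp

# Assumed per-base, per-generation germline rate, inside the 1-1.5e-8 range above.
MU = 1.2e-8


def expected_de_novo(target_bp: float, mu: float = MU) -> float:
    """Expected de novo SNVs in a target region; x2 because a child receives two copies."""
    return 2 * target_bp * mu


exome_expected = expected_de_novo(30e6)      # about 0.72 per child
gene_expected = expected_de_novo(3_000)      # hypothetical 3 kb coding sequence
p_gene_by_chance = 1 - exp(-gene_expected)   # Poisson P(at least one) in that gene

print(f"exome: {exome_expected:.2f}  gene: {gene_expected:.1e}  P(>=1): {p_gene_by_chance:.1e}")
```

A chance probability under 10^-4 per child for any one gene is why a damaging de novo hit in a gene that fits the phenotype carries so much weight.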
Foundation models assist de novo interpretation by providing effect estimates that help prioritize among multiple de novo variants (typical trio sequencing identifies one to three de novo coding variants) and by identifying de novo variants in noncoding regions that might disrupt critical regulatory elements. A de novo variant in a brain-specific enhancer upstream of a known epilepsy gene, predicted by Enformer to substantially reduce gene expression (Section 17.2.1), warrants investigation even though traditional pipelines might overlook noncoding de novo events.
29.3.2 Compound Heterozygosity and Phasing
Recessive diseases require biallelic disruption: both copies of the gene must be affected for disease to manifest. When a proband carries two different heterozygous variants in the same gene, the critical question is whether these variants are in trans (on opposite chromosomes, leading to biallelic disruption) or in cis (on the same chromosome, leaving one copy functional). Think of it like having two copies of a critical instruction manual, where each copy must be read on its own. If each copy has a different page torn out, neither copy is usable, even though together they contain every page. If both torn pages come from the same copy, the other copy is intact and the instructions can still be followed.
A child has a rare recessive metabolic disorder. You identify two heterozygous missense variants in the candidate gene. The mother carries both variants. What does this tell you about whether these variants can explain the disease? What additional information would you need?
If the mother carries both variants and is unaffected, the variants are most likely in cis (on the same chromosome). The child inherited this chromosome from the mother, meaning one of the child’s gene copies has both variants, but the other copy (from the father) is likely normal. This configuration cannot explain a recessive disorder because one functional copy remains. You would need to: (1) confirm the father’s genotype at these positions, (2) check for a third variant the child might have inherited from the father, or (3) reconsider whether this gene explains the phenotype.
Phasing determines which configuration applies (Section 1.4.1 for clinical stakes; Section 1.4.3 for methodological details). Several approaches are available. Physical phasing through long-read sequencing directly observes which variants occur on the same DNA molecule, providing definitive phase information when reads span both variant positions. Trio phasing infers phase from parental genotypes: if one variant is inherited from the mother and one from the father, they must be in trans. Statistical phasing uses population haplotype patterns to estimate phase, though accuracy decreases for rare variants not well-represented in reference panels.
For clinical interpretation, trio phasing is often the most practical approach. If both variants are confirmed in trans and both are predicted damaging, this supports pathogenicity under a recessive model. If both variants were inherited from a single parent (in cis), the gene cannot explain a recessive phenotype unless a third variant exists.
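A sketch of trio phasing for two heterozygous variants in the same gene, assuming biallelic sites and the 0/1/2 genotype encoding used for the trio consistency check. Function names are illustrative. Sites where both parents are heterozygous, or where the variant is de novo, cannot be phased from genotypes alone and come back as undetermined; these are the cases where read-based (for example long-read) phasing helps.

```python
# Genotypes as alternate-allele counts: 0 hom-ref, 1 het, 2 hom-alt.


def alt_allele_origin(child: int, mother: int, father: int) -> str:
    """Parent who transmitted the alt allele at a site where the child is heterozygous."""
    if child != 1:
        raise ValueError("this sketch assumes a heterozygous child genotype")
    if mother == 2 and father == 2:
        raise ValueError("Mendelian error: child cannot be het if both parents are hom-alt")
    if mother == 2:
        return "mother"      # mother can only transmit alt, so father transmitted ref
    if father == 2:
        return "father"
    if mother == 1 and father == 0:
        return "mother"
    if father == 1 and mother == 0:
        return "father"
    if mother == 0 and father == 0:
        return "de_novo"     # parental origin needs read-based phasing
    return "ambiguous"       # both parents het: either could have transmitted it


def phase_pair(site1: tuple[int, int, int], site2: tuple[int, int, int]) -> str:
    """Phase two child heterozygotes in one gene from (child, mother, father) genotypes."""
    o1, o2 = alt_allele_origin(*site1), alt_allele_origin(*site2)
    if {o1, o2} <= {"mother", "father"}:
        return "trans" if o1 != o2 else "cis"
    return "undetermined"


# The earlier exercise: mother carries both variants, father neither.
print(phase_pair((1, 1, 0), (1, 1, 0)))  # cis
```

The exercise configuration returns "cis", matching the reasoning that an intact paternal copy remains.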
Foundation models contribute by estimating the functional severity of each variant. A missense variant with marginal AlphaMissense score might not warrant attention alone, but paired in trans with a clear loss-of-function variant, the compound heterozygous combination could produce sufficient functional disruption to cause disease.
29.3.3 Segregation Analysis
In larger families with multiple affected and unaffected individuals, segregation analysis examines whether candidate variants track with disease status. Under a dominant model, all affected individuals should carry the variant, and penetrance assumptions constrain how many unaffected carriers are expected. Under a recessive model, affected individuals should be homozygous or compound heterozygous, while unaffected relatives should carry at most one of the candidate alleles (as heterozygous carriers) or none.
Co-segregation evidence (PP1) is supporting by default but can be upgraded to moderate or strong as the number of informative meioses grows, at which point it can substantially support pathogenicity classification. Equally important, failure to segregate provides benign evidence (BS4): a variant absent from affected family members, or present in unaffected family members at rates inconsistent with the proposed inheritance model, is unlikely to be causal.
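The idea that segregation evidence grows with the number of informative meioses can be illustrated with a toy likelihood ratio. Assuming a fully penetrant dominant model with no phenocopies, each informative meiosis agrees with the variant with probability 1 if it is causal and 1/2 by chance, so the ratio is 2 raised to the meiosis count. Mapping that ratio onto the odds thresholds from Section 29.2.3 is this sketch's own illustration; the counting rules expert panels actually use for PP1 differ and should be followed in practice.

```python
from math import log10

# Odds thresholds from the calibration table; PP1 is capped at strong.
PP1_LEVELS = [("strong", 18.7), ("moderate", 4.33), ("supporting", 2.08)]


def segregation_lr(informative_meioses: int) -> float:
    """Toy likelihood ratio under a fully penetrant dominant model, no phenocopies.

    Each informative meiosis co-segregates with probability 1 if the variant
    is causal and 1/2 if it is unrelated to disease.
    """
    return 2.0 ** informative_meioses


def pp1_strength(informative_meioses: int) -> str:
    lr = segregation_lr(informative_meioses)
    for level, min_odds in PP1_LEVELS:
        if lr >= min_odds:
            return level
    return "not met"


for m in range(1, 7):
    lr = segregation_lr(m)
    print(f"{m} meioses: LR = {lr:>4.0f}, LOD = {log10(lr):.2f}, PP1 {pp1_strength(m)}")
```

Reduced penetrance or possible phenocopies shrink these ratios, which is why real segregation scoring needs the clinical judgment described above.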
Segregation analysis requires accurate pedigree information, confirmed sample identities, and careful consideration of age-dependent penetrance and phenocopies. A variant might be present in an unaffected young relative who will develop disease later, or an affected relative might have a different etiology (phenocopy). These complexities require clinical judgment that no computational model can replace.
29.4 Somatic Variant Interpretation in Cancer
Cancer genomics presents distinct interpretive challenges. Tumor genomes accumulate mutations throughout malignant evolution, creating a mix of driver mutations (those conferring selective advantage and contributing to cancer development) and passenger mutations (bystanders with no functional consequence). The interpretive task shifts from identifying variants causing inherited disease to identifying variants driving tumor biology and predicting therapeutic response.
29.4.1 Germline versus Somatic Distinction
Cancer sequencing must distinguish germline variants (present in all cells, inherited or de novo) from somatic variants (acquired in the tumor lineage). Tumor-only sequencing cannot make this distinction reliably, as rare germline variants may be mistaken for somatic events. Paired tumor-normal sequencing, comparing tumor to a non-malignant sample from the same patient, enables confident somatic variant identification.
This distinction has direct clinical implications. A germline pathogenic variant in BRCA1 indicates hereditary cancer predisposition affecting the patient and potentially their family members, warranting genetic counseling and possibly risk-reducing interventions. A somatic BRCA1 mutation arose in the tumor and has no implications for inherited risk, though it may still predict response to PARP inhibitors.
A 45-year-old woman with breast cancer has tumor sequencing that identifies a BRCA1 frameshift mutation. What additional test would you recommend, and why does it matter for her family members?
You would recommend germline testing on a blood sample to determine whether the BRCA1 mutation is germline (inherited) or somatic (tumor-only). If germline, it indicates hereditary cancer predisposition affecting her family members and warrants genetic counseling and cascade testing. If somatic, it arose only in the tumor and has no implications for inherited risk in relatives.
29.4.2 Driver Classification
Among somatic mutations, identifying drivers requires different evidence than germline pathogenicity assessment. Recurrence across independent tumors suggests selective advantage: roughly half of melanomas carry a BRAF V600 mutation, most often V600E, a frequency far exceeding what chance would predict and implying functional importance. The logic here is fundamentally evolutionary: tumors arise through clonal expansion, where cells with growth advantages outcompete their neighbors. A mutation that appears independently in thousands of tumors must confer such an advantage, because random chance alone would scatter mutations broadly across the ~20,000 protein-coding genes. Background mutation rates are not perfectly uniform (they vary with sequence context, replication timing, and expression level), and some recurrent sites reflect locally elevated mutation rates rather than selection, but the probability of the same specific mutation arising this often by chance is vanishingly small; recurrence therefore provides strong statistical evidence of positive selection during tumor evolution. Databases like COSMIC catalog somatic mutation frequencies across cancer types, enabling recurrence-based prioritization (Tate et al. 2019).
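To see why recurrence is so hard to explain by chance, the sketch below computes a binomial tail probability for one site in a hypothetical cohort of 1,000 tumors, assuming each tumor mutates that site independently with probability 10^-5 (roughly 10 mutations per megabase, spread uniformly). Because real mutation rates vary across the genome, this uniform model overstates significance; driver-detection methods model that heterogeneity explicitly.

```python
from math import exp, lgamma, log, log1p


def log10_binom_tail(k: int, n: int, p: float) -> float:
    """log10 P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_terms = [
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    return (peak + log(sum(exp(t - peak) for t in log_terms))) / log(10)


cohort, per_site_p = 1_000, 1e-5   # hypothetical cohort size and per-tumor site rate
once = log10_binom_tail(1, cohort, per_site_p)
half = log10_binom_tail(500, cohort, per_site_p)
print(f"P(>=1 tumor): {10 ** once:.4f}   log10 P(>=500 tumors): {half:.0f}")
```

Seeing the site mutated once is unremarkable (about a 1% chance), whereas seeing it in half the cohort has a log10 probability far below -1,000, even allowing generous slack for rate heterogeneity.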
Functional impact predictions from foundation models apply somewhat differently in the somatic context. A missense variant predicted highly damaging by AlphaMissense in a tumor suppressor gene suggests loss of function consistent with a driver role. The same prediction in an oncogene might indicate loss of normal regulation, potentially activating rather than inactivating the protein. Interpretation must consider the gene’s role (oncogene versus tumor suppressor) and the specific functional consequence of the variant.
| Aspect | Germline Interpretation | Somatic Interpretation |
|---|---|---|
| Primary question | Causes inherited disease? | Drives tumor? Predicts therapy response? |
| Framework | ACMG-AMP classification | Recurrence, functional impact, biomarkers |
| Population frequency use | Common = likely benign | Mainly to filter germline polymorphisms from tumor calls |
| Family implications | Affects relatives | Tumor-specific, no inheritance |
| Foundation model role | Functional impact on protein | Driver vs. passenger; therapeutic target |
| Clinical action | Genetic counseling, surveillance | Treatment selection, prognosis |
Tumor mutational burden provides context for individual variant interpretation. Hypermutated tumors (from mismatch repair deficiency or POLE mutations) may carry thousands of coding mutations, making it difficult to identify drivers against this noisy background. In such cases, restricting attention to known hotspots, truncating mutations in tumor suppressors, and variants with strong functional predictions helps prioritize the likely relevant events.
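The three prioritization criteria for hypermutated tumors can be written as ordered rules. In this sketch, the hotspot list, tumor suppressor set, consequence labels, and the 0.9 score cutoff are hypothetical placeholders; a real pipeline would draw them from curated resources such as COSMIC and would calibrate any score threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SomaticVariant:
    gene: str
    protein_change: str          # e.g. "V600E"
    consequence: str             # "missense", "nonsense", "frameshift", ...
    fm_score: Optional[float]    # hypothetical foundation model score in [0, 1]


# Hypothetical annotation sets; illustrative entries only.
KNOWN_HOTSPOTS = {("BRAF", "V600E"), ("KRAS", "G12D")}
TUMOR_SUPPRESSORS = {"TP53", "APC", "PTEN"}
TRUNCATING = {"nonsense", "frameshift", "splice_donor", "splice_acceptor"}


def prioritize(variant: SomaticVariant, fm_threshold: float = 0.9) -> Optional[str]:
    """Return the reason a variant is kept, or None if it is deprioritized."""
    if (variant.gene, variant.protein_change) in KNOWN_HOTSPOTS:
        return "known hotspot"
    if variant.gene in TUMOR_SUPPRESSORS and variant.consequence in TRUNCATING:
        return "truncating in tumor suppressor"
    # In an oncogene, a high score may reflect altered regulation rather than
    # loss of function, so this flag means "functionally consequential" only.
    if variant.fm_score is not None and variant.fm_score >= fm_threshold:
        return "strong functional prediction"
    return None


calls = [
    SomaticVariant("BRAF", "V600E", "missense", 0.5),
    SomaticVariant("TP53", "R213*", "nonsense", None),
    SomaticVariant("GENE1", "A10T", "missense", 0.3),
]
print([(v.gene, prioritize(v)) for v in calls])
```

Ordering the rules from most to least established means each kept variant carries the strongest available justification, which is useful when documenting why it was surfaced.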
29.4.3 Therapeutic Biomarkers
Somatic variant interpretation increasingly focuses on therapeutic implications. Specific variants predict response to targeted therapies: EGFR exon 19 deletions and L858R mutations predict response to EGFR tyrosine kinase inhibitors such as gefitinib and erlotinib in lung cancer; BRAF V600E predicts vemurafenib response in melanoma; PIK3CA mutations indicate alpelisib benefit in hormone receptor-positive, HER2-negative advanced breast cancer (Lynch et al. 2004; Chapman et al. 2011; André et al. 2019). These associations derive from clinical studies and trials demonstrating differential response by mutation status.
Foundation models do not directly predict therapeutic response, because they are not trained on the clinical outcome data that would be required. Their contribution lies in characterizing novel variants in known therapeutic target genes. A patient whose tumor carries an unusual, previously uncharacterized EGFR mutation might be evaluated using structural models and effect predictions to estimate whether the mutation likely preserves the drug-binding site and confers a dependency similar to that of canonical sensitizing mutations. Such analyses are hypothesis-generating rather than definitive but can inform clinical decision-making when direct trial evidence is unavailable.
For a novel variant in a known drug target gene, foundation models can help assess:
- Structural impact: Does AlphaFold predict the variant alters the drug-binding pocket?
- Functional consequence: Does AlphaMissense predict the variant disrupts protein function?
- Similarity to known variants: Is the variant in the same domain as established sensitizing or resistance mutations?
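These three questions can be organized as a structured evidence record for each novel variant. Everything in this sketch is illustrative: the gene, its domain boundaries, the binding-pocket residues, the known variants, and the score cutoff are hypothetical, and distance along the sequence is only a crude stand-in for proximity in a 3D structure.

```python
from dataclasses import dataclass

# Hypothetical annotations for an illustrative drug-target gene ("GENE_X").
# Real values would come from curated knowledgebases and drug-bound structures.
TARGET_DOMAIN = range(100, 400)
BINDING_POCKET = {150, 152, 210}
FM_CUTOFF = 0.5  # hypothetical score cutoff


@dataclass(frozen=True)
class KnownVariant:
    position: int
    label: str  # "sensitizing" or "resistance"


KNOWN_VARIANTS = [KnownVariant(210, "sensitizing"), KnownVariant(152, "resistance")]


def assess_novel_variant(position: int, fm_score: float, window: int = 5) -> dict:
    """Collect the three lines of evidence for one novel missense variant."""
    nearby = {
        kv.label for kv in KNOWN_VARIANTS if abs(kv.position - position) <= window
    }
    return {
        "in_domain": position in TARGET_DOMAIN,
        "in_binding_pocket": position in BINDING_POCKET,
        "fm_predicts_disruption": fm_score >= FM_CUTOFF,
        "near_known_variants": sorted(nearby),
    }


print(assess_novel_variant(208, fm_score=0.8))
```

The record documents reasoning rather than deciding therapy: a variant near both sensitizing and resistance positions, for example, is precisely the ambiguous case that calls for discussion of uncertainty.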
These assessments support but do not replace clinical judgment. Document the reasoning and discuss uncertainty with the patient.
29.5 Laboratory Validation
Computational predictions, however accurate, remain predictions. Functional assays provide direct experimental evidence of variant effects, and ACMG-AMP appropriately weights functional data (PS3 for damaging functional effect, BS3 for no functional effect) as strong evidence when assays are well-validated.
29.5.1 Types of Functional Assays
Different variant types require different assay approaches. For missense variants, protein function assays measure specific biochemical activities of the mutant protein: enzyme activity, DNA binding, protein-protein interactions, or cellular phenotypes in model systems. Deep mutational scanning systematically characterizes all possible amino acid substitutions at each position in a protein, creating comprehensive functional maps (see Section 2.4.4 for data resources and Section 18.4.1 for how foundation models leverage these data). These maps enable immediate lookup of functional effects for any observed missense variant, though coverage remains incomplete across the proteome.
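Looking up an observed missense variant in a deep mutational scanning map is essentially a keyed lookup with an explicit "not covered" outcome. The sketch assumes a map keyed by (position, alternate residue) and a score scale on which values near 0 are wild-type-like and values near -1 resemble a null allele; both the convention and the scores are hypothetical, and real DMS datasets use a variety of scales.

```python
import re
from typing import Optional

# Three-letter to one-letter amino acid codes for parsing HGVS protein changes.
AA3 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
HGVS_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_missense(hgvs_p: str) -> tuple[str, int, str]:
    """Parse e.g. 'p.Arg42His' into ('R', 42, 'H')."""
    m = HGVS_MISSENSE.match(hgvs_p)
    if m is None or m.group(1) not in AA3 or m.group(3) not in AA3:
        raise ValueError(f"not a simple missense change: {hgvs_p}")
    ref, pos, alt = m.groups()
    return AA3[ref], int(pos), AA3[alt]


def lookup_dms(hgvs_p: str, dms_map: dict[tuple[int, str], float]) -> Optional[float]:
    """Return the DMS functional score, or None if the map does not cover it."""
    _, pos, alt = parse_missense(hgvs_p)
    return dms_map.get((pos, alt))


# Hypothetical map for an illustrative protein.
dms_map = {(42, "H"): -0.95, (42, "K"): -0.05}
print(lookup_dms("p.Arg42His", dms_map))
print(lookup_dms("p.Arg43Gly", dms_map))  # not covered by the map
```

Returning None rather than a default score keeps incomplete coverage visible, so an unmeasured variant is never mistaken for a wild-type-like one.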
For splicing variants, minigene assays clone genomic regions containing the variant into expression vectors and measure splicing patterns in cultured cells. RNA sequencing from patient tissue (when accessible) directly observes whether aberrant splicing occurs in vivo. SpliceAI predictions can be validated by these direct measurements, establishing whether computational predictions match experimental reality for specific variants.
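Comparing a splicing prediction with patient RNA-seq usually comes down to percent spliced in (PSI) for the affected exon. The sketch below computes PSI from junction read counts, compares the patient with controls, and checks agreement with a SpliceAI delta score. The 0.5 prediction cutoff mirrors a commonly used SpliceAI threshold; the 0.2 ΔPSI cutoff, the function names, and all read counts are hypothetical.

```python
def psi(upstream_incl: int, downstream_incl: int, skipping: int) -> float:
    """Percent spliced in (as a fraction) for a cassette exon.

    Inclusion is observed on two junctions but skipping on one, so the two
    inclusion junction counts are averaged before forming the ratio.
    """
    inclusion = (upstream_incl + downstream_incl) / 2
    total = inclusion + skipping
    if total == 0:
        raise ValueError("no junction reads; PSI is undefined")
    return inclusion / total


def splicing_concordance(spliceai_delta: float,
                         patient: tuple[int, int, int],
                         controls: list[tuple[int, int, int]],
                         predicted_cutoff: float = 0.5,
                         observed_cutoff: float = 0.2) -> dict:
    """Compare a SpliceAI delta score with the exon-inclusion change in RNA-seq."""
    control_psi = sum(psi(*c) for c in controls) / len(controls)
    delta_psi = psi(*patient) - control_psi
    predicted = spliceai_delta >= predicted_cutoff
    observed = abs(delta_psi) >= observed_cutoff
    return {"delta_psi": delta_psi, "predicted": predicted,
            "observed": observed, "concordant": predicted == observed}


# Hypothetical counts: (upstream inclusion, downstream inclusion, skipping).
result = splicing_concordance(0.7, patient=(10, 14, 36),
                              controls=[(50, 54, 2), (40, 44, 0)])
print(result)
```

A negative result from an accessible tissue such as blood does not rule out aberrant splicing in a tissue where the gene is spliced differently, so the sample source belongs alongside the concordance call.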
For regulatory variants, reporter assays measure whether variant-containing regulatory elements drive appropriate expression patterns. Massively parallel reporter assays (MPRAs) enable testing thousands of variants simultaneously, generating the training data that informs foundation model development while also providing direct validation for specific variants of clinical interest. CRISPR-based approaches can introduce variants into endogenous genomic contexts rather than artificial reporter constructs, providing more physiologically relevant readouts.
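A standard MPRA readout is the ratio of RNA barcode counts (transcribed output) to DNA barcode counts (construct abundance), and an allelic effect is the difference in that log ratio between alternate and reference alleles. This sketch pools barcodes per allele and adds a pseudocount of 1; both are simplifying assumptions, and real analyses model replicate and barcode-level variability before calling an effect significant.

```python
import math


def activity(rna_counts: list[int], dna_counts: list[int],
             pseudocount: float = 1.0) -> float:
    """log2(RNA/DNA) regulatory activity for one allele, pooled over barcodes."""
    rna = sum(rna_counts) + pseudocount
    dna = sum(dna_counts) + pseudocount
    return math.log2(rna / dna)


def allelic_effect(ref_rna: list[int], ref_dna: list[int],
                   alt_rna: list[int], alt_dna: list[int]) -> float:
    """log2 fold change in activity of the alternate vs. the reference allele."""
    return activity(alt_rna, alt_dna) - activity(ref_rna, ref_dna)


# Hypothetical barcode counts for one variant (three barcodes per allele).
effect = allelic_effect(ref_rna=[100, 120, 80], ref_dna=[50, 50, 50],
                        alt_rna=[20, 30, 25], alt_dna=[50, 50, 50])
print(f"log2 allelic effect: {effect:.2f}")
```

Library-size factors scale both alleles equally, so they cancel in the difference (up to the pseudocount); a negative value indicates the alternate allele reduces reporter activity.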
29.5.2 Integrating Functional Evidence
Functional data enters ACMG-AMP classification through PS3 (strong pathogenic evidence from functional studies showing deleterious effect) and BS3 (strong benign evidence from functional studies showing no effect) (Brnich et al. 2019). The strength assignment depends on assay validation: well-established assays measuring physiologically relevant endpoints warrant strong evidence, while novel or less-validated assays may warrant only moderate or supporting strength.
ClinGen has developed detailed recommendations for functional evidence evaluation. The specific gene and disease mechanism should guide assay selection. Controls (known pathogenic and known benign variants) should be included to validate assay performance. The biological relevance of the assay endpoint to the disease mechanism must be justified. These requirements reflect appropriate caution: not all functional assays are equally informative, and inappropriate assays can mislead classification.
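Brnich et al. (2019) quantify how well an assay's controls separate pathogenic from benign variants with an odds of pathogenicity (OddsPath). The sketch below implements that calculation and maps the result to evidence strength using the thresholds as we understand them from that paper; verify them against the publication before relying on this. The paper also addresses how many controls are needed for each strength level, which this sketch does not enforce.

```python
import math


def odds_path(path_controls: int, benign_controls: int,
              path_with_readout: int, benign_with_readout: int) -> float:
    """OddsPath = [P2 * (1 - P1)] / [(1 - P2) * P1].

    P1: fraction of pathogenic variants among all assay controls (the prior).
    P2: fraction of pathogenic variants among controls showing the readout of
        interest (abnormal readout for PS3, normal readout for BS3).
    """
    if path_controls == 0 or benign_controls == 0:
        raise ValueError("need both pathogenic and benign controls")
    p1 = path_controls / (path_controls + benign_controls)
    p2 = path_with_readout / (path_with_readout + benign_with_readout)
    if p2 == 1.0:
        return math.inf  # no benign control shows this readout
    return (p2 * (1 - p1)) / ((1 - p2) * p1)


def functional_evidence_strength(oddspath: float) -> str:
    """Map OddsPath to PS3/BS3 strength (thresholds as recalled, see lead-in)."""
    if oddspath > 350:
        return "PS3_very_strong"
    if oddspath > 18.7:
        return "PS3"
    if oddspath > 4.3:
        return "PS3_moderate"
    if oddspath > 2.1:
        return "PS3_supporting"
    if oddspath < 0.053:
        return "BS3"
    if oddspath < 0.23:
        return "BS3_moderate"
    if oddspath < 0.48:
        return "BS3_supporting"
    return "indeterminate"


# Hypothetical validation: 10 pathogenic and 10 benign controls; 9 pathogenic
# and 1 benign control give an abnormal readout.
op = odds_path(10, 10, path_with_readout=9, benign_with_readout=1)
print(f"OddsPath = {op:.1f} -> {functional_evidence_strength(op)}")
```

With only 20 controls and one misclassification, even a clean-looking assay reaches moderate rather than strong evidence, which illustrates why ClinGen emphasizes control sets.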
Foundation model predictions can prioritize which variants most warrant functional follow-up. When resources limit testing to a subset of VUS, selecting those with strong computational predictions that have not yet translated into a confident clinical classification (high predicted impact but a VUS call) tends to maximize the information gained. Variants whose classification a functional result could change provide greater value than variants whose classification is already clear or unlikely to change regardless of the result.
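One way to make "variants where functional testing might resolve classification" operational is to assume a point-based version of ACMG-AMP (as proposed by Tavtigian and colleagues), in which strong evidence counts ±4 points, likely pathogenic starts at 6 points, and likely benign starts at -1. The sketch also treats a foundation model score as the probability that the assay returns a damaging result, an assumption that holds only if the score has been calibrated against that assay. Class names and values are hypothetical.

```python
from dataclasses import dataclass

# Assumed point-scale thresholds from a point-based adaptation of ACMG-AMP.
LIKELY_PATHOGENIC_MIN = 6
LIKELY_BENIGN_MAX = -1
STRONG_POINTS = 4  # PS3 adds +4, BS3 adds -4 at strong strength


@dataclass
class Vus:
    name: str
    points: int        # current evidence points (VUS range is 0..5)
    p_damaging: float  # assumed calibrated probability the assay reads "damaging"


def p_resolved(v: Vus) -> float:
    """Probability that one strong-strength assay result moves the variant out of VUS."""
    resolves_if_damaging = v.points + STRONG_POINTS >= LIKELY_PATHOGENIC_MIN
    resolves_if_normal = v.points - STRONG_POINTS <= LIKELY_BENIGN_MAX
    return (v.p_damaging * resolves_if_damaging
            + (1 - v.p_damaging) * resolves_if_normal)


def pick_for_testing(candidates: list[Vus], capacity: int) -> list[Vus]:
    """Greedily pick the VUS most likely to be reclassified by testing."""
    return sorted(candidates, key=p_resolved, reverse=True)[:capacity]


queue = [Vus("A", 4, 0.9), Vus("B", 1, 0.1), Vus("C", 3, 0.5), Vus("D", 1, 0.9)]
print([v.name for v in pick_for_testing(queue, capacity=2)])
```

Variant D illustrates the logic: its model score suggests a damaging result, but that result alone would leave it at 5 points, still a VUS, so testing it is expected to be less informative than testing A, B, or C.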
Your laboratory has resources to functionally test 20 VUS per month. You have 200 VUS in a cardiac arrhythmia gene. How would you prioritize which variants to test? What role might foundation model predictions play in this prioritization?
29.5.3 Closing the VUS Loop
The accumulation of variants of uncertain significance represents a major challenge in clinical genetics. Patients receive results that cannot be interpreted, creating anxiety and uncertainty. As more individuals undergo sequencing, VUS prevalence grows. Systematic efforts to resolve VUS through functional characterization could dramatically improve the clinical utility of genetic testing.
High-throughput functional approaches offer a path forward. Saturation genome editing uses CRISPR to introduce every possible single-nucleotide variant across targeted regions of clinically important genes, then measures functional consequences through cellular phenotypes or growth selection (Findlay et al. 2018). These experiments generate comprehensive functional maps that provide evidence for any single-nucleotide variant in the assayed regions, including variants not yet observed in patients. The Brotman Baty Institute’s ongoing efforts for BRCA1, mismatch repair genes, and other clinically important loci exemplify this approach.
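In a saturation genome editing screen based on growth selection, loss-of-function variants are depleted from the cell population over time, so a functional score can be derived from each variant's change in frequency between an early and a late time point. This is a simplified sketch, not the published pipeline of Findlay et al. (2018): it computes log2 frequency ratios and rescales them so the median synonymous variant scores 0 and the median nonsense variant scores -1, an assumed normalization convention. Variant names and counts are hypothetical.

```python
import math
from statistics import median


def sge_scores(early: dict[str, int], late: dict[str, int],
               consequence: dict[str, str],
               pseudocount: float = 0.5) -> dict[str, float]:
    """Normalized depletion scores from SGE read counts at two time points."""
    total_early = sum(early.values())
    total_late = sum(late.values())
    raw = {}
    for variant, n_early in early.items():
        freq_early = (n_early + pseudocount) / total_early
        freq_late = (late.get(variant, 0) + pseudocount) / total_late
        raw[variant] = math.log2(freq_late / freq_early)
    # Anchor the scale on internal controls present in the same experiment.
    syn = median(s for v, s in raw.items() if consequence[v] == "synonymous")
    non = median(s for v, s in raw.items() if consequence[v] == "nonsense")
    if syn == non:
        raise ValueError("synonymous and nonsense controls are not separated")
    return {v: (s - syn) / (syn - non) for v, s in raw.items()}


early = {"syn1": 100, "syn2": 100, "non1": 100, "non2": 100, "mis1": 100}
late = {"syn1": 100, "syn2": 100, "non1": 10, "non2": 10, "mis1": 12}
consequence = {"syn1": "synonymous", "syn2": "synonymous",
               "non1": "nonsense", "non2": "nonsense", "mis1": "missense"}
print({v: round(s, 2) for v, s in sge_scores(early, late, consequence).items()})
```

Anchoring on synonymous and nonsense variants measured in the same experiment lets a missense score be read directly against null-like and wild-type-like behavior.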
Foundation models trained on these functional datasets can generalize beyond directly measured variants, predicting effects for positions or genes not yet characterized experimentally. This creates a productive cycle: functional data improves model training, improved models identify high-priority variants for follow-up, and targeted experiments fill gaps while further improving models.
29.6 Practical Workflow Integration
Translating foundation model capabilities into clinical practice requires integration with existing laboratory and clinical workflows. The technical and interpretive steps must fit within established regulatory frameworks, electronic health record systems, and clinical team structures.
29.6.1 Laboratory Workflow
Clinical sequencing laboratories operate under regulatory oversight (CLIA certification, state licensure where required, and potentially CAP accreditation in the United States). Validated pipelines must produce consistent, reproducible results. Introducing new computational tools requires formal validation demonstrating that the tool performs as expected on representative sample types, that outputs are interpretable and actionable by clinical staff, and that results are documented and traceable.
For foundation model integration, validation studies should assess performance on variants with known clinical classifications, compare predictions to existing tools to understand concordance and discordance, evaluate performance across variant types (missense, splice, regulatory) and gene categories, and document threshold selection and evidence strength assignment.
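The validation steps above amount to computing performance against variants with established classifications, both overall and within strata such as variant type or gene category, plus agreement with an incumbent tool. A minimal sketch, assuming each record is a (stratum, score, is_pathogenic) tuple and a single hypothetical threshold; a formal validation would also report confidence intervals and justify the threshold choice.

```python
from collections import defaultdict


def validation_report(records, threshold):
    """Sensitivity and specificity of score >= threshold, overall and per stratum.

    records: iterable of (stratum, score, is_pathogenic) for variants with
    established clinical classifications (VUS excluded).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for stratum, score, is_path in records:
        called = score >= threshold
        for key in (stratum, "ALL"):
            c = counts[key]
            if is_path:
                c["tp" if called else "fn"] += 1
            else:
                c["fp" if called else "tn"] += 1
    report = {}
    for key, c in counts.items():
        n_path, n_benign = c["tp"] + c["fn"], c["tn"] + c["fp"]
        report[key] = {
            "sensitivity": c["tp"] / n_path if n_path else None,
            "specificity": c["tn"] / n_benign if n_benign else None,
            "n": n_path + n_benign,
        }
    return report


def concordance(calls_a, calls_b):
    """Fraction of variants on which two tools make the same binary call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two equal-length, non-empty call lists")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


records = [("missense", 0.9, True), ("missense", 0.2, False),
           ("missense", 0.7, False), ("splice", 0.95, True), ("splice", 0.4, True)]
for stratum, metrics in validation_report(records, threshold=0.8).items():
    print(stratum, metrics)
```

Reporting None rather than a number when a stratum has no benign (or no pathogenic) examples makes gaps in the validation set explicit instead of hiding them in an overall average.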
Laboratory information management systems must capture foundation model predictions alongside other variant annotations. Reports to clinicians should clearly indicate the role of computational evidence, the specific tools applied, and the evidence strength assigned. Overreliance on computational predictions, or failure to communicate their limitations, risks inappropriate clinical decisions.
Clinical reports should include:
- Tool identification: Which foundation model(s) were used (name, version)
- Score and threshold: Raw prediction score and classification threshold applied
- Evidence strength: How the prediction maps to ACMG evidence (PP3/BP4 and strength level)
- Limitations: Standard disclaimer about computational evidence
- Discordance: Note if different tools disagree substantially
Example statement: “AlphaMissense v1.0 pathogenicity score: 0.92 (threshold for PP3 moderate: 0.8). This computational evidence is applied as PP3 at moderate strength per ClinGen recommendations for this gene.”
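Generating report text from structured fields helps keep the required elements (tool, version, score, threshold, criterion, strength, limitations, discordance) consistent across reports. The class, field names, and disclaimer wording here are hypothetical, not a standard reporting schema.

```python
from dataclasses import dataclass


@dataclass
class ComputationalEvidence:
    tool: str
    version: str
    score: float
    threshold: float
    criterion: str                 # "PP3" or "BP4"
    strength: str                  # "supporting", "moderate", "strong"
    calibration_source: str
    discordant_tools: tuple[str, ...] = ()


def report_statement(ev: ComputationalEvidence) -> str:
    """Render one computational-evidence statement for a clinical report."""
    text = (
        f"{ev.tool} v{ev.version} score: {ev.score:.2f} "
        f"(threshold for {ev.criterion} {ev.strength}: {ev.threshold:.2f}). "
        f"This computational evidence is applied as {ev.criterion} at "
        f"{ev.strength} strength per {ev.calibration_source}. "
        "Computational predictions are not diagnostic on their own."
    )
    if ev.discordant_tools:
        text += " Discordant predictions: " + ", ".join(ev.discordant_tools) + "."
    return text


ev = ComputationalEvidence("AlphaMissense", "1.0", 0.92, 0.8, "PP3", "moderate",
                           "ClinGen recommendations for this gene")
print(report_statement(ev))
```

Keeping the calibration source as an explicit field makes it harder to report an upgraded evidence strength without naming what justified it.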
29.6.2 Clinical Decision-Making
Variant interpretation reports ultimately inform clinical decisions: whether to pursue additional testing, what genetic counseling to provide, whether to adjust medical management, and what surveillance or prevention strategies to recommend. These decisions rest with clinicians and genetic counselors working with patients, not with computational algorithms.
Foundation model predictions support this process by improving the efficiency and accuracy of variant prioritization, reducing the number of VUS through more informative computational evidence, identifying potentially actionable variants in previously overlooked genomic regions, and enabling rapid assessment of novel variants not previously observed.
The interpretive report should convey both what computational predictions indicate and the uncertainty that remains. Clinicians must understand that even highly accurate models make errors, that predictions may be less reliable for underrepresented populations or unusual variant types, and that computational evidence is one component of a comprehensive assessment. Shared decision-making with patients should acknowledge these limitations while conveying the best current understanding.
29.6.3 Regulatory and Ethical Considerations
From Section 27.1.1, recall the SaMD (Software as Medical Device) risk classification framework based on condition seriousness and decision role. Where would a foundation model-based rare disease diagnostic tool fall in this framework? What level of regulatory evidence would be required?
Clinical use of foundation model predictions raises regulatory questions addressed more fully in Chapter 27. In the United States, laboratory-developed tests using computational predictions fall under CLIA oversight, with additional FDA jurisdiction increasingly asserted for software as a medical device. European regulations under IVDR impose their own requirements. Laboratories must navigate this evolving landscape, maintaining regulatory compliance without sacrificing clinical utility.
Foundation models trained predominantly on European ancestry data may systematically provide less accurate predictions for other populations. This means a patient of African ancestry might receive a VUS classification where a European ancestry patient with the same variant receives a clear pathogenic or benign classification. The computational “evidence gap” can widen health disparities. Laboratories should track ancestry-specific performance metrics and communicate uncertainty appropriately across populations.
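Tracking ancestry-specific performance can start with two numbers per group: how often the computational evidence is informative at all (the score falls outside an indeterminate band), and how often those informative calls agree with established classifications. The band edges (0.2 and 0.8), group labels, and data are hypothetical; ancestry groupings are themselves coarse proxies, and small groups need confidence intervals before comparison.

```python
from collections import defaultdict

# Assumed indeterminate band: scores strictly inside it yield neither PP3 nor BP4.
BP4_MAX = 0.2
PP3_MIN = 0.8


def informative_rate_by_ancestry(records):
    """Per-group rate of informative calls and their accuracy.

    records: iterable of (ancestry_group, score, is_pathogenic).
    """
    stats = defaultdict(lambda: {"n": 0, "informative": 0, "correct": 0})
    for group, score, is_path in records:
        s = stats[group]
        s["n"] += 1
        if score >= PP3_MIN or score <= BP4_MAX:
            s["informative"] += 1
            # A PP3 call is correct if pathogenic; a BP4 call if benign.
            s["correct"] += int((score >= PP3_MIN) == is_path)
    return {
        g: {
            "informative_rate": s["informative"] / s["n"],
            "accuracy_when_informative": (
                s["correct"] / s["informative"] if s["informative"] else None
            ),
            "n": s["n"],
        }
        for g, s in stats.items()
    }


records = [("A", 0.9, True), ("A", 0.1, False), ("A", 0.95, True), ("A", 0.5, False),
           ("B", 0.5, True), ("B", 0.6, False), ("B", 0.85, False), ("B", 0.3, True)]
print(informative_rate_by_ancestry(records))
```

A lower informative rate in one group is exactly the "evidence gap" described above: more of that group's variants stay VUS even when accuracy on the informative calls looks similar.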
Equity concerns deserve particular attention. Foundation models trained predominantly on data from individuals of European ancestry may perform less well for other populations (see Section 13.2.1 for detailed discussion of ancestry-related performance disparities and Section 27.5.2 for equity implications). If computational predictions systematically provide less informative evidence for underrepresented groups, this could widen existing disparities in diagnostic yield and clinical care. Ongoing efforts to diversify training data and evaluate performance across ancestries are essential for equitable clinical deployment.
29.7 Interpretive Partnership
Foundation models transform variant interpretation by providing more accurate, comprehensive, and fine-grained predictions than previous computational approaches. Missense pathogenicity can be estimated proteome-wide with substantially improved accuracy. Regulatory variant effects can be predicted across tissues and cell types. Splicing disruption can be quantified precisely enough to contribute clinical evidence for many variants. These capabilities can shorten the diagnostic odyssey, enabling faster and more confident resolution for patients who have often waited years for answers.
Yet foundation models do not replace human judgment in clinical genetics. They do not understand phenotypes, family structures, or therapeutic implications. They do not weigh the psychological impact of uncertain results or navigate the ethical complexities of predictive testing in unaffected relatives. They provide evidence that must be integrated within clinical frameworks designed around human decision-making, alongside family history, physical examination, prior testing, and the accumulated wisdom of clinical experience.
The productive framing positions foundation models as partners in interpretation: computational systems that handle pattern recognition at scales beyond human capacity, freeing clinical experts to focus on integration, communication, and the decisions where human judgment remains essential. This partnership model, rather than replacement or autonomy, defines the path forward for genomic foundation models in rare disease diagnosis.
Before reviewing the summary, test your recall:
- Describe the variant prioritization funnel from ~25,000 variants to a final diagnosis. At which stage do foundation models contribute most effectively?
- How do computational predictions map to ACMG-AMP evidence categories? What prevents AlphaMissense scores from providing strong evidence without additional validation?
- Why does trio sequencing dramatically improve diagnostic yield compared to singleton sequencing? Give two specific examples of evidence that trios enable.
- How does somatic variant interpretation differ from germline variant interpretation? What different questions does each framework answer?
Variant Prioritization Funnel: The funnel applies progressive filters: quality/technical filters remove sequencing artifacts, population frequency filters remove common variants unlikely to cause rare disease, consequence filters prioritize coding and functional variants, and foundation model scoring ranks remaining candidates by predicted effect. Foundation models contribute most effectively at the scoring stage (~50 candidates remaining), after basic filters eliminate obvious non-candidates but before expensive expert curation. This positioning maximizes efficiency while minimizing the risk of discarding true causal variants.
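The funnel can be expressed as a sequence of filters followed by a ranking step, recording how many variants survive each stage. The allele frequency cutoff, consequence set, shortlist size, and score field are hypothetical; appropriate frequency thresholds depend on inheritance mode and disease prevalence, and a real pipeline might route noncoding variants with strong regulatory predictions to review rather than discarding them at the consequence stage.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    passes_qc: bool
    max_pop_af: float   # highest allele frequency across reference populations
    consequence: str
    fm_score: float     # hypothetical score from a model suited to the variant type


# Assumed consequence classes retained at the consequence stage.
FUNCTIONAL = {"missense", "nonsense", "frameshift", "inframe_indel",
              "splice_donor", "splice_acceptor", "splice_region"}


def prioritization_funnel(variants, max_af=0.001, top_k=10):
    """Apply the funnel stages in order; return the shortlist and stage counts."""
    stages = {"input": len(variants)}
    kept = [v for v in variants if v.passes_qc]
    stages["quality"] = len(kept)
    kept = [v for v in kept if v.max_pop_af <= max_af]
    stages["frequency"] = len(kept)
    kept = [v for v in kept if v.consequence in FUNCTIONAL]
    stages["consequence"] = len(kept)
    shortlist = sorted(kept, key=lambda v: v.fm_score, reverse=True)[:top_k]
    stages["scored_shortlist"] = len(shortlist)
    return shortlist, stages


variants = [
    Variant("v1", False, 0.0, "missense", 0.99),      # fails QC
    Variant("v2", True, 0.05, "missense", 0.95),      # too common
    Variant("v3", True, 0.0, "synonymous", 0.9),      # filtered by consequence
    Variant("v4", True, 0.0, "missense", 0.9),
    Variant("v5", True, 0.0002, "frameshift", 0.7),
]
shortlist, stages = prioritization_funnel(variants, top_k=1)
print([v.variant_id for v in shortlist], stages)
```

Logging the count at each stage makes it easy to audit where a true causal variant could have been lost, which matters most for the cheap early filters.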
ACMG-AMP Computational Evidence: Computational predictions enter through PP3 (pathogenic support) and BP4 (benign support) criteria, traditionally assigned only “supporting” strength. AlphaMissense scores provide functional impact predictions but cannot achieve strong evidence without gene-specific calibration because: (1) functional disruption does not equal clinical pathogenicity, (2) accuracy varies across genes and variant types, and (3) strong evidence requires odds ratios of ~18:1, which must be empirically demonstrated through validation against clinical classifications in specific gene contexts.
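The "~18:1" figure comes from the Bayesian adaptation of ACMG-AMP by Tavtigian and colleagues, in which each evidence level corresponds to a fixed odds of pathogenicity and independent items multiply. A minimal sketch, assuming that framework's parameters as we recall them: a prior probability of pathogenicity of 0.10 and odds of 350 for very strong evidence, with each lower level taking the square root of the level above. Strong evidence then corresponds to 350^(1/2) ≈ 18.7.

```python
# Assumed parameters of the Bayesian ACMG-AMP adaptation (see lead-in).
PRIOR = 0.10
O_VERY_STRONG = 350.0
EXPONENT = {"supporting": 1 / 8, "moderate": 1 / 4, "strong": 1 / 2, "very_strong": 1.0}


def evidence_odds(strength: str, pathogenic: bool = True) -> float:
    """Odds of pathogenicity contributed by one evidence item."""
    odds = O_VERY_STRONG ** EXPONENT[strength]
    return odds if pathogenic else 1 / odds


def posterior(evidence: list[tuple[str, bool]], prior: float = PRIOR) -> float:
    """Combine independent evidence items multiplicatively in odds space.

    posterior = odds * prior / ((odds - 1) * prior + 1)
    """
    odds = 1.0
    for strength, pathogenic in evidence:
        odds *= evidence_odds(strength, pathogenic)
    return odds * prior / ((odds - 1) * prior + 1)


for level in EXPONENT:
    print(f"{level:>12}: odds {evidence_odds(level):7.2f}")
print(f"one strong item -> posterior {posterior([('strong', True)]):.3f}")
```

Under this model a single strong item moves the prior of 0.10 to roughly 0.68, which shows why a computational tool must demonstrate odds near 18.7:1 on real classifications before it can carry strong weight.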
Trio Sequencing Power: Trio sequencing (proband plus both parents) dramatically improves diagnostic yield by enabling de novo variant identification and accurate phasing. First, de novo variants observed in the child but absent in both parents receive strong pathogenic evidence (ACMG PS2, when maternity and paternity are confirmed), particularly valuable for severe early-onset dominant conditions where affected individuals rarely reproduce. Second, trios enable direct phasing of compound heterozygous variants, determining whether two variants in the same gene are in trans (disrupting both copies, consistent with recessive disease) or in cis (the other copy remains functional, so the pair alone cannot explain recessive disease).
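With genotypes coded as alternate-allele counts, the two trio analyses reduce to simple rules. This sketch checks only genotype patterns: a real pipeline would also require adequate depth and genotype quality in all three samples, since a missed parental call or a child-specific artifact mimics a de novo event. Phasing fails when both parents carry a variant; read-based phasing or additional relatives may then be needed. Function names are illustrative.

```python
from typing import Optional

# Genotypes are alternate-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
Trio = tuple[int, int, int]  # (child, mother, father)


def is_candidate_de_novo(child: int, mother: int, father: int) -> bool:
    """Child carries the allele and neither parent does (genotype pattern only)."""
    return child >= 1 and mother == 0 and father == 0


def transmitted_from(child: int, mother: int, father: int) -> Optional[str]:
    """Parent of origin for a heterozygous child allele, when unambiguous."""
    if child != 1:
        return None
    if mother >= 1 and father == 0:
        return "mother"
    if father >= 1 and mother == 0:
        return "father"
    return None  # both parents carry it, or neither (possible de novo)


def phase_pair(var_a: Trio, var_b: Trio) -> str:
    """Phase two heterozygous child variants in one gene from parental genotypes."""
    origin_a = transmitted_from(*var_a)
    origin_b = transmitted_from(*var_b)
    if origin_a is None or origin_b is None:
        return "unresolved"
    return "cis" if origin_a == origin_b else "trans"


print(is_candidate_de_novo(1, 0, 0))       # candidate de novo
print(phase_pair((1, 1, 0), (1, 0, 1)))    # one from each parent
print(phase_pair((1, 1, 0), (1, 1, 0)))    # both from the mother
```

Only a "trans" result supports compound heterozygosity; "cis" and "unresolved" both mean the pair cannot yet be counted as a recessive explanation.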
Germline vs. Somatic Interpretation: Germline interpretation asks “Does this variant cause inherited disease?” and applies ACMG-AMP criteria incorporating population frequency, family segregation, and inheritance patterns, with implications for genetic counseling and family testing. Somatic interpretation asks “Is this variant a tumor driver? Does it predict therapy response?” and relies on recurrence across tumors, functional impact in the context of oncogenes vs. tumor suppressors, and associations with targeted therapies. The same variant (e.g., BRCA1 mutation) has completely different implications depending on whether it is germline (hereditary cancer risk affecting family) or somatic (tumor-specific, potential therapy target, no family implications).
This chapter examined how foundation models integrate into clinical variant interpretation for rare disease diagnosis.
Key Takeaways:
The Prioritization Funnel: Clinical interpretation progressively filters ~25,000 variants to ~5-10 candidates through quality, frequency, consequence, and FM-based scoring stages. Foundation models contribute most at the scoring stage, after basic filters remove obvious non-candidates.
ACMG-AMP Integration: Computational predictions enter the framework through PP3/BP4 criteria, traditionally at supporting strength. Well-calibrated foundation models may warrant upgraded evidence strength for specific genes, but this requires formal validation and gene-specific calibration.
Family Analysis Power: Trio sequencing, phasing, and segregation analysis provide orthogonal evidence that dramatically enhances interpretation. De novo status (PS2) provides strong evidence; compound heterozygosity determination requires accurate phasing.
Germline vs. Somatic Context: The same variant interpretation framework does not apply to both contexts. Germline interpretation asks “Does this cause inherited disease?”; somatic interpretation asks “Is this a driver? Does it predict therapy response?”
The VUS Challenge: Most variants remain VUS due to insufficient evidence. High-throughput functional assays and improved computational predictions work together to close this gap.
Human-AI Partnership: Foundation models accelerate prioritization and provide quantitative evidence, but clinical judgment remains essential for final classification, communication, and ethical decision-making.
Connections to Other Chapters:
- Variant effect prediction models: Chapter 18
- Calibration and uncertainty: Section 24.3
- Regulatory and ethical frameworks: Chapter 27
- Ancestry and fairness considerations: Section 13.2.1