17 Interpretability & Mechanisms
TODO:
- …
- …
17.1 Why Interpretability Matters for Genomic Models
Deep learning models in genomics increasingly operate as systems-level surrogates for biology: they predict chromatin features, gene expression, or variant effects directly from sequence. When such models drive mechanistic hypotheses or clinical decisions, how they make predictions becomes as important as how well they perform.
Interpretability in this context serves at least four roles:
- Mechanistic insight
- Extract sequence motifs (putative TF binding sites), regulatory grammars, and long-range interaction patterns directly from trained models.
- Turn “black-box” predictions into candidate mechanisms that can be tested experimentally.
- Model debugging and confounder detection
- Reveal when models rely on artifacts (e.g., GC content, mappability, batch-specific motifs) instead of bona fide regulatory signals.
- Complement Chapter 16’s focus on data and evaluation confounders by interrogating model internals.
- Clinical and translational trust
- Support variant interpretation workflows by explaining why specific rare or de novo variants are predicted to be damaging.
- Provide interpretable axes of variation (e.g., motif disruptions, regulatory “sequence classes”) that can be combined with orthogonal evidence.
- Scientific communication
- Condense high-dimensional latent representations into human-readable abstractions—motifs, regulatory classes, or interaction graphs—that can be shared across labs and applications.
This chapter surveys the main interpretability tools developed for genomic models, from convolutional filters and saliency maps to global regulatory vocabularies and attention patterns in genomic language models (gLMs) and transformer-based regulatory models. Throughout, the emphasis is on mechanistic interpretability: moving from “what correlates with the prediction?” to “what regulatory hypothesis does the model imply?”
17.2 Interpreting Convolutional Filters as Motifs
Convolutional neural networks (CNNs) remain a workhorse for modeling cis-regulatory sequence (Chapters 5–7). In many of these models, first-layer convolutional filters act as motif detectors:
- A filter slides along the one-hot encoded sequence (Chapter 8).
- At each position, it computes a dot product between its weights and the local sequence window.
- High activation indicates that the subsequence closely matches the filter’s preferred pattern.
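To make this concrete, here is a minimal sketch of a single first-layer filter acting as a motif scanner; the shapes and the `filter_activations` helper are illustrative rather than drawn from any particular model.

```python
import torch
import torch.nn.functional as F

# Minimal sketch: one first-layer filter as a motif scanner.
# `seq_onehot` is (4, L) with rows A/C/G/T; `filt` is a (4, k) weight matrix.
def filter_activations(seq_onehot: torch.Tensor, filt: torch.Tensor) -> torch.Tensor:
    """Per-position dot products between the filter and each length-k window."""
    x = seq_onehot.unsqueeze(0)      # (1, 4, L): batch, channels, length
    w = filt.unsqueeze(0)            # (1, 4, k): a single output channel
    return F.conv1d(x, w).squeeze()  # (L - k + 1,) activation profile
```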
17.2.1 From Filters to Motif Logos
A common workflow to interpret filters:
- Collect high-activation instances
- Run the trained model on a large sequence set (e.g., training data or genome tiles).
- For each filter, record positions where its activation exceeds a threshold.
- Extract and align subsequences
- Pull out fixed-length windows around those positions.
- Align them and compute base frequencies at each position.
- Build a position weight matrix (PWM)
- Convert base frequencies to log-odds scores relative to a background distribution.
- Visualize as a sequence logo.
- Match to known motif databases
- Compare PWMs to JASPAR or HOCOMOCO TF motif libraries using similarity scores.
- Annotate filters with candidate TF identities (“this filter resembles CTCF”).
This procedure has been applied to models like DeepSEA and its successors to demonstrate that early layers learn motifs for canonical TFs and chromatin-associated patterns, validating that models are discovering biologically meaningful sequence features rather than arbitrary patterns.
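A minimal sketch of the align-and-build steps of this workflow, assuming `windows` is an (N, k, 4) one-hot array of subsequences collected where a filter's activation exceeded threshold; the background and pseudocount are illustrative choices:

```python
import numpy as np

# Sketch: turn high-activation windows for one filter into a log-odds PWM.
def windows_to_pwm(windows, background=np.full(4, 0.25), pseudocount=1.0):
    counts = windows.sum(axis=0) + pseudocount          # (k, 4) base counts
    freqs = counts / counts.sum(axis=1, keepdims=True)  # per-position frequencies
    return np.log2(freqs / background)                  # log-odds PWM, (k, 4)
```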
17.2.2 Beyond First-Layer Filters
Deeper convolutional layers aggregate lower-level motifs:
- Combinatorial motifs: Filters that respond to pairs or clusters of TF motifs.
- Grammar patterns: Distance or orientation constraints (e.g., “ETS motif ~10 bp upstream of GATA motif”).
- Contextual preferences: Filters that fire only in particular GC contexts or nucleosome positioning patterns.
However, directly interpreting deeper layers becomes challenging because receptive fields expand and nonlinearities accumulate. This motivates attribution-based approaches that connect predictions back to individual bases.
17.3 Attribution Methods: Connecting Bases to Predictions
Attribution methods assign an “importance score” to each input base (or k-mer), reflecting how much it contributes to a prediction for a specific task and sequence.
Let $ f(x) $ be a model predicting some output (e.g., chromatin accessibility, gene expression, or variant effect) from sequence $ x $. Attribution methods estimate the contribution of each base $ x_i $ to $ f(x) $, often for a specific output neuron (e.g., a particular cell type).
17.3.1 In Silico Mutagenesis (ISM)
In silico mutagenesis is conceptually straightforward and model-agnostic:
- For each position $ i $ and base $ b $, create a mutated sequence $ x^{(i \rightarrow b)} $ in which the reference base at position $ i $ is replaced by $ b $.
- Compute the change in prediction
\[ \Delta f_{i,b} = f(x^{(i \rightarrow b)}) - f(x). \]
- Aggregate these changes (e.g., max across non-reference alleles) to obtain a per-base importance score.
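A minimal single-nucleotide ISM sketch, assuming a `model` callable that returns a scalar prediction for a one-hot $(L, 4)$ sequence:

```python
import numpy as np

# Sketch: single-nucleotide in silico mutagenesis.
def ism_scores(model, x: np.ndarray) -> np.ndarray:
    L, A = x.shape
    ref = model(x)
    delta = np.zeros((L, A))
    for i in range(L):
        for b in range(A):
            if x[i, b] == 1:
                continue                       # skip the reference base
            x_mut = x.copy()
            x_mut[i] = 0
            x_mut[i, b] = 1                    # substitute base b at position i
            delta[i, b] = model(x_mut) - ref   # Δf_{i,b} as defined above
    return np.abs(delta).max(axis=1)           # per-base importance scores
```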
Variants:
- Single-nucleotide ISM: Mutate each base individually; expensive but faithful.
- Saturation mutagenesis: Enumerate all possible subsequences within a short window to probe combinatorial effects and grammar.
- Variant-specific scoring: Evaluate $ f(x_{\text{alt}}) - f(x_{\text{ref}}) $ for a particular SNV or indel.
Strengths:
- True “what-if” causal perturbations under the model.
- Works for any model, differentiable or not (including ensembles and post-processed scores).
Limitations:
- Computationally expensive: $ O(L \cdot |\mathcal{A}|) $ forward passes for sequence length $ L $ and alphabet size $ |\mathcal{A}| $ (roughly $ 3L $ mutant predictions for DNA).
- Captures local effects; may miss distributed interactions if not designed carefully.
17.3.2 Gradient-Based Methods
Gradient-based methods approximate “how much would the prediction change if we nudged this base?” via backpropagation.
17.3.2.1 Vanilla Gradient / Saliency
Compute the gradient of the output with respect to the input:
\[ s_i = \frac{\partial f(x)}{\partial x_i}. \]
With one-hot encoding, this gradient can be interpreted as the sensitivity to changing the nucleotide at position $ i $. A common variant multiplies the gradient by the input (“gradient × input”).
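A minimal gradient × input sketch, assuming a PyTorch `model` that maps a one-hot tensor of shape (1, 4, L) to a scalar output:

```python
import torch

# Sketch: saliency via gradient × input.
def grad_x_input(model, x: torch.Tensor) -> torch.Tensor:
    x = x.clone().requires_grad_(True)
    model(x).backward()                    # populate x.grad via backprop
    saliency = (x.grad * x).sum(dim=1)     # gradient × input over base channels
    return saliency.squeeze(0)             # per-position scores, shape (L,)
```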
Pros:
- Requires a single backward pass per sequence.
- Easy to implement and integrate into training workflows.
Cons:
- Susceptible to gradient saturation (zero gradients in regions where the model is already confident).
- Saliency maps are often noisy and may require smoothing or averaging over multiple perturbed copies of the input.
17.3.2.2 DeepLIFT
DeepLIFT (Deep Learning Important FeaTures) compares neuron activations between an input and a reference (or baseline) sequence, distributing differences back to inputs using layer-wise rules rather than raw gradients. It aims to:
- Avoid gradient saturation.
- Enforce a consistency constraint: the sum of input contributions matches the difference in output between input and reference.
DeepLIFT has been widely used for genomic models, particularly in conjunction with TF-MoDISco (next section), where its base-level importance scores serve as inputs for motif discovery.
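Captum provides a generic PyTorch implementation that can stand in for genomics-specific tooling. A minimal sketch, assuming a model with a single scalar output per example (multi-task models additionally need a `target` index) and a uniform 0.25 baseline as one of several reasonable references:

```python
import torch
from captum.attr import DeepLift

# Sketch: DeepLIFT attributions via Captum for a one-hot (batch, 4, L) input.
def deeplift_attributions(model, x: torch.Tensor) -> torch.Tensor:
    dl = DeepLift(model)
    baseline = torch.full_like(x, 0.25)          # uniform-background reference
    return dl.attribute(x, baselines=baseline)   # contributions, same shape as x
```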
17.3.2.3 Integrated Gradients (IG)
Integrated Gradients computes the path integral of gradients along a linear interpolation from a reference $ x' $ to the input $ x $:
\[ \text{IG}_i(x) = (x_i - x'_i) \int_{\alpha=0}^1 \frac{\partial f\left(x' + \alpha(x - x')\right)}{\partial x_i} d\alpha. \]
In practice, this integral is approximated via a Riemann sum over discrete steps. IG satisfies desirable axioms (e.g., sensitivity, implementation invariance) and tends to be less noisy than raw gradients.
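A minimal sketch of that Riemann-sum approximation, assuming a scalar-output PyTorch `model` and a one-hot input/reference pair:

```python
import torch

# Sketch: Integrated Gradients via a Riemann sum over n_steps interpolations.
def integrated_gradients(model, x, x_ref, n_steps=50):
    grads = torch.zeros_like(x)
    for k in range(1, n_steps + 1):
        alpha = k / n_steps
        x_interp = (x_ref + alpha * (x - x_ref)).requires_grad_(True)
        model(x_interp).backward()             # gradient at this interpolation step
        grads += x_interp.grad
    return (x - x_ref) * grads / n_steps       # IG_i(x), same shape as x
```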
Key design considerations for all gradient-based methods:
- Choice of reference:
- Random genomic background, dinucleotide-shuffled sequence, or an “average” non-functional sequence.
- Different references emphasize different aspects of the signal.
- Output selection:
- Single-task models: directly attribute the scalar output.
- Multi-task models: choose a specific track (e.g., H3K27ac in one cell type) or aggregate across tasks.
- Post-processing:
- Smooth along the sequence (e.g., average in sliding windows).
- Aggregate over channels or strands.
17.4 From Attributions to Motifs: TF-MoDISco
Attribution maps highlight where the model focuses, but they do not automatically yield consistent motifs or regulatory grammars. TF-MoDISco (Transcription Factor Motif Discovery from Importance Scores) was developed to bridge this gap.
17.4.1 Core Idea
Rather than performing motif discovery on raw sequences, TF-MoDISco operates on base-level importance scores:
- Compute importance scores
- Use DeepLIFT, ISM, IG, or similar methods on many sequences.
- Obtain an importance score for each base and strand.
- Extract “seqlets”
- Identify local windows where the total importance exceeds a threshold.
- Treat each window (seqlet) as a candidate motif instance.
- Cluster seqlets
- Compare seqlets using similarity metrics that consider both sequence and importance scores.
- Cluster into groups corresponding to putative motifs.
- Build consolidated motifs
- Align seqlets within each cluster.
- Construct PWMs and importance-weighted logos.
- Optionally match to known TF motifs.
- Report motif instances and grammar
- Map motifs back onto the genome.
- Analyze co-occurrence, spacing, and orientation rules.
When applied to models like BPNet, TF-MoDISco has recovered known TF motifs, discovered novel variants, and revealed grammars (e.g., directional spacing constraints) that can be validated with synthetic reporter assays.
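A highly simplified sketch of the seqlet-extraction step above; the fixed window and z-score threshold are illustrative stand-ins for TF-MoDISco's actual procedure:

```python
import numpy as np

# Sketch: extract candidate seqlets from a per-base importance track `imp`
# (length L) produced by DeepLIFT, ISM, or IG.
def extract_seqlets(imp: np.ndarray, window: int = 15, z_thresh: float = 2.0):
    totals = np.convolve(imp, np.ones(window), mode="valid")  # windowed sums
    z = (totals - totals.mean()) / totals.std()
    seqlets, last_end = [], -1
    for s in np.where(z > z_thresh)[0]:
        if s > last_end:                 # greedily keep non-overlapping windows
            seqlets.append((int(s), int(s) + window))
            last_end = s + window
    return seqlets                       # (start, end) candidate motif instances
```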
In the context of genomic foundation models, an analogous workflow can be applied:
- Use a GFM or transformer-based model to produce base-level attributions for a specific downstream task (e.g., chromatin accessibility).
- Run TF-MoDISco to extract a task-specific motif vocabulary.
- Analyze how motif usage changes across cell types, conditions, or species.
17.5 Interpreting Attention and Long-Range Context
Transformer-based models (Chapters 8–11) use self-attention to mix information across long genomic contexts, enabling them to capture distal regulatory interactions and genomic organization. Interpretability here often centers on attention patterns and long-range attribution.
17.5.1 Genomic Language Models and Operon Structure (gLM)
Genomic language models (gLMs) treat genes or genomic tokens as a sequence and train transformers to predict masked tokens, analogous to protein or text LMs. Work on gLMs trained on millions of metagenomic scaffolds shows that these models learn non-trivial genomic structure:
- Attention heads mark operons and co-regulated modules
- Certain heads specialize in connecting genes that are part of the same operon or functional module.
- Attention maps reveal networks of co-regulated genes, often aligning with known operon boundaries.
- Functional semantics and taxonomic signals
- Latent representations cluster by enzymatic function and gene ontology.
- Attention patterns can separate clades and capture clade-specific gene neighborhoods.
- Mechanistic interpretation
- These patterns suggest the model has inferred a “syntax” of gene neighborhoods: which genes tend to co-occur and in what order, conditioned on phylogenetic context.
While attention is not universally a faithful explanation of model decisions, attention analysis in gLM reveals emergent mechanistic structure that is consistent with biological organization.
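For transformer models exposed through the HuggingFace API, attention maps can be pulled out directly for this kind of analysis. The checkpoint name below is a placeholder; gene-token gLMs expose attentions the same way, just over gene rather than nucleotide tokens:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "some-org/genomic-lm"  # placeholder; substitute a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, output_attentions=True)

inputs = tokenizer("ACGTACGT" * 32, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions is a tuple with one (batch, heads, query, key) tensor per
# layer; e.g., inspect head 3 of the first layer:
attn = out.attentions[0][0, 3]       # (seq_len, seq_len)
print(attn.shape, attn[0].sum())     # each row softmax-normalizes to ~1
```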
17.5.2 Distal Regulatory Elements in Enformer-Like Models
Enformer and related models predict chromatin features and gene expression from large genomic windows (roughly 200 kb in Enformer's case) by combining convolutional layers with transformer blocks.
Key interpretability questions:
- Which distal enhancers drive the predicted expression at a given transcription start site (TSS)?
- How do variants in distal elements propagate to gene-level outputs?
Interpretability strategies include:
- Gradient-based attributions over long windows
- Compute attributions of a gene’s expression output with respect to input bases across the entire window.
- Visualize importance tracks to highlight putative enhancers and silencers.
- Attention pattern analysis
- Identify attention heads that consistently link distal positions to TSS regions.
- Relate high-attention edges to Hi-C contact maps or chromatin interaction data.
- In silico perturbation of regulatory elements
- Delete or scramble candidate enhancers and recompute gene expression predictions.
- Insert synthetic motifs or enhance motif scores to gauge dose–response relationships.
These analyses can reveal candidate enhancer–promoter links and TF motifs that the model deems critical for gene regulation, helping translate raw attention weights and attributions into mechanistic hypotheses.
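A minimal sketch of the perturbation strategy, assuming `model(x)` returns predicted expression for a one-hot (L, 4) sequence and (start, end) delimits a candidate enhancer:

```python
import numpy as np

# Sketch: in silico deletion/scramble of a candidate regulatory element.
def perturb_element(model, x, start, end, rng=np.random.default_rng(0)):
    ref = model(x)
    scrambled = x.copy()
    scrambled[start:end] = rng.permutation(scrambled[start:end], axis=0)  # shuffle bases
    masked = x.copy()
    masked[start:end] = 0.25             # uninformative uniform background
    return {"scramble_effect": model(scrambled) - ref,
            "mask_effect": model(masked) - ref}
```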
17.6 Global Regulatory Vocabularies: Sei Sequence Classes
Most motif-based interpretation operates at the local level. Sei takes a complementary global approach by learning a vocabulary of regulatory sequence classes that summarize a vast array of chromatin profiles.
17.6.1 The Sei Framework
Sei trains a deep sequence model to predict tens of thousands of chromatin profiles (TF binding, histone marks, accessibility) across many cell types directly from DNA sequence. The key interpretability step is to compress these thousands of outputs into a few dozen “sequence classes”, each representing a characteristic regulatory activity pattern:
- Promoter-like classes (e.g., H3K4me3-rich, TSS-proximal).
- Enhancer-like classes (H3K27ac, H3K4me1).
- Repressive classes (H3K27me3, H3K9me3).
- Cell-type- or lineage-specific modules (e.g., neuronal, immune).
Each input sequence (or variant) is assigned a score for each sequence class, effectively mapping it to a point in a low-dimensional “regulatory activity space”.
17.6.2 Interpretation and Applications
A regulatory vocabulary like Sei’s supports several interpretability goals:
- Intermediate, human-interpretable features
- Instead of raw high-dimensional outputs, one can reason in terms of “promoter-like,” “B-cell enhancer,” or “polycomb-repressed” scores.
- Variant interpretation
- Variants can be summarized by their shifts in sequence-class scores, yielding concise descriptions like “increases neuronal enhancer activity while decreasing repressive marks.”
- Trait and disease enrichment
- GWAS loci can be enriched for specific sequence classes, revealing tissues and regulatory programs most relevant to disease.
This notion of a regulatory vocabulary parallels word embeddings or topics in NLP and provides a bridge between highly multivariate model outputs and mechanistically interpretable axes of variation.
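As an illustration of the variant-interpretation use, the sketch below summarizes a variant by its largest shifts in class-score space; `class_scores` is a hypothetical helper mapping a sequence to a vector of sequence-class scores, not part of any published Sei API:

```python
import numpy as np

# Sketch: describe a variant by its top shifts in sequence-class space.
def variant_class_shift(class_scores, ref_seq, alt_seq, class_names, top_k=5):
    delta = class_scores(alt_seq) - class_scores(ref_seq)
    order = np.argsort(-np.abs(delta))                  # largest shifts first
    return [(class_names[i], float(delta[i])) for i in order[:top_k]]
```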
17.7 Case Study: From Base-Pair Attributions to Regulatory Grammar
Putting the pieces together, a typical mechanistic interpretability pipeline for a CNN or transformer-based regulatory model might look like:
- Train a predictive model
- For example, predict chromatin accessibility or TF ChIP-seq tracks from sequence.
- Compute base-level attributions
- Use DeepLIFT or IG for positive predictions in a target cell type.
- Discover motifs with TF-MoDISco
- Extract seqlets from high-attribution regions, cluster, and derive motifs.
- Match motifs to known TFs and identify novel ones.
- Infer grammar from motif instances
- Analyze motif co-occurrence, spacing, and orientation in high-scoring sequences.
- Use knock-in/knock-out in silico experiments to confirm dependencies (e.g., both motifs needed, order matters).
- Relate motifs to sequence classes or attention patterns
- Map motif-rich regions to Sei sequence classes or Enformer attributions.
- Connect local motif grammar to global regulatory context (e.g., distal enhancer–promoter linkages, cell-type specificity).
- Validate with experiments or external datasets
- Check whether motif disruptions align with reporter assay effects or allelic imbalance.
- Compare inferred enhancer–promoter links to Hi-C or CRISPR perturbation screens.
This integrated approach moves beyond “pretty saliency maps” toward testable hypotheses about regulatory logic.
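As one concrete piece of the grammar-inference step, the sketch below collects pairwise spacings between two motifs' mapped instances; the input format is illustrative:

```python
# Sketch: spacing/orientation analysis between two motifs' instances.
# `hits_a` and `hits_b` are lists of (position, strand) pairs on one sequence.
def pairwise_spacing(hits_a, hits_b, max_dist=100):
    spacings = []
    for pos_a, strand_a in hits_a:
        for pos_b, strand_b in hits_b:
            d = pos_b - pos_a                        # signed A→B distance
            if 0 < abs(d) <= max_dist:
                spacings.append((d, strand_a == strand_b))
    return spacings  # peaks in a histogram of d suggest spacing rules

# e.g., a pileup at d ≈ 10 with matched strands would echo the
# "ETS motif ~10 bp upstream of GATA motif" pattern from Section 17.2.2.
```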
17.8 Evaluating Interpretations: Faithfulness vs Plausibility
Not all explanations are equally trustworthy. Effective interpretability work must grapple with the distinction between:
- Plausibility: Does the explanation “look” biological (e.g., known motifs, enhancer marks)?
- Faithfulness: Does the explanation accurately reflect the internal computation of the model?
Potential pitfalls:
- Attention as explanation
- High attention weights need not correspond to large changes in output; they may reflect information routing rather than causal influence.
- Combining attention with attribution or perturbation analyses yields more reliable insights.
- Attribution noise and saturation
- Gradient-based methods can produce noisy maps or miss important features in saturated regions.
- Use multiple methods (ISM, DeepLIFT, IG) and check for consistency.
- Shortcut features
- Models may rely on dataset-specific artifacts (e.g., barcode k-mers, GC content) that produce clean motifs but are not mechanistically meaningful.
Recommended practices:
- Sanity checks
- Randomize model weights: attributions should degrade to noise (see the sketch after this list).
- Randomize labels: derived motifs should disappear or lose predictive power.
- Counterfactual tests
- Delete or scramble high-attribution regions and confirm that predictions drop accordingly.
- Insert discovered motifs into neutral backgrounds to test gain-of-function effects.
- Benchmarking interpretability methods
- Use synthetic datasets with known ground-truth grammar.
- Compare methods on their ability to recover planted motifs and interactions.
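A minimal sketch of the weight-randomization check referenced above, assuming `attribute(model, x)` returns a 1-D NumPy attribution vector from any of the methods in this chapter:

```python
import copy
import numpy as np
import torch

# Sketch: sanity check — attributions should decorrelate when weights are random.
def randomization_check(model, attribute, x) -> float:
    a_trained = attribute(model, x)
    shuffled = copy.deepcopy(model)
    with torch.no_grad():
        for p in shuffled.parameters():
            p.normal_()                      # destroy the learned weights
    a_random = attribute(shuffled, x)
    # High correlation after randomization is a red flag: the "explanation"
    # barely depends on what the model learned.
    return float(np.corrcoef(a_trained, a_random)[0, 1])
```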
17.9 A Practical Interpretability Toolbox for Genomic Foundation Models
For practitioners working with genomic foundation models (GFMs) and their fine-tuned derivatives, a practical toolbox might include:
- Local effect estimation
- For variant effect prediction: use ref/alt scoring and small-window ISM around variants (see the sketch after this list).
- Aggregate per-base attributions into per-variant or per-motif scores.
- Motif and grammar discovery
- Compute base-level attributions for high-confidence predictions.
- Run TF-MoDISco or similar algorithms to build a motif vocabulary.
- Analyze motif grammars across tasks (e.g., multiple cell types or assays).
- Global context visualization
- For transformer-based GFMs: inspect attention patterns to identify heads that track operons, gene neighborhoods, or enhancer–promoter loops.
- For models like Enformer: combine long-range attributions with contact maps to hypothesize regulatory architectures.
- Regulatory vocabularies and embeddings
- Use frameworks like Sei to project sequences into a low-dimensional regulatory activity space.
- Cluster variants, enhancers, or genomic regions by their sequence-class profiles to reveal shared regulatory programs.
- Model and dataset auditing
- Use interpretability tools to identify reliance on confounded or undesirable features.
- Cross-reference with Chapter 16’s confounder taxonomy (ancestry stratification, batch effects) to design deconfounded training and evaluation.
- Human-in-the-loop analysis
- Integrate motif and sequence-class outputs into visualization tools (e.g., genome browsers with attribution tracks, motif tracks, and class scores).
- Enable domain experts to iteratively refine hypotheses.
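A minimal sketch of the ref/alt scoring mentioned under "Local effect estimation", assuming `model(x)` returns a scalar for a one-hot (L, 4) sequence:

```python
import numpy as np

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

# Sketch: score a single SNV as the predicted alt-vs-ref difference.
def score_snv(model, ref_seq: np.ndarray, pos: int, alt: str) -> float:
    alt_seq = ref_seq.copy()
    alt_seq[pos] = 0
    alt_seq[pos, BASE_INDEX[alt]] = 1              # swap in the alternate base
    return float(model(alt_seq) - model(ref_seq))  # predicted variant effect
```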
17.10 Outlook: From Explanations to Mechanistic Models
Interpretability in genomic deep learning is evolving from post hoc explanation toward model-assisted mechanistic discovery:
- Foundation models provide rich latent spaces and long-range context.
- Attribution and motif discovery tools translate those representations into candidate regulatory grammars.
- Global vocabularies like Sei’s sequence classes offer interpretable axes spanning thousands of assays.
- Attention analysis in genomic language models reveals emergent gene-level organization, hinting at scalable ways to capture systems-level biology.
The next frontier is to close the loop:
- Use insights from interpretability (motifs, grammars, sequence classes) to design better architectures and training objectives.
- Feed experimentally validated grammars back into models as inductive biases.
- Develop evaluation frameworks where success is measured not only by predictive accuracy but also by mechanistic fidelity—how well model-derived hypotheses align with the causal structure of regulatory biology.
In this sense, interpretability is not just a diagnostic for black-box models. It is a central tool for turning genomic foundation models into engines of biological discovery, capable of bridging the gap between sequence-level predictions and the mechanistic understanding that underpins robust clinical translation.