21 3D Genome Organization
Two meters of DNA. Ten micrometers of space. How it folds determines what it does.
Estimated reading time: 25-35 minutes
Prerequisites: This chapter builds on the regulatory models introduced in Chapter 17, particularly Enformer’s approach to sequence-based prediction. Familiarity with convolutional architectures (Chapter 6) and the concept of dilated convolutions is helpful. Understanding of chromatin accessibility and histone modifications (Section 2.4) provides useful context for interpreting 3D genome features.
Learning Objectives: After completing this chapter, you should be able to:
- Explain the hierarchy of 3D genome organization from chromosome territories to fine-scale loops
- Describe the loop extrusion mechanism and predict which CTCF sites will anchor chromatin loops
- Compare and contrast Akita, Orca, and C.Origami for 3D structure prediction
- Interpret Hi-C contact matrices and understand their resolution limitations
- Explain why structural variant pathogenicity depends on 3D context, not just sequence
- Distinguish between correlational 3D structure predictions and causal regulatory effects
Key Insight: The 3D genome provides context that one-dimensional sequence models cannot capture, but 3D structure is permissive rather than deterministic. A predicted enhancer-promoter contact indicates that interaction could happen, not that it does happen or that it matters when it does.
The human genome spans approximately two meters of linear DNA, yet it must fit within a nucleus roughly ten micrometers in diameter: a compaction ratio of nearly 200,000 to one. This folding is not random. Specific sequences contact each other across vast genomic distances while others remain isolated, and these contact patterns determine which enhancers can activate which genes. An enhancer 500 kilobases from its target can drive transcription only because intervening chromatin folds to bring them into physical proximity. The regulatory models covered in Chapter 17 predict expression from sequence within a fixed window, building on the convolutional architectures introduced in Chapter 6 and treating the genome as a one-dimensional string. They cannot explain why an enhancer activates one gene and not another when multiple promoters lie within range.
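The compaction figure follows from a one-line calculation, under the simplifying assumption that it compares linear DNA length with nuclear diameter rather than volumes:

\[
\frac{2\ \text{m}}{10\ \mu\text{m}} = \frac{2\ \text{m}}{1 \times 10^{-5}\ \text{m}} = 2 \times 10^{5}
\]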
Disruptions to 3D genome architecture cause disease through mechanisms that sequence alone cannot predict. When structural variants delete a boundary between chromatin domains, enhancers can contact genes they normally never reach, a phenomenon called enhancer hijacking that underlies developmental disorders and cancer. The clinical consequences depend on which contacts are disrupted: a deletion that removes a domain boundary may be pathogenic, while an identically sized deletion that preserves boundaries may be benign. Current variant effect prediction tools (Chapter 18) largely ignore this spatial dimension, creating systematic blind spots for structural variant interpretation.
21.1 Chromatin Organization Hierarchy
The genome folds through multiple organizational levels, each with distinct functional consequences and arising from different molecular mechanisms. Understanding this hierarchy is essential for interpreting both normal gene regulation and how structural variants cause disease. The levels are not independent; they interact in complex ways that computational models must capture to predict 3D structure accurately.
Why did cells evolve multiple organizational levels rather than a single mechanism? Each level solves a different regulatory problem at a different scale. Compartments segregate active from inactive chromatin, creating nuclear microenvironments with distinct biochemical properties, essentially partitioning the nucleus into “transcription factories” and “silencing zones.” TADs constrain enhancer-promoter search: without boundaries, an enhancer might contact any gene within megabases, creating regulatory chaos; TAD boundaries limit this search to a few hundred kilobases. Loops bring specific regulatory elements into direct contact when needed. This hierarchical division of labor allows cells to achieve both genome-scale organization (compartments) and fine-scale precision (loops) using different mechanisms optimized for different tasks.
Before examining the table below, try to rank the four organizational levels (chromosome territories, compartments, TADs, and loops) from most to least computationally tractable to predict from DNA sequence. What sequence features might make some levels easier to predict than others?
The following table summarizes the key organizational levels, their scales, and the molecular mechanisms underlying each:
| Level | Scale | Key Features | Molecular Mechanism | Computational Tractability |
|---|---|---|---|---|
| Chromosome territories | Nucleus-wide | Gene-rich toward interior | Nuclear organization | Low (difficult to predict) |
| A/B compartments | 1–10 Mb | Checkerboard pattern in Hi-C | Phase separation | Moderate (chromatin state) |
| TADs | 200 kb–2 Mb | Triangular domains, conserved boundaries | Loop extrusion | High (CTCF motifs) |
| Chromatin loops | 10–500 kb | Focal Hi-C enrichments | CTCF/cohesin | High (convergent motifs) |
21.1.1 Chromosome Territories and Compartments
At the largest scale, chromosomes occupy distinct nuclear volumes called chromosome territories. Gene-rich chromosomes tend toward the nuclear interior while gene-poor chromosomes associate with the nuclear periphery. This territorial organization limits which chromosomes can exchange material during translocations: recurrent cancer-associated translocations occur preferentially between chromosomes that occupy neighboring territories (Zhang et al. 2012). While chromosome territory organization has clear functional implications, most computational models focus on finer-scale structures where sequence determinants are more tractable.
Within chromosome territories, chromatin partitions into two major compartment types distinguished by their transcriptional activity and chromatin state. A compartments contain gene-rich, transcriptionally active chromatin with open, accessible structure. B compartments contain gene-poor, transcriptionally silent regions often associated with the nuclear lamina at the nuclear periphery. This compartmentalization is visible in Hi-C contact maps as a characteristic checkerboard pattern: A compartment regions preferentially contact other A regions even when separated by megabases, while B regions contact other B regions (Lieberman-Aiden et al. 2009). Compartment identity correlates strongly with histone modifications (H3K27ac marks active A compartments; H3K9me3 marks repressive B compartments) and changes during cellular differentiation as lineage-specific genes shift between active and inactive states. The molecular mechanism underlying compartmentalization appears to involve phase separation: regions with similar chromatin states aggregate through weak multivalent interactions, creating nuclear microenvironments with distinct biochemical properties (Larson et al. 2017).
Hi-C contact maps encode all levels of chromatin organization in a single visualization:
- Chromosome territories appear as distinct diagonal blocks; contacts are enriched within chromosomes compared to between chromosomes.
- A/B compartments create a checkerboard pattern at megabase scale: A regions preferentially contact other A regions (appearing as enriched off-diagonal blocks), while B regions contact B regions, creating alternating stripes.
- TADs appear as triangular domains along the diagonal. Within each triangle, contact frequencies are elevated compared to regions outside the triangle. TAD boundaries are the sharp edges where these triangles meet.
- Chromatin loops appear as focal enrichments (bright spots) at specific off-diagonal positions, representing direct contacts between loop anchors (typically convergent CTCF sites).
The strong diagonal signal reflects the polymer effect: sequences close in linear distance contact frequently simply because they are tethered to the same DNA strand. Biologically meaningful contacts appear as enrichments above this baseline.
21.1.2 Topologically Associating Domains
Below the megabase scale of compartments, the genome organizes into topologically associating domains (TADs): regions typically spanning a few hundred kilobases to about two megabases (median approximately 800 kilobases in mammals) within which sequences contact each other more frequently than with sequences outside the domain. TAD boundaries appear as sharp transitions in contact frequency, visible in Hi-C maps as triangular domains along the matrix diagonal. These boundaries show strong conservation across mammalian species and across cell types within a species, suggesting strong selective pressure to maintain domain organization (Dixon et al. 2012). The prevailing model holds that TADs constrain enhancer-promoter interactions: regulatory elements within a TAD can contact genes in the same domain, but boundaries prevent crosstalk with genes in adjacent domains. This insulation function has clear clinical relevance. Deletions that remove TAD boundaries allow enhancers to contact genes they normally cannot reach. In a well-characterized example, deletions removing the boundary between the EPHA4 domain and the neighboring PAX3 domain allow limb enhancers that normally drive EPHA4 to ectopically activate PAX3, causing brachydactyly; other rearrangements at the same locus misexpress WNT6 or IHH and cause different limb malformations (Lupiáñez et al. 2015).
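One way to make "sharp transitions in contact frequency" concrete is an insulation-style score: for each junction between neighboring bins, average the contacts that cross it and look for local minima. The sketch below is a minimal pure-Python illustration of that idea, not a method described in this chapter; the function names, the window size, and the toy matrix values are all assumptions chosen for the example.

```python
def junction_insulation(matrix, window):
    """Mean contact frequency across each junction between bins i-1 and i.

    For junction i, average the window x window square linking the
    `window` bins upstream (rows i-window .. i-1) with the `window`
    bins downstream (columns i .. i+window-1). A low value means few
    contacts cross the junction.
    """
    n = len(matrix)
    scores = [None] * (n + 1)  # None where the square would leave the matrix
    for i in range(window, n - window + 1):
        total = 0.0
        for r in range(i - window, i):
            for c in range(i, i + window):
                total += matrix[r][c]
        scores[i] = total / (window * window)
    return scores


def call_boundaries(scores):
    """Return junctions whose score is a strict local minimum."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if left is None or mid is None or right is None:
            continue
        if mid < left and mid < right:
            boundaries.append(i)
    return boundaries


def toy_matrix(n_bins, tad_starts, inside=10.0, outside=1.0):
    """Symmetric toy map with two contact levels (hypothetical values)."""
    def tad_of(b):
        return sum(1 for start in tad_starts if b >= start)
    return [[inside if tad_of(i) == tad_of(j) else outside
             for j in range(n_bins)] for i in range(n_bins)]


# Two toy TADs: bins 0-5 and bins 6-11, so the true boundary is junction 6.
matrix = toy_matrix(n_bins=12, tad_starts=[6])
scores = junction_insulation(matrix, window=2)
boundaries = call_boundaries(scores)
print(boundaries)
```

On real data the matrix would first be normalized, and a minimum-depth threshold would usually be needed to avoid calling noise as boundaries.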
Consider an enhancer located exactly at the center of a TAD. If this TAD contains three genes (one near each boundary and one near the center), which genes would you expect the enhancer to regulate? How would your prediction change if the enhancer were located near one of the TAD boundaries instead?
Center-located enhancer: All three genes within the TAD could potentially be regulated because TAD boundaries constrain contacts between domains, not within them. The central enhancer could contact all three genes through chromatin looping, though the nearest gene (at the center) would likely show strongest contact frequency due to proximity.
Boundary-located enhancer: This is the more interesting case. TAD boundaries are not perfect insulators. An enhancer near a boundary might: (1) preferentially contact genes within its own TAD, (2) occasionally “leak” to contact genes in the adjacent TAD (TAD boundaries are statistical rather than absolute), or (3) if positioned at an anchor point, be involved in structural loops rather than regulatory contacts.
The key insight is that TADs create statistical preferences for contact: enhancers within a TAD are more likely to contact genes in the same TAD, but boundary effects create gradients rather than hard walls.
21.1.3 Loop Extrusion Mechanism
Before reading: CTCF proteins bind to DNA in a specific orientation. What do you predict happens when two CTCF sites face toward each other (convergent) versus away from each other (divergent)? Which orientation would you expect to form stable chromatin loops?
The molecular basis of TAD formation is now well understood through the loop extrusion model. Think of cohesin as a ring sliding along a rope: once threaded, it moves along the rope, gathering slack into a growing loop. The ring slides freely until it hits a knot tied in a specific direction; only knots facing toward the ring block its progress, while knots facing away let it pass. The cohesin protein complex loads onto chromatin and extrudes DNA bidirectionally, progressively enlarging the extruded loop until it encounters an obstacle. The key obstacle is CTCF protein bound to DNA in a specific orientation. When cohesin encounters CTCF sites oriented toward each other (convergent orientation), extrusion halts and a stable loop forms with the convergent CTCF sites at the loop anchors (Sanborn et al. 2015; Rao et al. 2014). This model explains several key observations: TAD boundaries are enriched for CTCF binding sites; CTCF motif orientation predicts which sites will anchor loops (convergent pairs form loops while divergent pairs do not); and acute degradation of cohesin eliminates TADs within hours while leaving compartments intact (Rao et al. 2017). The distinction between compartment and TAD formation mechanisms has important implications for prediction. Models that capture CTCF binding and orientation can predict TAD boundaries; predicting compartments requires learning different sequence features associated with chromatin state.
The orientation rule for CTCF is a powerful predictor: convergent CTCF pairs (pointing toward each other: \(\rightarrow \leftarrow\)) form stable loop anchors, while divergent pairs (\(\leftarrow \rightarrow\)) and tandem pairs (\(\rightarrow \rightarrow\)) rarely do. This simple rule, which arises from the loop extrusion mechanism, allows computational models to predict many TAD boundaries directly from sequence.
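The orientation rule can be written as a small classifier over motif strands. The sketch below is illustrative only: the strand encoding (`+` for a motif pointing toward increasing coordinates), the `max_span` cutoff, and the example positions are assumptions, and the code ignores the fact that an intervening bound CTCF site could stop extrusion before a more distant partner is reached.

```python
def classify_ctcf_pair(upstream_strand, downstream_strand):
    """Classify two CTCF motifs, ordered by genomic position, by strand."""
    if upstream_strand == "+" and downstream_strand == "-":
        return "convergent"  # -> <-
    if upstream_strand == "-" and downstream_strand == "+":
        return "divergent"   # <- ->
    return "tandem"          # -> -> or <- <-


def candidate_loops(sites, max_span):
    """List convergent motif pairs no more than max_span bp apart.

    sites: iterable of (position_bp, strand) tuples.
    """
    ordered = sorted(sites)
    loops = []
    for i, (pos_a, strand_a) in enumerate(ordered):
        for pos_b, strand_b in ordered[i + 1:]:
            if pos_b - pos_a > max_span:
                break  # sites are sorted, so later ones are even farther away
            if classify_ctcf_pair(strand_a, strand_b) == "convergent":
                loops.append((pos_a, pos_b))
    return loops


# Hypothetical motif positions (bp) and strands, for illustration only.
sites = [(100_000, "+"), (250_000, "-"), (400_000, "+"), (900_000, "-")]
loops = candidate_loops(sites, max_span=600_000)
print(loops)
```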
21.1.4 Fine-Scale Chromatin Loops
At the finest scale, chromatin forms specific loops between defined loci. Enhancer-promoter loops bring distal regulatory elements into physical proximity with their target genes, while structural loops between convergent CTCF sites establish the TAD framework. Most enhancer-promoter contacts span less than 200 kilobases, but some extend over a megabase (Rao et al. 2014). Detecting these fine-scale contacts requires high-resolution data; the Micro-C method uses micrococcal nuclease digestion to achieve nucleosome-level resolution, revealing contact patterns invisible in standard Hi-C (Hsieh et al. 2020). The functional significance of individual loops remains debated. Some loops appear essential for gene activation; others may be structural features without direct regulatory consequences.
21.2 Measuring the 3D Genome
Predicting 3D genome structure requires training data: measurements of which sequences contact which other sequences in real cells. Chromosome conformation capture methods provide these measurements through a common biochemical principle, though the technologies vary in resolution, throughput, and the aspects of 3D organization they reveal.
21.2.1 Hi-C and Contact Matrices
This section describes the biochemistry and normalization of Hi-C data. Understanding these technical details helps explain why training data resolution varies and why certain biases must be corrected. Readers primarily interested in computational prediction can focus on the key point: Hi-C produces a symmetric contact matrix where values represent how frequently two genomic regions were spatially close in the cell population.
Chromosome conformation capture (3C) methods share four steps. Cells are crosslinked with formaldehyde to freeze chromatin contacts in place; DNA is digested with restriction enzymes; free DNA ends are ligated, preferentially joining fragments that were spatially proximate; and the ligated junctions are identified through sequencing. The frequency of junction reads between two genomic regions reflects how often those regions were in physical contact across the cell population.
Hi-C extends this principle genome-wide by incorporating biotinylated nucleotides at ligation junctions, enabling purification of chimeric fragments from the entire genome (Lieberman-Aiden et al. 2009). The output is a contact matrix—similar to a table showing how often people from different neighborhoods meet each other in a city. Just as residents of the same neighborhood run into each other frequently while encounters between distant neighborhoods are rare, genomic regions close together contact often while distant regions rarely meet. Rows and columns represent genomic bins (typically 1 to 50 kilobases depending on sequencing depth) and values represent contact frequencies between bin pairs. Resolution depends directly on sequencing depth: achieving 1 kilobase resolution requires billions of reads, while 10 kilobase resolution requires hundreds of millions. Raw contact frequencies require extensive normalization to correct for biases from GC content, restriction site density, and mappability. The ICE (iterative correction and eigenvector decomposition) method and related approaches remove these technical artifacts while preserving biological signal. ICE works by assuming that all genomic regions should have roughly equal total contact frequency (visibility), then iteratively adjusting row and column sums to achieve this balance. The assumption that visibility differences are technical rather than biological is imperfect but works well in practice because the dominant technical biases (GC content affecting PCR amplification, restriction site density affecting fragmentation) would otherwise dwarf genuine biological variation. The training strategies that enable models to learn from these normalized contact matrices follow the multi-task principles introduced in Section 8.6.
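The equal-visibility idea behind ICE can be sketched in a few lines. This is a simplified pure-Python illustration rather than a reference implementation: production tools also filter low-coverage bins and usually mask the main diagonal, and the toy matrix and visibility values here are invented for the example.

```python
def ice_balance(matrix, n_iter=50):
    """Iteratively rescale a symmetric matrix toward equal row sums.

    Returns (balanced, bias), where
    balanced[i][j] == matrix[i][j] / (bias[i] * bias[j]).
    """
    n = len(matrix)
    balanced = [row[:] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in balanced]
        nonzero = [s for s in row_sums if s > 0]
        mean_sum = sum(nonzero) / len(nonzero)
        # Bins with no reads keep a factor of 1.0 and are left untouched.
        factors = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
    return balanced, bias


# Toy "true" contacts with equal row sums, distorted by per-bin visibility.
true_contacts = [[4.0, 2.0, 1.0],
                 [2.0, 3.0, 2.0],
                 [1.0, 2.0, 4.0]]
visibility = [1.0, 2.0, 0.5]  # hypothetical technical bias per bin
raw = [[true_contacts[i][j] * visibility[i] * visibility[j]
        for j in range(3)] for i in range(3)]

balanced, bias = ice_balance(raw)
print([round(sum(row), 6) for row in balanced])
```

After balancing, the row sums agree and the recovered bias vector is proportional to the visibility that was applied, which is the sense in which the procedure removes multiplicative per-bin artifacts.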
The contact matrix encodes all levels of chromatin organization. Compartments appear as the checkerboard pattern when viewing megabase-scale interactions; TADs appear as triangular domains of enriched contacts along the diagonal; and loops appear as focal enrichments at specific off-diagonal positions. The matrix is dominated by the polymer effect: sequences that are close in linear distance contact each other frequently regardless of specific 3D structure, creating strong signal along the diagonal that can obscure biologically meaningful contacts at greater distances. Why does genomic distance dominate contact frequency? Chromatin behaves as a polymer: random thermal fluctuations bring nearby sequences into contact simply because they are tethered to the same chain. Two loci 10 kb apart will contact each other frequently by chance, while loci 10 Mb apart rarely meet through random motion. This distance decay follows a power law (contact frequency \(\sim\) distance\(^{-1}\)), and deviations from this baseline reveal the biologically interesting contacts: TAD boundaries where contact drops faster than expected, and loops where contact is enriched above the polymer baseline.
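Deviations from the distance-decay baseline are commonly expressed as an observed/expected ratio, where the expected value at each separation is the mean of that diagonal. The sketch below assumes a symmetric matrix and uses a toy map that decays as 1/(d + 1) in bin separation d (the +1 only avoids division by zero on the diagonal), with one artificially enriched "loop" pixel; all values are hypothetical.

```python
def expected_by_distance(matrix):
    """Mean contact frequency at each bin separation d = |i - j|."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return expected


def observed_over_expected(matrix):
    """Divide every entry by the mean of its diagonal."""
    n = len(matrix)
    expected = expected_by_distance(matrix)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            e = expected[abs(i - j)]
            row.append(matrix[i][j] / e if e > 0 else 0.0)
        result.append(row)
    return result


# Toy map: pure distance decay plus a threefold enrichment at bins (1, 4).
n = 6
toy = [[100.0 / (abs(i - j) + 1) for j in range(n)] for i in range(n)]
toy[1][4] *= 3
toy[4][1] *= 3

oe = observed_over_expected(toy)
print(round(oe[1][4], 3), round(oe[0][3], 3))
```

The enriched pixel stands out above 1 while the other pixels on the same diagonal fall below 1, because the loop itself raises the diagonal mean.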
21.2.2 Resolution and Data Resources
Recall from Section 2.4 how different sequencing technologies involve tradeoffs between throughput, resolution, and cost. Before examining the table below, predict: which 3D genome technology would you expect to have the highest resolution but lowest throughput? Which would have the opposite characteristics?
Beyond standard Hi-C, several technologies address specific limitations:
| Technology | Resolution | Throughput | Key Advantage | Key Limitation |
|---|---|---|---|---|
| Hi-C | 1–50 kb | High | Genome-wide, well-established | Limited by restriction sites |
| Micro-C | ~100 bp | Moderate | Nucleosome-level resolution | Higher cost, fewer datasets |
| Single-cell Hi-C | Variable | Low | Cell-to-cell variation | Extremely sparse matrices |
| DNA FISH | Single locus | Low | Direct visualization | Low throughput |
| Capture Hi-C | 1–5 kb | Moderate | High resolution at targets | Limited to predetermined loci |
Micro-C achieves nucleosome-level resolution by using micrococcal nuclease instead of restriction enzymes, revealing fine-scale contact patterns invisible at standard Hi-C resolution. Single-cell Hi-C measures contacts in individual cells, revealing that any two loci contact each other in only a small fraction of cells, but the resulting matrices are extremely sparse (most possible contacts are unmeasured in any single cell). Why is single-cell Hi-C so sparse? The math is humbling: each restriction fragment end can be ligated to at most one partner per chromosome copy in a cell (a fragment end cannot be ligated to two partners simultaneously). With millions of possible genomic bin pairs but only thousands of ligation events captured per cell, the sampling is fundamentally incomplete. Deeper sequencing of the same cell cannot fill in the missing pairs, because the sparsity reflects the physical constraint that each fragment end yields at most one ligation product per chromosome copy. Imaging methods such as DNA FISH directly visualize genomic loci in the nucleus, providing ground truth for computational predictions but at much lower throughput than sequencing-based approaches.
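The scale of the sparsity is easy to estimate. The sketch below counts unordered bin pairs for a genome and bounds the fraction that a single cell could fill; the genome size (about 3.1 Gb), the 1 Mb bin size, and the figure of 5,000 captured contacts are illustrative assumptions, not measurements from any dataset.

```python
def bin_pair_count(genome_size_bp, bin_size_bp):
    """Number of unordered bin pairs, including each bin paired with itself."""
    n_bins = -(-genome_size_bp // bin_size_bp)  # ceiling division
    return n_bins * (n_bins + 1) // 2


GENOME_SIZE_BP = 3_100_000_000  # approximate haploid human genome (assumption)
BIN_SIZE_BP = 1_000_000         # coarse 1 Mb bins (assumption)
CONTACTS_PER_CELL = 5_000       # hypothetical "thousands of ligation events"

pairs = bin_pair_count(GENOME_SIZE_BP, BIN_SIZE_BP)
# Upper bound: every captured contact lands in a different bin pair.
max_fraction_filled = CONTACTS_PER_CELL / pairs

print(pairs, f"{max_fraction_filled:.4%}")
```

Even at 1 Mb bins, far coarser than any loop, a cell with these numbers could populate only about a tenth of a percent of the matrix, and the fraction shrinks quadratically as bins get smaller.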
Training data for 3D prediction models comes primarily from a small number of well-characterized cell lines. The lymphoblastoid cell line GM12878 and the leukemia cell line K562 have deep Hi-C coverage across multiple laboratories, making them the default training sets for most models. Primary tissues and rare cell types have sparse coverage, creating a significant gap between where models are trained and where clinical applications require predictions. The 4D Nucleome Data Portal and ENCODE provide the most comprehensive repositories of 3D genome data, though coverage remains heavily biased toward common cell lines and human samples. This data landscape parallels the challenges discussed for functional genomics data more broadly (Chapter 2).
A researcher wants to study enhancer-promoter contacts at a specific gene locus implicated in a rare disease. The target region spans 500 kb. Given the technologies described above, which approach would you recommend, and why? What would be the key limitation of your recommended approach?
Capture Hi-C would be the best choice: it provides high resolution (1-5 kb) at the specific target locus, enabling identification of enhancer-promoter contacts within the 500 kb region without the cost of genome-wide sequencing. The key limitation is that it only captures contacts within the predetermined region, potentially missing long-range interactions with sequences outside the targeted 500 kb window.
21.3 Predicting 3D Structure from Sequence
Sequence-based prediction of 3D genome structure asks whether DNA sequence alone contains sufficient information to predict chromatin contacts. The success of models like Akita, Orca, and C.Origami demonstrates that sequence encodes substantial 3D information, particularly for TAD boundaries and CTCF-anchored loops. These models share a common challenge: predicting a two-dimensional contact matrix from a one-dimensional sequence input.
21.3.1 Akita and Dilated Convolutions
Recall from Chapter 6 how dilated convolutions expand receptive fields. Why is this architectural choice particularly important for 3D genome prediction, where CTCF binding sites may be separated by hundreds of kilobases? What would happen if we used standard convolutions instead?
Akita, introduced by Fudenberg et al. in 2020, established the paradigm for sequence-to-contact prediction (Fudenberg, Kelley, and Pollard 2020). The model takes approximately one megabase of DNA sequence as input and predicts Hi-C contact frequencies at 2 kilobase resolution. The architecture uses dilated convolutions to expand the receptive field without proportionally increasing parameters (an approach discussed in detail in Section 6.5.1), enabling the model to integrate information across the full input window. Dilated convolutions are essential here because TAD boundaries and loop anchors depend on sequence features (primarily CTCF motifs) that may be separated by hundreds of kilobases, far beyond what standard convolutions could capture without prohibitive parameter counts.
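The benefit of dilation can be checked with the standard receptive-field formula for stacked stride-1 convolutions: each layer adds (kernel size - 1) times its dilation. The layer counts and dilation schedule below are hypothetical and are not Akita's actual configuration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in input positions, of stacked stride-1 convolutions."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


n_layers = 10
standard = receptive_field(3, [1] * n_layers)                      # no dilation
dilated = receptive_field(3, [2 ** i for i in range(n_layers)])    # 1, 2, 4, ...

print(standard, dilated)
```

With the same number of layers and the same number of weights per layer, doubling the dilation at each layer grows the receptive field exponentially rather than linearly, which is why the approach suits features separated by hundreds of kilobases.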
The output is symmetric (contacts between positions i and j equal contacts between j and i), which the architecture enforces through appropriate pooling operations. This symmetry constraint is not merely a convenience but reflects the physical reality of chromatin contacts: if region A touches region B, then by definition B touches A. Enforcing this constraint in the architecture prevents the model from learning spurious asymmetric predictions that would violate physics. Akita achieves correlation coefficients of 0.6 to 0.8 between predicted and observed contact maps in held-out genomic regions, successfully identifying TAD boundaries and major loop anchors.
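A minimal sketch of both ideas in this paragraph: forcing a predicted map to be symmetric and scoring it against an observed map by Pearson correlation over the upper triangle. Averaging with the transpose is one simple way to impose symmetry and is used here as an assumption for illustration; it is not presented as Akita's exact mechanism. The small matrices are invented.

```python
import math


def symmetrize(matrix):
    """Average each entry with its mirror so that m[i][j] == m[j][i]."""
    n = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix, min_offset=1):
    """Flatten entries with j - i >= min_offset (skips the diagonal by default)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + min_offset, n)]


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical asymmetric raw prediction and symmetric observed map.
raw_prediction = [[0.0, 3.0, 1.0],
                  [1.0, 0.0, 4.0],
                  [3.0, 2.0, 0.0]]
observed = [[0.0, 5.0, 3.0],
            [5.0, 0.0, 6.0],
            [3.0, 6.0, 0.0]]

prediction = symmetrize(raw_prediction)
r = pearson(upper_triangle(prediction), upper_triangle(observed))
print(prediction, round(r, 4))
```

Scoring only the upper triangle avoids counting each contact twice, which would otherwise inflate the apparent sample size.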
21.3.2 Orca and Multiscale Prediction
Orca extends sequence-based prediction to multiple resolutions simultaneously (Zhou 2022). Rather than predicting a single-resolution contact map, Orca generates predictions at 4, 8, 16, 32, 64, 128, and 256 kilobase resolutions, capturing both fine-scale loops and large-scale compartment structure. The multiscale approach addresses a fundamental challenge: compartments span megabases while loops span kilobases, and no single resolution optimally captures both. A single-resolution model forced to compromise would either miss fine-scale loop contacts (at coarse resolution) or fail to capture compartment patterns that emerge from aggregating many weak interactions (at fine resolution).
Think back to Enformer from Chapter 17, which uses dilated convolutions to expand its receptive field. How might Orca’s multiscale approach differ from simply using more extreme dilation rates? What architectural advantage does explicit multi-resolution prediction provide?
Orca’s architecture processes sequence through parallel pathways tuned to different scales, then combines predictions into a coherent multiscale representation. The parallel pathways use different pooling and dilation strategies, effectively asking “what does this sequence predict at the 4kb scale?” independently from “what does it predict at the 256kb scale?” This design enables prediction of structural variants’ effects across organizational levels, from disrupted loops to altered compartment boundaries.
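The relationship between resolutions can be illustrated on the data side: a coarse map is a fine map with neighboring bins merged. The sketch below builds such a pyramid by summing 2 x 2 blocks. It shows how contact maps at different resolutions relate to one another and is not a description of Orca's network; the function names and the uniform toy matrix are assumptions.

```python
def coarsen(matrix, factor):
    """Sum contacts in factor x factor blocks to get a lower-resolution map."""
    n = len(matrix)
    if n % factor != 0:
        raise ValueError("matrix size must be divisible by factor")
    m = n // factor
    return [[sum(matrix[i * factor + a][j * factor + b]
                 for a in range(factor) for b in range(factor))
             for j in range(m)] for i in range(m)]


def resolution_pyramid(matrix, base_resolution_kb, levels):
    """Return {resolution_kb: matrix}, doubling the bin size at each level."""
    maps = {base_resolution_kb: matrix}
    current, resolution = matrix, base_resolution_kb
    for _ in range(levels):
        current = coarsen(current, 2)
        resolution *= 2
        maps[resolution] = current
    return maps


# Toy 4 x 4 map at 4 kb resolution with one contact in every pixel.
fine = [[1] * 4 for _ in range(4)]
maps = resolution_pyramid(fine, base_resolution_kb=4, levels=2)
print({res: len(m) for res, m in maps.items()})
```

Each coarsening step quarters the number of pixels while keeping the total contact count, which is why weak, diffuse signals such as compartments become visible only after aggregation.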
21.3.3 C.Origami and Cross-Cell-Type Transfer
C.Origami addresses the cell-type specificity problem (Tan et al. 2023). While TAD boundaries are largely conserved across cell types, finer-scale contacts vary substantially. C.Origami incorporates CTCF ChIP-seq data alongside sequence, enabling the model to learn how cell-type-specific CTCF binding patterns shape cell-type-specific contact maps. This design enables transfer learning: train on cell types with both Hi-C and CTCF data, then predict contacts in new cell types using only CTCF ChIP-seq. The approach substantially expands the range of cell types where 3D predictions are possible, since CTCF ChIP-seq is available for many more cell types than deep Hi-C. This transfer strategy echoes the broader transfer learning principles discussed in Chapter 9.
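The idea of supplying CTCF signal alongside sequence can be sketched as an input encoding: four one-hot channels for the bases plus one channel carrying the per-base ChIP-seq signal. This is a schematic under stated assumptions (per-base signal values, a single extra channel, hypothetical function names), not C.Origami's actual preprocessing.

```python
def one_hot(sequence):
    """Encode A/C/G/T as 4-element vectors; other characters become zeros."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    return [list(table.get(base, [0, 0, 0, 0])) for base in sequence.upper()]


def build_input(sequence, ctcf_signal):
    """Stack sequence and CTCF signal into one (length x 5) feature table."""
    if len(sequence) != len(ctcf_signal):
        raise ValueError("sequence and signal must have the same length")
    return [bases + [float(signal)]
            for bases, signal in zip(one_hot(sequence), ctcf_signal)]


# Same sequence, two hypothetical cell types with different CTCF occupancy.
sequence = "ACGTN"
features_cell_a = build_input(sequence, [0, 0, 5, 5, 0])
features_cell_b = build_input(sequence, [0, 0, 0, 0, 0])
print(features_cell_a[2], features_cell_b[2])
```

The sequence channels are identical across cell types, so any difference in the predicted contact map must come from the auxiliary channel, which is what makes prediction in a new cell type possible.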
Before examining the comparison table, consider the tradeoffs each model makes. Which model would you expect to have the best cross-cell-type performance? Which would be most suitable for predicting the effects of a novel structural variant in a cell type with no available Hi-C data?
The following table compares these three foundational 3D prediction models:
| Feature | Akita | Orca | C.Origami |
|---|---|---|---|
| Input | Sequence only | Sequence only | Sequence + CTCF ChIP-seq |
| Context length | ~1 Mb | Up to 256 Mb | ~2 Mb |
| Output resolution | 2 kb | 4–256 kb (multiscale) | ~8 kb |
| Architecture | Dilated CNN | Multiscale CNN | CNN with auxiliary input |
| Cell-type transfer | Limited | Limited | Enabled via CTCF data |
| Key strength | Established paradigm | Multiscale prediction | Cross-cell-type transfer |
| Key limitation | Single cell type | Requires Hi-C training | Requires CTCF ChIP-seq |
Before proceeding, ensure you can explain:
- The tradeoff between sequence-only models (Akita) and models requiring auxiliary data (C.Origami)
- Why multiscale prediction (Orca) captures different structural features than fixed-resolution models
- When cross-cell-type transfer is possible and what data it requires
If these distinctions are unclear, re-read the model descriptions and table above.
Consider the tradeoff between Akita’s sequence-only approach and C.Origami’s requirement for CTCF ChIP-seq data. In what scenarios would each approach be preferred? Think about (1) predicting structural variant effects, (2) predicting contacts in a novel cell type, and (3) understanding what sequence features determine 3D structure.
21.3.4 Learned Sequence Determinants
Before reading about what these 3D prediction models learn, recall the loop extrusion mechanism from earlier in this chapter. What sequence feature would you expect to be the strongest predictor of chromatin loop formation? Why would the orientation of this feature matter?
Interpretability analysis reveals what these models learn about sequence determinants of 3D structure. Attribution methods (discussed more fully in Chapter 25) consistently identify CTCF motifs as the strongest predictors of contact patterns, with convergent CTCF pairs (motifs oriented toward each other) most strongly associated with loop anchors. Transcription start sites contribute to boundary predictions, consistent with the observation that active promoters often coincide with domain edges. GC content correlates with compartment identity (GC-rich regions tend toward A compartment), and repetitive element composition shows systematic associations (LINE elements with B compartment; Alu elements with A compartment). The orientation rule for CTCF emerges naturally from training: models learn that CTCF motif orientation, not just presence, predicts which sites will anchor loops. This learned relationship matches the mechanistic understanding from the loop extrusion model, providing validation that models capture biologically meaningful features.
Despite these advances, significant limitations remain. Resolution is constrained by training data; predicting nucleosome-level contacts requires Micro-C training data that exists for few cell types. Unlike Hi-C, which is limited by restriction enzyme cutting sites (typically every few kilobases), Micro-C uses micrococcal nuclease to digest chromatin between nucleosomes, achieving approximately 100 base pair resolution, roughly 10 to 50 times finer than standard Hi-C. However, this improved resolution comes at a cost: Micro-C requires substantially deeper sequencing to achieve comparable genome-wide coverage, and far fewer cell types have been profiled at the depth needed for training contact prediction models. The single-cell variation problem is fundamental: models trained on bulk Hi-C predict population averages, but gene regulation may depend on the stochastic 3D configurations in individual cells. Causality cannot be established from prediction alone; a model may correctly predict that two regions contact each other without revealing whether that contact causes any functional consequence. Generalization to cell types distant from training data remains uncertain, and the computational cost of processing megabase sequences limits practical applications for genome-wide analysis.
21.4 3D Structure and Gene Regulation
The ultimate purpose of 3D genome prediction is understanding gene regulation. Contact maps matter because they reveal which enhancers can reach which genes. Integrating 3D structure with expression prediction addresses limitations that purely one-dimensional models cannot overcome.
21.4.1 Beyond One-Dimensional Models
Enformer (Section 17.2) predicts gene expression from sequence within a 200 kilobase window, sufficient to capture many enhancer-promoter relationships but fundamentally limited by its treatment of the genome as a one-dimensional string. This representation cannot distinguish an enhancer that loops to a distant gene from one blocked by a TAD boundary, nor can it explain cell-type-specific contacts that activate different genes from the same enhancer in different contexts. The 3D genome provides this missing context: physical proximity through chromatin loops determines which regulatory elements can communicate.
Consider an enhancer located 300 kilobases from two genes, one upstream and one downstream. Linear models would predict similar regulatory influence on both genes based on comparable distances. But if a TAD boundary lies between the enhancer and the upstream gene, 3D structure predicts that only the downstream gene receives regulatory input. The boundary insulates the upstream gene from enhancer activity regardless of linear proximity. This insulation function explains why TAD boundaries show such strong evolutionary conservation: disrupting boundaries allows regulatory crosstalk that can dysregulate gene expression with pathogenic consequences.
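The scenario can be expressed as a lookup: two loci can communicate, under the idealized model of this paragraph, only if no boundary lies between them. The coordinates below are hypothetical, and treating boundaries as absolute is a simplification, since real boundaries reduce rather than abolish cross-domain contact.

```python
import bisect


def tad_index(position, boundaries):
    """Index of the TAD containing position; boundaries must be sorted."""
    return bisect.bisect_right(boundaries, position)


def same_tad(pos_a, pos_b, boundaries):
    """True if no boundary separates the two positions."""
    return tad_index(pos_a, boundaries) == tad_index(pos_b, boundaries)


# Enhancer 300 kb from each gene; one boundary on the upstream side.
enhancer = 1_000_000
genes = {"upstream_gene": 700_000, "downstream_gene": 1_300_000}
boundaries = [850_000]  # hypothetical TAD boundary position (bp)

reachable = {name: same_tad(enhancer, position, boundaries)
             for name, position in genes.items()}
print(reachable)
```

Both genes are equally distant in linear sequence, yet only the downstream gene shares a domain with the enhancer.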
Linear distance is not regulatory distance. An enhancer 500 kb away within the same TAD may have stronger regulatory influence than an enhancer 50 kb away across a TAD boundary. This principle explains why structural variants can be pathogenic even when they do not disrupt any coding sequence: they rewire the regulatory topology.
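The insulation logic in the example above can be written as a toy lookup. This is a minimal sketch that assumes TAD boundaries are available as sorted coordinates on one chromosome and treats each boundary as a hard barrier, which real boundaries are not; the coordinates and the function names `tad_index` and `same_tad` are illustrative.

```python
from bisect import bisect_right

def tad_index(position, boundaries):
    """Index of the domain containing `position`.

    `boundaries` is a sorted list of boundary coordinates (bp); a position
    to the right of i boundaries falls in domain i.
    """
    return bisect_right(boundaries, position)

def same_tad(pos_a, pos_b, boundaries):
    """True if no boundary lies between the two positions."""
    return tad_index(pos_a, boundaries) == tad_index(pos_b, boundaries)

# Hypothetical locus with a single boundary at 1.0 Mb.
boundaries = [1_000_000]
enhancer = 1_200_000
upstream_gene = 900_000       # 300 kb away, across the boundary
downstream_gene = 1_500_000   # 300 kb away, same domain

for name, gene in [("upstream", upstream_gene), ("downstream", downstream_gene)]:
    distance_kb = abs(gene - enhancer) // 1000
    print(name, distance_kb, "kb, same TAD:", same_tad(enhancer, gene, boundaries))
```

Both genes sit at the same linear distance from the enhancer, yet only the downstream gene shares its domain, which is the distinction a purely distance-based model cannot draw.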
21.4.2 Structural Variant Interpretation
The clinical significance is clearest in structural variant interpretation. Deletions that remove TAD boundaries cause enhancer hijacking, where regulatory elements gain access to genes in adjacent domains. The EPHA4 locus provides the canonical example: limb enhancers normally activate EPHA4 expression in developing limbs. When deletions remove the TAD boundary separating EPHA4 from the adjacent PAX3 domain, these enhancers ectopically activate PAX3, causing brachydactyly; inversions and duplications at the same locus instead expose WNT6 or IHH to the same enhancers, producing syndactyly and polydactyly, respectively (Lupiáñez et al. 2015). Different rearrangements produce different phenotypes depending on which boundaries are removed and which new enhancer-gene contacts form. Similar mechanisms operate in cancer, where structural variants create novel enhancer-oncogene contacts that drive tumor growth. The diagnostic challenge is substantial: predicting pathogenicity of structural variants requires understanding which 3D contacts will be disrupted and what new contacts will form, predictions that sequence-only models cannot provide. This challenge intersects with the variant prioritization pipelines discussed in Chapter 29, where 3D genome effects represent a systematic blind spot in current foundation model approaches to variant effect prediction (Chapter 18).
When analyzing structural variants for potential pathogenicity:
- Identify nearby TAD boundaries using available Hi-C data or boundary predictions from Akita/Orca
- Check for boundary disruption: Does the structural variant delete, invert, or translocate a boundary?
- Inventory regulatory elements: What enhancers exist in the affected region? What genes lie in adjacent TADs?
- Consider tissue specificity: The consequence depends on which cell types express the relevant enhancers and target genes
- Compare to known cases: Databases of pathogenic structural variants with 3D mechanism provide precedent
Structural variants that disrupt boundaries are more likely to be pathogenic than those preserving domain structure, even if the latter remove more sequence.
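The first two steps of the checklist above (locate boundaries, test for disruption) reduce to an interval-overlap query. The sketch below assumes boundaries are supplied as start/end intervals on a single chromosome and handles deletions only; the coordinates and the helper names `overlaps` and `disrupted_boundaries` are hypothetical, and a hit is a flag for follow-up rather than a pathogenicity call.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Overlap test for half-open intervals [start, end)."""
    return start_a < end_b and start_b < end_a

def disrupted_boundaries(sv_start, sv_end, boundaries):
    """Return the boundary intervals that a deletion overlaps."""
    return [b for b in boundaries if overlaps(sv_start, sv_end, b[0], b[1])]

# Hypothetical boundary intervals (bp) on one chromosome.
boundaries = [(1_000_000, 1_010_000), (2_400_000, 2_410_000)]

small_boundary_deletion = (995_000, 1_015_000)     # 20 kb, spans a boundary
large_internal_deletion = (1_200_000, 1_700_000)   # 500 kb, inside one TAD

for label, (start, end) in [
    ("small", small_boundary_deletion),
    ("large", large_internal_deletion),
]:
    hits = disrupted_boundaries(start, end, boundaries)
    print(label, (end - start) // 1000, "kb deleted,", len(hits), "boundary hit(s)")
```

The smaller deletion is the one flagged here, mirroring the point that the amount of sequence removed is a poor proxy for regulatory impact. The remaining checklist steps (enhancer inventory, tissue specificity, precedent) need annotation data that this sketch does not model.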
Integrating 3D predictions with expression models remains technically challenging. Hybrid approaches use predicted contacts to weight enhancer contributions: rather than treating all enhancers within a window equally, weights reflect predicted contact frequency with the target promoter. This activity-by-contact framework (expression proportional to the sum of enhancer activities weighted by contact frequencies) captures some of the regulatory logic that 1D models miss. Graph-based representations (Chapter 22) can encode genes and enhancers as nodes with contacts as edges, enabling graph neural networks to reason about regulatory relationships in 3D space. Attribution methods for understanding which contacts drive expression predictions are examined in Section 25.1. End-to-end training of combined 3D and expression models remains difficult; most current approaches train the components separately and combine predictions post hoc.
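The activity-by-contact weighting described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: enhancer activity and contact frequency are taken as precomputed numbers in arbitrary units, the values are invented, and the function names are hypothetical. The per-enhancer share follows the normalization used in the activity-by-contact formulation, where each enhancer's product is divided by the sum over all candidate enhancers for the gene.

```python
def weighted_enhancer_input(enhancers):
    """Sum of activity * contact over all candidate enhancers for one gene."""
    return sum(e["activity"] * e["contact"] for e in enhancers)

def contribution_shares(enhancers):
    """Fraction of the total weighted input attributable to each enhancer."""
    total = weighted_enhancer_input(enhancers)
    if total == 0:
        return {e["name"]: 0.0 for e in enhancers}
    return {e["name"]: e["activity"] * e["contact"] / total for e in enhancers}

# Invented values: equal activity, different contact with the promoter.
enhancers = [
    {"name": "near_across_boundary", "activity": 10.0, "contact": 0.01},
    {"name": "far_same_tad", "activity": 10.0, "contact": 0.05},
]

shares = contribution_shares(enhancers)
for name, share in shares.items():
    print(name, round(share, 3))
```

With equal activities, the enhancer with five times the contact frequency receives five times the weight, regardless of which one is closer on the linear genome.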
21.4.3 Causality and Permissive Architecture
The causality question complicates interpretation. Do enhancer-promoter contacts cause gene activation, or does gene activation cause contacts? Transcription itself can influence chromatin organization: active transcription may stabilize enhancer-promoter contacts that would otherwise be transient. Perturbation experiments provide cleaner causal tests than correlational analysis. Acute degradation of cohesin eliminates TADs within hours, yet most genes show minimal expression changes, suggesting that many TAD structures are permissive rather than deterministic for gene regulation. CRISPR-based deletion of specific TAD boundaries similarly produces more modest effects than the structural disruption would suggest. The emerging view is nuanced: 3D structure constrains which enhancer-promoter interactions are possible, but whether those interactions occur depends on additional factors including transcription factor availability and chromatin state. This distinction between correlation and causation echoes the confounding challenges discussed in Chapter 13 and the causal inference principles explored in Chapter 26.
A recent paper reports that deleting a TAD boundary experimentally results in new chromatin contacts between an enhancer and a previously insulated gene, but the gene’s expression does not change. How would you interpret this result? What additional experiments would help distinguish between (a) the contact is non-functional, (b) the contact is functional but opposed by other regulatory mechanisms, and (c) the contact requires additional factors not present in this experimental system?
21.5 Spatial Transcriptomics
Single-cell RNA sequencing (Chapter 20) reveals cellular heterogeneity but discards spatial information: we learn which genes each cell expresses but not where that cell sits within the tissue. For understanding tumor microenvironments, developmental gradients, or tissue architecture, spatial context is essential. A T cell adjacent to a tumor cell experiences a different microenvironment than one in the surrounding stroma, and this spatial context shapes gene expression programs in ways that dissociated single-cell data cannot capture.
21.5.1 Measurement Technologies
Spatial transcriptomics technologies fall into two broad categories with complementary strengths. Spot-based methods like Visium (10x Genomics) capture polyadenylated RNA at arrayed positions on a slide, providing transcriptome-wide measurement at approximately 55 micrometer resolution (typically 1 to 10 cells per spot). These methods offer comprehensive gene coverage but limited spatial resolution. Imaging-based methods like MERFISH use sequential rounds of fluorescent hybridization to identify RNA molecules in situ, achieving subcellular resolution but limited to pre-selected gene panels (hundreds to thousands of genes rather than transcriptome-wide). Newer technologies like Stereo-seq achieve near-cellular resolution with transcriptome-wide coverage through spatial barcoding, though they remain less validated than established methods.
Consider the fundamental tradeoff between spatial resolution and gene coverage. Why might this tradeoff exist at a technical level? Which approach would you choose for (1) discovering novel spatial patterns in a tissue, versus (2) mapping known cell-cell interactions at high resolution?
| Approach | Resolution | Gene Coverage | Example Technologies | Best For |
|---|---|---|---|---|
| Spot-based | ~55 µm (multi-cell) | Transcriptome-wide | Visium, Slide-seq | Discovery, whole-transcriptome |
| Imaging-based | Subcellular | 100–10,000 genes | MERFISH, Xenium | Targeted, single-cell resolution |
| Next-generation | Near-cellular | Transcriptome-wide | Stereo-seq, Seq-Scope | Emerging applications |
21.5.2 Computational Challenges
Computational challenges in spatial transcriptomics mirror and extend those in single-cell analysis (Chapter 20). Spot deconvolution addresses the multiple-cells-per-spot problem in Visium data: inferring the cell type composition within each spot by comparing spot expression profiles to reference single-cell atlases. Imputation methods predict expression of genes not measured in imaging-based assays, leveraging correlations learned from reference datasets. Integration aligns spatial data with single-cell references, mapping reference cell types onto spatial coordinates. Domain correction handles batch effects that manifest in spatial patterns as well as expression levels. The sparsity problem is even more severe than in standard single-cell RNA sequencing; gene detection rates in spatial methods often fall below 10 percent. The missing modality strategies developed for multi-omics integration (Section 23.6) become essential when spatial methods fail to detect genes that single-cell RNA-seq measures reliably.
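Spot deconvolution, as described above, can be illustrated in its simplest form: a spot containing only two cell types, with noise-free reference profiles. The sketch below solves that case by ordinary least squares and clips the result to a valid fraction. It is a deliberately reduced toy, not the algorithm of any published deconvolution tool; the expression values and the function name `two_type_fraction` are invented for illustration.

```python
def two_type_fraction(spot, ref_a, ref_b):
    """Least-squares estimate of the fraction of cell type A in a spot.

    Models spot = p * ref_a + (1 - p) * ref_b and clips p to [0, 1].
    """
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    p = sum((s - b) * d for s, b, d in zip(spot, ref_b, diff)) / denom
    return min(1.0, max(0.0, p))

# Invented mean expression for four genes in two reference cell types.
tumor = [8.0, 1.0, 0.5, 4.0]
t_cell = [0.5, 6.0, 7.0, 4.0]

# Simulate a spot that is 70% tumor and 30% T cell.
spot = [0.7 * t + 0.3 * c for t, c in zip(tumor, t_cell)]

print(round(two_type_fraction(spot, tumor, t_cell), 3))
```

Genes with identical expression in both references (the fourth gene here) contribute nothing to the estimate, which is one reason marker-gene choice matters in practice. Real spots involve many cell types, count noise, and platform effects that this sketch ignores.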
21.5.3 Spatial Foundation Models
Spatial foundation models remain much less mature than sequence-based models (Chapter 15, Chapter 16). The fundamental challenge is the lack of an equivalent to evolutionary pretraining: DNA and protein models learn from billions of years of evolutionary experiments encoded in sequence databases, but no comparable natural augmentation exists for spatial organization. Current approaches include graph neural networks that encode spatial relationships as edges between neighboring cells or spots, transformer architectures that treat spatial positions as tokens with positional encodings derived from coordinates, and generative models that learn spatial patterns from atlases of reference tissues. Models like Nicheformer apply transformer architectures to spatial niches (local cellular neighborhoods), learning representations that capture cell-cell communication patterns and tissue microenvironment signatures. SpaGCN uses graph convolutional networks with spatial graphs, propagating information between spatially adjacent regions to identify spatial domains with coherent expression patterns.
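The graph-based encoding mentioned above starts from a simple construction: link each spot to its nearest spatial neighbors, then let expression values flow along those edges. The sketch below builds a k-nearest-neighbor graph and performs one round of neighbor averaging, the unweighted core of a graph convolution. It is not the implementation used by SpaGCN or any other named model; the coordinates, expression values, and function names are illustrative.

```python
import math

def knn_edges(coords, k):
    """Undirected edges linking each spot to its k nearest spatial neighbors."""
    edges = set()
    for i, p in enumerate(coords):
        ranked = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        for j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def mean_neighbor_expression(values, edges):
    """Replace each spot's value with the mean over its graph neighbors."""
    sums = [0.0] * len(values)
    counts = [0] * len(values)
    for i, j in edges:
        sums[i] += values[j]
        counts[i] += 1
        sums[j] += values[i]
        counts[j] += 1
    # Spots without neighbors keep their original value.
    return [s / c if c else v for s, c, v in zip(sums, counts, values)]

# Four hypothetical spots along one axis, with one gene's expression per spot.
coords = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
expression = [1.0, 3.0, 5.0, 7.0]

edges = knn_edges(coords, k=1)
print(sorted(edges))
print(mean_neighbor_expression(expression, edges))
```

Learned models replace the plain average with trainable weights and stack several such rounds, but the neighborhood graph plays the same role: it is where spatial position enters the model.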
Other approaches address different aspects of the spatial modeling problem. CellPLM pretrains on millions of spatial transcriptomics cells, learning representations that transfer across tissue types and experimental platforms. STACI combines spatial coordinates with morphological features from histology images, enabling joint reasoning about molecular and visual tissue properties. GraphST uses graph attention networks to propagate expression signals across spatial neighborhoods while preserving local heterogeneity. HEIST (Madhu et al. 2025) employs a hierarchical graph transformer architecture that models tissue organization at multiple scales, capturing both local cell-cell communication patterns and broader tissue structure; whether such multi-scale spatial patterns improve downstream predictions remains an active area of validation. These methods remain early in development compared to sequence foundation models; no spatial equivalent of DNABERT or ESM-2 has achieved broad adoption, and benchmark comparisons across methods remain limited by the diversity of spatial platforms and tissue types.
The clinical applications motivating spatial foundation model development center on tumor microenvironment characterization. The spatial organization of immune cells relative to tumor cells predicts treatment response: tumors with immune cells infiltrating the tumor core respond better to immunotherapy than those with immune exclusion at the tumor periphery. Spatial models aim to learn these prognostic patterns from training data, enabling prediction of treatment response from spatial organization alone. Similar applications exist in developmental biology (understanding morphogen gradients and cell fate decisions), neuroscience (mapping brain region organization), and pathology (characterizing disease architecture in tissue sections).
21.6 Limitations and Open Questions
Current 3D genome and spatial models face limitations that constrain their utility for clinical and research applications. Resolution remains a fundamental constraint: most Hi-C prediction models operate at 2 to 10 kilobase resolution, while functionally relevant enhancer-promoter contacts involve specific sequences within those bins. Predicting which specific kilobases within a TAD contact each other requires resolution that exceeds current training data in most cell types. The resolution needed for accurate prediction may exceed the resolution achievable from bulk Hi-C, creating a data ceiling that computational methods cannot overcome.
The population averaging problem is more fundamental than a mere technical limitation. Bulk Hi-C measurements average over millions of cells, each with a different 3D configuration. Any two loci contact each other in only a minority of cells at any given time, yet the averaged contact frequency appears as a single value in the training data. Single-cell Hi-C reveals this heterogeneity but produces extremely sparse data (most possible contacts unmeasured in each cell). Models trained on population averages cannot predict single-cell behavior, yet gene regulation may depend on the stochastic dynamics of contact formation in individual cells. Whether the population average or the single-cell distribution matters more for predicting gene expression remains unclear.
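A small constructed example shows what the averaging discards. The sketch below defines two hypothetical cell populations in which each cell either has or lacks two contacts; the populations are invented so that their bulk contact frequencies are identical while their single-cell structure differs. The function names are illustrative.

```python
def bulk_frequencies(cells):
    """Average each contact across cells, as a bulk assay would report it."""
    n = len(cells)
    return tuple(sum(cell[i] for cell in cells) / n for i in range(2))

def co_occurrence(cells):
    """Fraction of cells in which both contacts are present at once."""
    return sum(1 for a, b in cells if a and b) / len(cells)

# Each cell is (contact_1_present, contact_2_present); 100 cells per population.
coupled = [(1, 1)] * 10 + [(0, 0)] * 90                      # contacts always together
exclusive = [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 80    # contacts never together

for label, cells in [("coupled", coupled), ("exclusive", exclusive)]:
    print(label, "bulk:", bulk_frequencies(cells), "both at once:", co_occurrence(cells))
```

Both populations yield the same bulk frequencies, so a model trained on the averaged map has no way to tell whether two contacts cooperate in the same cells or exclude each other.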
Causality represents the deepest conceptual challenge. Predicting that two regions contact each other does not establish that the contact causes any biological consequence. Many TAD disruptions produce minimal expression changes; many enhancer-promoter contacts may be bystanders rather than drivers of transcription. The loop extrusion machinery that creates TADs operates continuously, but the transcriptional machinery that reads out enhancer-promoter communication operates on different timescales and with different requirements. Computational predictions of 3D structure are correlational; establishing which predicted contacts matter functionally requires experimental validation that computational methods cannot replace.
For clinical applications, the sparse training data creates systematic blind spots. Models trained on GM12878 and K562 may not transfer to the primary cells, developmental stages, or disease states where predictions matter most. A structural variant affecting 3D organization in neural progenitor cells cannot be reliably interpreted using models trained only on lymphoblastoid cells. The cell types most relevant for clinical interpretation are often those with the least 3D characterization data available. This challenge parallels the transferability concerns discussed throughout Chapter 11 and Chapter 13.
Summarize three distinct limitations of current 3D genome prediction models. For each limitation, identify whether it is primarily (a) a data limitation that more sequencing could address, (b) a fundamental biological challenge, or (c) a computational/algorithmic limitation. How do these limitations affect the clinical utility of 3D structure predictions?
Three key limitations:
Resolution gap (data): models predict at 2-10 kb but enhancer-promoter contacts require finer resolution; more sequencing could help but may hit biological ceilings.
Population averaging (biological): bulk Hi-C averages over millions of cells with different configurations; single-cell approaches are fundamentally sparse due to physical constraints.
Causality (biological): predicting contacts does not establish functional consequences; many contacts are permissive rather than deterministic.
These severely limit clinical utility because disease-relevant cells lack training data and predicted contacts may not indicate functional effects.
21.7 Structure as Context, Not Cause
The genome’s three-dimensional organization provides context that one-dimensional sequence models cannot capture. Enhancer-promoter contacts explain regulatory relationships spanning hundreds of kilobases; TAD boundaries constrain which elements can interact; tissue architecture determines the cellular neighborhoods where gene expression programs execute. Models like Akita, Orca, and C.Origami demonstrate that sequence contains substantial information about chromatin folding, predicting contact maps from DNA sequence with accuracy sufficient to predict how structural variants and disease-associated changes alter folding.
Yet the functional role of 3D structure remains more modest than early enthusiasm implied. Experimental perturbation studies show that TAD boundary disruption often has limited expression consequences. Many chromatin contacts appear permissive rather than instructive: they establish the possibility of regulatory communication without determining whether that communication occurs. A predicted enhancer-promoter contact indicates that interaction could happen, not that it does happen or that it matters when it does. The 3D genome may constrain the regulatory landscape without specifying regulatory outcomes.
3D structure is permissive, not deterministic. Think of TADs as enabling rather than commanding: they create the possibility of enhancer-promoter communication, but the actual communication requires transcription factors, chromatin accessibility, and other regulatory inputs. This is why disrupting a TAD boundary can cause disease (by enabling pathogenic new contacts) without boundary integrity being necessary for normal gene expression in most cases.
This distinction shapes how 3D structure should be integrated with other modalities. Chromatin contacts become edges in gene regulatory networks (Chapter 22), providing structural priors for graph-based reasoning. Spatial expression patterns integrate with multi-omics approaches (Chapter 23), adding tissue architecture alongside genomics and transcriptomics. For interpretability (Chapter 25), 3D structure offers mechanistic hypotheses that require experimental validation. Whether a predicted regulatory effect operates through chromatin proximity, or whether proximity merely correlates with regulation through shared causes, remains a question that computational models can motivate but not answer. The integration of 3D information into genomic AI proceeds with appropriate uncertainty about what that information contributes.
Before reviewing the summary, test your recall:
- What is the “orientation rule” for CTCF binding sites, and why does it determine which sites will anchor chromatin loops?
- Explain why “linear distance is not regulatory distance” in the context of enhancer-gene regulation.
- How does the loop extrusion mechanism create TAD boundaries?
- Why is 3D genome structure described as “permissive rather than deterministic” for gene regulation?
- What is the population averaging problem in Hi-C data, and why does it limit our understanding of gene regulation at the single-cell level?
CTCF orientation rule: Convergent CTCF pairs (→←) form stable loop anchors because the loop extrusion mechanism halts when cohesin encounters CTCF sites oriented toward each other. Divergent (←→) and tandem (→→) orientations do not block extrusion, so no stable loop forms.
Linear vs. regulatory distance: An enhancer 500 kb away within the same TAD may regulate a gene more strongly than an enhancer 50 kb away across a TAD boundary. The 3D folding determines regulatory proximity; boundaries insulate genes from enhancers despite short linear distances, while loops bring distant elements into contact.
Loop extrusion and TAD boundaries: Cohesin loads onto chromatin and extrudes DNA bidirectionally, enlarging a loop until encountering convergent CTCF sites. These CTCF-anchored loops define TAD boundaries. Multiple adjacent loops create the triangular TAD structure visible in Hi-C.
Permissive vs. deterministic: 3D structure establishes which enhancer-promoter interactions can occur, but whether they do occur depends on additional factors (transcription factors, chromatin accessibility). Many TAD disruptions produce minimal expression changes, showing that contacts enable but do not command gene regulation.
Population averaging problem: Bulk Hi-C averages over millions of cells, each with a different 3D configuration. Any two loci contact each other in only 5-15% of cells at any time, but Hi-C reports the average contact frequency. This obscures single-cell stochastic dynamics that may be critical for gene regulation but cannot be recovered from bulk data.
Key Concepts Covered:
- Chromatin organization hierarchy: Chromosome territories, A/B compartments, TADs, and fine-scale loops represent nested organizational levels with distinct mechanisms
- Loop extrusion model: Cohesin extrudes DNA until blocked by convergent CTCF sites, explaining TAD boundary formation
- Hi-C and contact matrices: Chromosome conformation capture methods measure 3D contacts; resolution depends on sequencing depth
- Sequence-based prediction: Akita, Orca, and C.Origami predict contact maps from sequence, achieving ~0.6-0.8 correlation with experimental data
- Structural variant interpretation: Boundary disruption causes enhancer hijacking with pathogenic consequences
- Spatial transcriptomics: Extends single-cell analysis to include tissue location, enabling microenvironment characterization
Core Takeaways:
- The 3D genome provides regulatory context that 1D models cannot capture; linear distance is not regulatory distance
- CTCF motif orientation is a powerful predictor of loop anchors: convergent pairs form loops, divergent pairs do not
- Sequence contains substantial 3D information, but prediction accuracy varies by organizational level (best for TAD boundaries, worst for compartments)
- 3D contacts are permissive rather than deterministic; contact predicts that regulation could occur, not that it does
- Clinical application to structural variants is limited by training data bias toward a few cell lines
Connections to Other Chapters:
- Builds on: Chapter 6 (dilated convolutions), Chapter 17 (Enformer and 1D models)
- Extends to: Chapter 22 (3D contacts as graph edges), Chapter 23 (spatial integration)
- Relevant evaluation: Chapter 11 (transfer evaluation), Chapter 25 (attribution methods)
- Clinical context: Chapter 18 (variant effect prediction gaps), Chapter 29 (structural variant interpretation)