21  3D Genome Organization

Two meters of DNA. Ten micrometers of space. How it folds determines what it does.

Chapter Overview

Estimated reading time: 25-35 minutes

Prerequisites: This chapter builds on the regulatory models introduced in Chapter 17, particularly Enformer’s approach to sequence-based prediction. Familiarity with convolutional architectures (Chapter 6) and the concept of dilated convolutions is helpful. Understanding of chromatin accessibility and histone modifications (Section 2.4) provides useful context for interpreting 3D genome features.

Learning Objectives: After completing this chapter, you should be able to:

  • Explain the hierarchy of 3D genome organization from chromosome territories to fine-scale loops
  • Describe the loop extrusion mechanism and predict which CTCF sites will anchor chromatin loops
  • Compare and contrast Akita, Orca, and C.Origami for 3D structure prediction
  • Interpret Hi-C contact matrices and understand their resolution limitations
  • Explain why structural variant pathogenicity depends on 3D context, not just sequence
  • Distinguish between correlational 3D structure predictions and causal regulatory effects

Key Insight: The 3D genome provides context that one-dimensional sequence models cannot capture, but 3D structure is permissive rather than deterministic. A predicted enhancer-promoter contact indicates that interaction could happen, not that it does happen or that it matters when it does.

The human genome spans approximately two meters of linear DNA, yet it must fit within a nucleus roughly ten micrometers in diameter: a compaction ratio of nearly 200,000 to one. This folding is not random. Specific sequences contact each other across vast genomic distances while others remain isolated, and these contact patterns determine which enhancers can activate which genes. An enhancer 500 kilobases from its target can drive transcription only because intervening chromatin folds to bring them into physical proximity. The regulatory models covered in Chapter 17 predict expression from sequence within a fixed window, building on the convolutional architectures introduced in Chapter 6 and treating the genome as a one-dimensional string. They cannot explain why an enhancer activates one gene and not another when multiple promoters lie within range.

Disruptions to 3D genome architecture cause disease through mechanisms that sequence alone cannot predict. When structural variants delete a boundary between chromatin domains, enhancers can contact genes they normally never reach, a phenomenon called enhancer hijacking that underlies developmental disorders and cancer. The clinical consequences depend entirely on which contacts are disrupted. A deletion that removes a domain boundary may be pathogenic; an identical-sized deletion preserving boundaries may be benign. Current variant effect prediction tools (Chapter 18) largely ignore this spatial dimension, creating systematic blind spots for structural variant interpretation.

Figure 21.1: Clinical consequences of TAD boundary disruption. (A) Normal configuration: limb enhancers activate EPHA4 within their TAD while the boundary insulates WNT6 in the adjacent domain. (B) Boundary deletion allows TAD merger and new enhancer-gene contacts. (C) Pathogenic outcome: limb enhancers ectopically activate WNT6, causing brachydactyly. (D) Clinical interpretation: same-sized deletions may be pathogenic or benign depending on whether they disrupt domain boundaries, an effect current VEP tools cannot predict.

21.1 Chromatin Organization Hierarchy

The genome folds through multiple organizational levels, each with distinct functional consequences and arising from different molecular mechanisms. Understanding this hierarchy is essential for interpreting both normal gene regulation and how structural variants cause disease. The levels are not independent; they interact in complex ways that computational models must capture to predict 3D structure accurately.

Why did cells evolve multiple organizational levels rather than a single mechanism? Each level solves a different regulatory problem at a different scale. Compartments segregate active from inactive chromatin, creating nuclear microenvironments with distinct biochemical properties, essentially partitioning the nucleus into “transcription factories” and “silencing zones.” TADs constrain enhancer-promoter search: without boundaries, an enhancer might contact any gene within megabases, creating regulatory chaos; TAD boundaries limit this search to a few hundred kilobases. Loops bring specific regulatory elements into direct contact when needed. This hierarchical division of labor allows cells to achieve both genome-scale organization (compartments) and fine-scale precision (loops) using different mechanisms optimized for different tasks.

Predict Before You Look

Before examining the table below, try to rank the four organizational levels (chromosome territories, compartments, TADs, and loops) from most to least computationally tractable to predict from DNA sequence. What sequence features might make some levels easier to predict than others?

The following table summarizes the key organizational levels, their scales, and the molecular mechanisms underlying each:

Table 21.1: Summary of chromatin organization hierarchy showing spatial scales, key features, and prediction difficulty.

| Level | Scale | Key Features | Molecular Mechanism | Computational Tractability |
|---|---|---|---|---|
| Chromosome territories | Nucleus-wide | Gene-rich toward interior | Nuclear organization | Low (difficult to predict) |
| A/B compartments | 1–10 Mb | Checkerboard pattern in Hi-C | Phase separation | Moderate (chromatin state) |
| TADs | 200 kb–2 Mb | Triangular domains, conserved boundaries | Loop extrusion | High (CTCF motifs) |
| Chromatin loops | 10–500 kb | Focal Hi-C enrichments | CTCF/cohesin | High (convergent motifs) |

21.1.1 Chromosome Territories and Compartments

At the largest scale, chromosomes occupy distinct nuclear volumes called chromosome territories. Gene-rich chromosomes tend toward the nuclear interior while gene-poor chromosomes associate with the nuclear periphery. This territorial organization limits which chromosomes can exchange material during translocations: recurrent cancer-associated translocations occur preferentially between chromosomes that occupy neighboring territories (Zhang et al. 2012). While chromosome territory organization has clear functional implications, most computational models focus on finer-scale structures where sequence determinants are more tractable.

Within chromosome territories, chromatin partitions into two major compartment types distinguished by their transcriptional activity and chromatin state. A compartments contain gene-rich, transcriptionally active chromatin with open, accessible structure. B compartments contain gene-poor, transcriptionally silent regions often associated with the nuclear lamina at the nuclear periphery. This compartmentalization is visible in Hi-C contact maps as a characteristic checkerboard pattern: A compartment regions preferentially contact other A regions even when separated by megabases, while B regions contact other B regions (Lieberman-Aiden et al. 2009). Compartment identity correlates strongly with histone modifications (H3K27ac marks active A compartments; H3K9me3 marks repressive B compartments) and changes during cellular differentiation as lineage-specific genes shift between active and inactive states. The molecular mechanism underlying compartmentalization appears to involve phase separation: regions with similar chromatin states aggregate through weak multivalent interactions, creating nuclear microenvironments with distinct biochemical properties (Larson et al. 2017).
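The checkerboard described above is commonly turned into compartment labels by normalizing each contact by the average at its genomic separation, correlating the rows, and taking the sign of the leading eigenvector. The following is a minimal sketch of that idea on a toy matrix; the function names, the 3-to-1 contact ratio, and the block layout are assumptions made for illustration, not values from a real dataset.

```python
import numpy as np

def observed_over_expected(contacts: np.ndarray) -> np.ndarray:
    """Divide each entry by the mean contact at its genomic separation."""
    n = contacts.shape[0]
    expected_by_sep = np.array([np.diagonal(contacts, d).mean() for d in range(n)])
    row, col = np.indices((n, n))
    return contacts / expected_by_sep[np.abs(row - col)]

def call_compartments(contacts: np.ndarray) -> np.ndarray:
    """Label bins +1 or -1 using the leading eigenvector of the O/E correlation matrix."""
    corr = np.corrcoef(observed_over_expected(contacts))
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues are returned in ascending order
    return np.sign(eigvecs[:, -1])

# Toy map (hypothetical): four alternating 10-bin blocks with distance decay,
# where same-type bins contact each other three times more often.
truth = np.repeat([1, -1, 1, -1], 10)
row, col = np.indices((40, 40))
toy = np.where(truth[:, None] == truth[None, :], 3.0, 1.0) / (1 + np.abs(row - col))

labels = call_compartments(toy)
agreement = float(np.mean(labels == truth))
# The eigenvector sign is arbitrary, so agreement may be near 0 or near 1.
print(max(agreement, 1 - agreement))
```

Because the eigenvector sign is arbitrary, real analyses need an external signal such as gene density or GC content to decide which label is A and which is B.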

Reading Hi-C Contact Maps

Hi-C contact maps encode all levels of chromatin organization in a single visualization:

  • Chromosome territories appear as distinct diagonal blocks; contacts are enriched within chromosomes compared to between chromosomes.
  • A/B compartments create a checkerboard pattern at megabase scale: A regions preferentially contact other A regions (appearing as enriched off-diagonal blocks), while B regions contact B regions, creating alternating stripes.
  • TADs appear as triangular domains along the diagonal. Within each triangle, contact frequencies are elevated compared to regions outside the triangle. TAD boundaries are the sharp edges where these triangles meet.
  • Chromatin loops appear as focal enrichments (bright spots) at specific off-diagonal positions, representing direct contacts between loop anchors (typically convergent CTCF sites).

The strong diagonal signal reflects the polymer effect: sequences close in linear distance contact frequently simply because they are tethered to the same DNA strand. Biologically meaningful contacts appear as enrichments above this baseline.

21.1.2 Topologically Associating Domains

Below the multi-megabase scale of compartments, the genome organizes into topologically associating domains (TADs): regions of roughly 200 kilobases to 2 megabases (median approximately 800 kilobases in mammals) within which sequences contact each other more frequently than with sequences outside the domain. TAD boundaries appear as sharp transitions in contact frequency, visible in Hi-C maps as the edges of triangular domains along the matrix diagonal. These boundaries are highly conserved across mammalian species and across cell types within a species, suggesting selective pressure to maintain domain organization (Dixon et al. 2012). The prevailing model holds that TADs constrain enhancer-promoter interactions: regulatory elements within a TAD can contact genes in the same domain, but boundaries prevent crosstalk with genes in adjacent domains. This insulation function has clear clinical relevance. Deletions that remove TAD boundaries allow enhancers to contact genes they normally cannot reach. In a well-characterized example, deletions removing the boundary between the EPHA4 locus and the WNT6/PAX3 region allow limb enhancers to ectopically activate WNT6, causing brachydactyly and other limb malformations (Lupiáñez et al. 2015).
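One common way to locate the sharp transitions described above is an insulation score: for each bin, average the contacts that cross it within a sliding window, and look for local minima. The sketch below assumes a square window and a toy two-domain matrix; the function name, window size, and the 0.2 cross-domain factor are illustrative choices, not parameters from a published pipeline.

```python
import numpy as np

def insulation_score(contacts: np.ndarray, window: int) -> np.ndarray:
    """Mean contact between the `window` bins upstream and downstream of each position."""
    n = contacts.shape[0]
    score = np.full(n, np.nan)  # edges lack a full window and stay NaN
    for i in range(window, n - window + 1):
        # Rows i-window..i-1 are upstream of the candidate boundary, columns i..i+window-1 downstream.
        score[i] = contacts[i - window:i, i:i + window].mean()
    return score

# Toy map (hypothetical): two 20-bin TADs, with cross-TAD contacts reduced five-fold.
tad = np.repeat([0, 1], 20)
row, col = np.indices((40, 40))
toy = np.where(tad[:, None] == tad[None, :], 1.0, 0.2) / (1 + np.abs(row - col))

score = insulation_score(toy, window=5)
boundary = int(np.nanargmin(score))
print(boundary)
```

On real data the minimum is rarely this clean, so boundary callers typically smooth the score and require a minimum depth before reporting a boundary.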

Stop and Think

Consider an enhancer located exactly at the center of a TAD. If this TAD contains three genes (one near each boundary and one near the center), which genes would you expect the enhancer to regulate? How would your prediction change if the enhancer were located near one of the TAD boundaries instead?

Center-located enhancer: All three genes within the TAD could potentially be regulated because TAD boundaries constrain contacts between domains, not within them. The central enhancer could contact all three genes through chromatin looping, though the nearest gene (at the center) would likely show strongest contact frequency due to proximity.

Boundary-located enhancer: This is the more interesting case. TAD boundaries are not perfect insulators. An enhancer near a boundary might: (1) preferentially contact genes within its own TAD, (2) occasionally “leak” to contact genes in the adjacent TAD (TAD boundaries are statistical rather than absolute), or (3) if positioned at an anchor point, be involved in structural loops rather than regulatory contacts.

The key insight is that TADs create statistical preferences for contact: enhancers within a TAD are more likely to contact genes in the same TAD, but boundary effects create gradients rather than hard walls.

21.1.3 Loop Extrusion Mechanism

Predict Before You Look

Before reading: CTCF proteins bind to DNA in a specific orientation. What do you predict happens when two CTCF sites face toward each other (convergent) versus away from each other (divergent)? Which orientation would you expect to form stable chromatin loops?

The molecular basis of TAD formation is now well understood through the loop extrusion model. Think of cohesin as a ring sliding along a rope: once threaded, it moves along the rope, gathering slack into a growing loop. The ring slides freely until it hits a knot tied in a specific direction; only knots facing toward the ring block its progress, while knots facing away let it pass. The cohesin protein complex loads onto chromatin and extrudes DNA bidirectionally, progressively enlarging the extruded loop until it encounters an obstacle. The key obstacle is CTCF protein bound to DNA in a specific orientation. When cohesin encounters CTCF sites oriented toward each other (convergent orientation), extrusion halts and a stable loop forms with the convergent CTCF sites at the loop anchors (Sanborn et al. 2015; Rao et al. 2014). This model explains several key observations: TAD boundaries are enriched for CTCF binding sites; CTCF motif orientation predicts which sites will anchor loops (convergent pairs form loops while divergent pairs do not); and acute degradation of cohesin eliminates TADs within hours while leaving compartments intact (Rao et al. 2017). The distinction between compartment and TAD formation mechanisms has important implications for prediction. Models that capture CTCF binding and orientation can predict TAD boundaries; predicting compartments requires learning different sequence features associated with chromatin state.
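The extrusion process described above can be sketched as a one-dimensional simulation. This is a deliberately simplified model under stated assumptions: CTCF sites block with certainty (real blocking is probabilistic), a single cohesin is simulated, and the `"+"`/`"-"` strand encoding and function name are hypothetical conventions chosen for the example. A `"+"` site points toward increasing coordinates and therefore blocks a leg arriving from the right.

```python
def extrude(load_site: int, ctcf_sites: dict[int, str], n_sites: int) -> tuple[int, int]:
    """Return the (left, right) anchors reached by one cohesin loaded at `load_site`."""
    left = right = load_site
    left_stalled = right_stalled = False
    while not (left_stalled and right_stalled):
        if not left_stalled:
            # The left-moving leg stops at a "+" site (motif facing it) or at the lattice end.
            if ctcf_sites.get(left) == "+" or left == 0:
                left_stalled = True
            else:
                left -= 1
        if not right_stalled:
            # The right-moving leg stops at a "-" site (motif facing it) or at the lattice end.
            if ctcf_sites.get(right) == "-" or right == n_sites - 1:
                right_stalled = True
            else:
                right += 1
    return left, right

# Hypothetical lattice of 80 sites with four CTCF motifs.
ctcf_sites = {10: "+", 25: "-", 40: "+", 60: "-"}
for start in (15, 30, 50):
    print(start, extrude(start, ctcf_sites, 80))
```

Whatever the loading position, both anchors end up as a `"+"` site on the left and a `"-"` site on the right, which is the convergent arrangement. Legs pass through sites facing away from them, so a cohesin loaded at 30 skips the motifs at 25 and 40.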

Key Insight

The orientation rule for CTCF is a powerful predictor: convergent CTCF pairs (pointing toward each other: \(\rightarrow \leftarrow\)) form stable loop anchors, while divergent pairs (\(\leftarrow \rightarrow\)) and tandem pairs (\(\rightarrow \rightarrow\)) do not. This simple rule, which arises from the loop extrusion mechanism, allows computational models to predict many TAD boundaries directly from sequence.
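Applied statically, the orientation rule becomes a short classifier over pairs of motif calls. The sketch below assumes each site is a `(position, strand)` tuple with `"+"` meaning the motif points toward increasing coordinates; the function names and the `max_span` cutoff are hypothetical.

```python
from itertools import combinations

def classify_pair(upstream_strand: str, downstream_strand: str) -> str:
    """Classify two CTCF motifs by relative orientation (upstream site listed first)."""
    if upstream_strand == "+" and downstream_strand == "-":
        return "convergent"   # -> <-
    if upstream_strand == "-" and downstream_strand == "+":
        return "divergent"    # <- ->
    return "tandem"           # -> -> or <- <-

def candidate_loops(sites: list[tuple[int, str]], max_span: int) -> list[tuple[int, int]]:
    """List convergent pairs no farther apart than `max_span` base pairs."""
    loops = []
    for (up_pos, up_strand), (down_pos, down_strand) in combinations(sorted(sites), 2):
        if down_pos - up_pos <= max_span and classify_pair(up_strand, down_strand) == "convergent":
            loops.append((up_pos, down_pos))
    return loops

# Hypothetical motif calls.
sites = [(100_000, "+"), (250_000, "-"), (400_000, "+"), (900_000, "-")]
loops = candidate_loops(sites, max_span=600_000)
print(loops)
```

A pairwise rule like this overpredicts, because it ignores intervening sites and binding strength; it is best read as a filter on candidates rather than a loop caller.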

21.1.4 Fine-Scale Chromatin Loops

At the finest scale, chromatin forms specific loops between defined loci. Enhancer-promoter loops bring distal regulatory elements into physical proximity with their target genes, while structural loops between convergent CTCF sites establish the TAD framework. Most enhancer-promoter contacts span less than 200 kilobases, but some extend over a megabase (Rao et al. 2014). Detecting these fine-scale contacts requires high-resolution data; the Micro-C method uses micrococcal nuclease digestion to achieve nucleosome-level resolution, revealing contact patterns invisible in standard Hi-C (Hsieh et al. 2020). The functional significance of individual loops remains debated. Some loops appear essential for gene activation; others may be structural features without direct regulatory consequences.

Figure 21.2: Hierarchical organization of the 3D genome. (A) Chromosome territories: each chromosome occupies a distinct nuclear volume with gene-rich chromosomes toward the interior. (B) A/B compartments: active (A) and inactive (B) chromatin form a checkerboard pattern visible at megabase scale. (C) TADs: topologically associating domains appear as triangular enrichments in Hi-C with a median size of ~800 kb. (D) Chromatin loops: focal Hi-C enrichments represent enhancer-promoter contacts and CTCF-anchored structural loops.

Figure 21.3: The loop extrusion mechanism. (A) Cohesin loads onto chromatin as a ring complex. (B) Bidirectional extrusion progressively enlarges the DNA loop. (C) Extrusion halts when cohesin encounters convergent CTCF sites, forming stable loop anchors. (D) The orientation rule: only convergent CTCF pairs (→←) form stable loops; divergent (←→) and tandem (→→) orientations do not halt extrusion.

21.2 Measuring the 3D Genome

Predicting 3D genome structure requires training data: measurements of which sequences contact which other sequences in real cells. Chromosome conformation capture methods provide these measurements through a common biochemical principle, though the technologies vary in resolution, throughput, and the aspects of 3D organization they reveal.

21.2.1 Hi-C and Contact Matrices

Technical Detail

This section describes the biochemistry and normalization of Hi-C data. Understanding these technical details helps explain why training data resolution varies and why certain biases must be corrected. Readers primarily interested in computational prediction can focus on the key point: Hi-C produces a symmetric contact matrix where values represent how frequently two genomic regions were spatially close in the cell population.

All chromosome conformation capture methods share the same four steps. Cells are crosslinked with formaldehyde to freeze chromatin contacts in place; DNA is digested with restriction enzymes; free DNA ends are ligated, preferentially joining fragments that were spatially proximate; and the ligated junctions are identified through sequencing. The frequency of junction reads between two genomic regions reflects how often those regions were in physical contact across the cell population.

Hi-C extends this principle genome-wide by incorporating biotinylated nucleotides at ligation junctions, enabling purification of chimeric fragments from the entire genome (Lieberman-Aiden et al. 2009). The output is a contact matrix, similar to a table showing how often people from different neighborhoods meet each other in a city. Just as residents of the same neighborhood run into each other frequently while encounters between distant neighborhoods are rare, genomic regions close together contact often while distant regions rarely meet. Rows and columns represent genomic bins (typically 1 to 50 kilobases depending on sequencing depth) and values represent contact frequencies between bin pairs. Resolution depends directly on sequencing depth: achieving 1 kilobase resolution requires billions of reads, while 10 kilobase resolution requires hundreds of millions.

Raw contact frequencies require extensive normalization to correct for biases from GC content, restriction site density, and mappability. The ICE (iterative correction and eigenvector decomposition) method and related approaches remove these technical artifacts while preserving biological signal. ICE works by assuming that all genomic regions should have roughly equal total contact frequency (visibility), then iteratively adjusting row and column sums to achieve this balance. The assumption that visibility differences are technical rather than biological is imperfect but works well in practice because the dominant technical biases (GC content affecting PCR amplification, restriction site density affecting fragmentation) would otherwise dwarf genuine biological variation.

The training strategies that enable models to learn from these normalized contact matrices follow the multi-task principles introduced in Section 8.6.
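The equal-visibility assumption behind ICE can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: it omits the filtering of low-coverage bins that real pipelines apply, and the function name, iteration count, and synthetic bias values are assumptions made for the example.

```python
import numpy as np

def ice_balance(raw: np.ndarray, n_iter: int = 200) -> tuple[np.ndarray, np.ndarray]:
    """Iteratively rescale rows and columns until every bin has equal total contact."""
    balanced = raw.astype(float).copy()
    bias = np.ones(balanced.shape[0])
    for _ in range(n_iter):
        coverage = balanced.sum(axis=1)
        coverage = coverage / coverage[coverage > 0].mean()  # relative visibility per bin
        coverage[coverage == 0] = 1.0                        # leave empty bins untouched
        balanced /= np.outer(coverage, coverage)             # correct rows and columns together
        bias *= coverage                                     # accumulate the per-bin bias estimate
    return balanced, bias

# Synthetic example: a distance-decay map distorted by a hypothetical per-bin bias.
rng = np.random.default_rng(0)
row, col = np.indices((30, 30))
true_map = 1.0 / (1 + np.abs(row - col))
injected = rng.uniform(0.5, 1.5, size=30)
raw = true_map * np.outer(injected, injected)

balanced, bias = ice_balance(raw)
sums = balanced.sum(axis=1)
print(sums.min(), sums.max())
```

Dividing by the outer product of the coverage vector keeps the matrix symmetric at every step, which is why a single bias vector is enough to describe the correction.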

The contact matrix encodes all levels of chromatin organization. Compartments appear as the checkerboard pattern when viewing megabase-scale interactions; TADs appear as triangular domains of enriched contacts along the diagonal; and loops appear as focal enrichments at specific off-diagonal positions. The matrix is dominated by the polymer effect: sequences that are close in linear distance contact each other frequently regardless of specific 3D structure, creating strong signal along the diagonal that can obscure biologically meaningful contacts at greater distances.

Why does genomic distance dominate contact frequency? Chromatin behaves as a polymer: random thermal fluctuations bring nearby sequences into contact simply because they are tethered to the same chain. Two loci 10 kb apart will contact each other frequently by chance, while loci 10 Mb apart rarely meet through random motion. This distance decay follows a power law (contact frequency \(\sim\) distance\(^{-1}\)), and deviations from this baseline reveal the biologically interesting contacts: TAD boundaries where contact drops faster than expected, and loops where contact is enriched above the polymer baseline.
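The power-law baseline described above can be estimated with a straight-line fit in log-log space, after which enrichment over the fitted curve highlights candidate loops. The sketch below runs on a synthetic matrix with one planted loop; the function names, matrix size, and the five-fold enrichment are assumptions chosen for illustration.

```python
import numpy as np

def decay_curve(contacts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Mean contact frequency at each bin separation from 1 to n-1."""
    n = contacts.shape[0]
    separations = np.arange(1, n)
    mean_contact = np.array([np.diagonal(contacts, d).mean() for d in separations])
    return separations, mean_contact

def fit_power_law(separations: np.ndarray, mean_contact: np.ndarray) -> tuple[float, float]:
    """Fit mean_contact ~ prefactor * separation**exponent by linear regression on logs."""
    slope, intercept = np.polyfit(np.log(separations), np.log(mean_contact), 1)
    return float(slope), float(np.exp(intercept))

# Synthetic map (hypothetical): pure 1/distance decay plus one planted loop at bins (20, 60).
n = 100
row, col = np.indices((n, n))
sep = np.maximum(np.abs(row - col), 1).astype(float)
toy = 1.0 / sep
toy[20, 60] *= 5
toy[60, 20] *= 5

separations, mean_contact = decay_curve(toy)
exponent, prefactor = fit_power_law(separations, mean_contact)
enrichment = toy / (prefactor * sep ** exponent)  # observed over fitted polymer baseline
peak = np.unravel_index(np.argmax(np.triu(enrichment, 1)), enrichment.shape)
print(round(exponent, 2), tuple(int(x) for x in peak))
```

Real contact maps deviate from a single exponent across scales, so the fit is usually restricted to a range of separations or replaced by the per-diagonal average.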

21.2.2 Resolution and Data Resources

Predict Before You Look

Recall from Section 2.4 how different sequencing technologies involve tradeoffs between throughput, resolution, and cost. Before examining the table below, predict: which 3D genome technology would you expect to have the highest resolution but lowest throughput? Which would have the opposite characteristics?

Beyond standard Hi-C, several technologies address specific limitations:

Table 21.2: Comparison of 3D genome measurement technologies. The choice of technology involves tradeoffs between resolution, cost, and genome coverage.

| Technology | Resolution | Throughput | Key Advantage | Key Limitation |
|---|---|---|---|---|
| Hi-C | 1–50 kb | High | Genome-wide, well-established | Limited by restriction sites |
| Micro-C | ~100 bp | Moderate | Nucleosome-level resolution | Higher cost, fewer datasets |
| Single-cell Hi-C | Variable | Low | Cell-to-cell variation | Extremely sparse matrices |
| DNA FISH | Single locus | Low | Direct visualization | Low throughput |
| Capture Hi-C | 1–5 kb | Moderate | High resolution at targets | Limited to predetermined loci |

Micro-C achieves nucleosome-level resolution by using micrococcal nuclease instead of restriction enzymes, revealing fine-scale contact patterns invisible at standard Hi-C resolution. Single-cell Hi-C measures contacts in individual cells, revealing that any two loci contact each other in only a small fraction of cells, but the resulting matrices are extremely sparse (most possible contacts are unmeasured in any single cell). Why is single-cell Hi-C so sparse? The math is humbling: each restriction fragment end can be ligated to only one partner, and a diploid cell carries just two copies of each locus, so every locus contributes at most a handful of ligation junctions per cell. With millions of possible genomic bin pairs but only thousands of ligation events captured per cell, the sampling is fundamentally incomplete. Deeper sequencing cannot solve this sparsity, because the limit is set by the number of ligation junctions that physically exist in one cell, not by how many times they are read. Imaging methods such as DNA FISH directly visualize genomic loci in the nucleus, providing ground truth for computational predictions but at much lower throughput than sequencing-based approaches.
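The sparsity argument above comes down to counting bin pairs. The short calculation below uses illustrative numbers that are assumptions, not measurements: a 3.1 Gb genome and 5,000 captured contacts per cell. The returned fraction is an upper bound, since several contacts can fall in the same bin pair.

```python
def pair_coverage(genome_bp: int, bin_bp: int, contacts_per_cell: int) -> tuple[int, int, float]:
    """Return (bins, unordered bin pairs, upper bound on the fraction of pairs observed)."""
    n_bins = -(-genome_bp // bin_bp)       # ceiling division
    n_pairs = n_bins * (n_bins + 1) // 2   # unordered pairs, including same-bin pairs
    return n_bins, n_pairs, min(1.0, contacts_per_cell / n_pairs)

for bin_bp in (1_000_000, 10_000):
    n_bins, n_pairs, fraction = pair_coverage(3_100_000_000, bin_bp, 5_000)
    print(f"{bin_bp:>9} bp bins: {n_bins} bins, {n_pairs} pairs, at most {fraction:.2e} observed")
```

Even at coarse 1 Mb bins, roughly one pair in a thousand can be observed under these assumptions, and the number of pairs grows quadratically as the bin size shrinks.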

Training data for 3D prediction models comes primarily from a small number of well-characterized cell lines. The lymphoblastoid cell line GM12878 and the leukemia cell line K562 have deep Hi-C coverage across multiple laboratories, making them the default training sets for most models. Primary tissues and rare cell types have sparse coverage, creating a significant gap between where models are trained and where clinical applications require predictions. The 4D Nucleome Data Portal and ENCODE provide the most comprehensive repositories of 3D genome data, though coverage remains heavily biased toward common cell lines and human samples. This data landscape parallels the challenges discussed for functional genomics data more broadly (Chapter 2).

Knowledge Check

A researcher wants to study enhancer-promoter contacts at a specific gene locus implicated in a rare disease. The target region spans 500 kb. Given the technologies described above, which approach would you recommend, and why? What would be the key limitation of your recommended approach?

Capture Hi-C would be the best choice: it provides high resolution (1-5 kb) at the specific target locus, enabling identification of enhancer-promoter contacts within the 500 kb region without the cost of genome-wide sequencing. The key limitation is that it only captures contacts within the predetermined region, potentially missing long-range interactions with sequences outside the targeted 500 kb window.

21.3 Predicting 3D Structure from Sequence

Sequence-based prediction of 3D genome structure asks whether DNA sequence alone contains sufficient information to predict chromatin contacts. The success of models like Akita, Orca, and C.Origami demonstrates that sequence encodes substantial 3D information, particularly for TAD boundaries and CTCF-anchored loops. These models share a common challenge: predicting a two-dimensional contact matrix from a one-dimensional sequence input.

21.3.1 Akita and Dilated Convolutions

Retrieval Practice

Recall from Chapter 6 how dilated convolutions expand receptive fields. Why is this architectural choice particularly important for 3D genome prediction, where CTCF binding sites may be separated by hundreds of kilobases? What would happen if we used standard convolutions instead?

Akita, introduced by Fudenberg et al. in 2020, established the paradigm for sequence-to-contact prediction (Fudenberg, Kelley, and Pollard 2020). The model takes approximately one megabase of DNA sequence as input and predicts Hi-C contact frequencies at 2 kilobase resolution. The architecture uses dilated convolutions to expand the receptive field without proportionally increasing parameters (an approach discussed in detail in Section 6.5.1), enabling the model to integrate information across the full input window. Dilated convolutions are essential here because TAD boundaries and loop anchors depend on sequence features (primarily CTCF motifs) that may be separated by hundreds of kilobases, far beyond what standard convolutions could capture without prohibitive parameter counts.
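The parameter argument above can be made concrete with the standard receptive-field formula for a stack of stride-1 one-dimensional convolutions: each layer with kernel size k and dilation d adds (k - 1) * d positions. The layer counts and kernel size below are illustrative and are not Akita's actual hyperparameters.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field, in input positions, of stacked stride-1 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

layers = 10
standard = receptive_field(3, [1] * layers)                    # every layer has dilation 1
dilated = receptive_field(3, [2 ** i for i in range(layers)])  # dilation doubles each layer
print(standard, dilated)  # 21 2047
```

Both stacks have the same number of weights, but doubling the dilation makes the receptive field grow exponentially with depth instead of linearly.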

The output is symmetric (contacts between positions i and j equal contacts between j and i), which the architecture enforces by construction, for example by averaging intermediate two-dimensional representations with their transposes. This symmetry constraint is not merely a convenience but reflects the physical reality of chromatin contacts: if region A touches region B, then by definition B touches A. Enforcing this constraint in the architecture prevents the model from learning spurious asymmetric predictions that would violate physics. Akita achieves correlation coefficients of 0.6 to 0.8 between predicted and observed contact maps in held-out genomic regions, successfully identifying TAD boundaries and major loop anchors.
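One way to build a symmetric two-dimensional map from one-dimensional features is to average the features of bins i and j for every pair, then re-symmetrize after any later operation that could break the symmetry. The NumPy sketch below illustrates that general idea under stated assumptions; it is not a reproduction of Akita's layers, and the function names and array sizes are hypothetical.

```python
import numpy as np

def pairwise_average(bin_features: np.ndarray) -> np.ndarray:
    """Turn (L, C) per-bin features into (L, L, C) pair features, symmetric by construction."""
    return 0.5 * (bin_features[:, None, :] + bin_features[None, :, :])

def symmetrize(pair_map: np.ndarray) -> np.ndarray:
    """Average a pair map with its transpose over the two bin axes."""
    return 0.5 * (pair_map + np.swapaxes(pair_map, 0, 1))

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))              # 8 bins, 4 channels (hypothetical sizes)
pairs = pairwise_average(features)
drifted = pairs + rng.normal(size=pairs.shape)  # stand-in for an asymmetric 2D layer output
fixed = symmetrize(drifted)
print(np.allclose(fixed, np.swapaxes(fixed, 0, 1)))
```

Averaging with the transpose is idempotent, so it can be applied after every block without changing a map that is already symmetric.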

21.3.2 Orca and Multiscale Prediction

Orca extends sequence-based prediction to multiple resolutions simultaneously (Zhou 2022). Rather than predicting a single-resolution contact map, Orca generates predictions at 4, 8, 16, 32, 64, 128, and 256 kilobase resolutions, capturing both fine-scale loops and large-scale compartment structure. The multiscale approach addresses a fundamental challenge: compartments span megabases while loops span kilobases, and no single resolution optimally captures both. A single-resolution model forced to compromise would either miss fine-scale loop contacts (at coarse resolution) or fail to capture compartment patterns that emerge from aggregating many weak interactions (at fine resolution).
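The resolution tradeoff above can be seen by coarsening a single contact matrix into a pyramid of maps, each with bins twice as large as the last. This sketch shows how multi-resolution targets can be derived from one dataset; it is not Orca's architecture, and the function names and the 4 kb base resolution are assumptions for the example.

```python
import numpy as np

def coarsen(contacts: np.ndarray, factor: int) -> np.ndarray:
    """Sum contact counts over factor-by-factor blocks, trimming any ragged edge."""
    n = contacts.shape[0] // factor * factor
    blocks = contacts[:n, :n].reshape(n // factor, factor, n // factor, factor)
    return blocks.sum(axis=(1, 3))

def pyramid(contacts: np.ndarray, levels: int) -> list[np.ndarray]:
    """Return the original map followed by `levels` successively 2x-coarser maps."""
    maps = [contacts]
    for _ in range(levels):
        maps.append(coarsen(maps[-1], 2))
    return maps

base = np.ones((64, 64))  # hypothetical raw counts at 4 kb bins
maps = pyramid(base, levels=3)
for level, m in enumerate(maps):
    print(f"{4 * 2 ** level} kb bins: shape {m.shape}, total counts {m.sum():.0f}")
```

Summing is appropriate for raw counts, where coarser bins pool more reads and become less noisy; for normalized frequencies an average over each block would be the analogous operation.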

Predict Before You Look

Think back to Enformer from Chapter 17, which extends its receptive field with attention layers rather than dilated convolutions. How might Orca’s multiscale approach differ from simply widening the receptive field of a single-resolution model, whether through attention or more extreme dilation rates? What architectural advantage does explicit multi-resolution prediction provide?

Orca’s architecture processes sequence through parallel pathways tuned to different scales, then combines predictions into a coherent multiscale representation. The parallel pathways use different pooling and dilation strategies, effectively asking “what does this sequence predict at the 4 kb scale?” independently from “what does it predict at the 256 kb scale?” This design enables prediction of structural variants’ effects across organizational levels, from disrupted loops to altered compartment boundaries.

21.3.3 C.Origami and Cross-Cell-Type Transfer

C.Origami addresses the cell-type specificity problem (Tan et al. 2023). While TAD boundaries are largely conserved across cell types, finer-scale contacts vary substantially. C.Origami incorporates CTCF ChIP-seq data alongside sequence, enabling the model to learn how cell-type-specific CTCF binding patterns shape cell-type-specific contact maps. This design enables transfer learning: train on cell types with both Hi-C and CTCF data, then predict contacts in new cell types from sequence plus CTCF ChIP-seq, with no Hi-C required for the new cell type. The approach substantially expands the range of cell types where 3D predictions are possible, since CTCF ChIP-seq is available for many more cell types than deep Hi-C. This transfer strategy echoes the broader transfer learning principles discussed in Chapter 9.
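Combining sequence with a cell-type-specific track usually means stacking them as input channels. The sketch below shows one plausible encoding, with four one-hot sequence channels plus one CTCF signal channel; the function names, channel order, and toy values are assumptions for illustration and are not taken from the C.Origami code.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) array; unknown bases such as N become all zeros."""
    idx = np.array([BASES.find(base) for base in seq.upper()])
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    known = idx >= 0
    encoded[np.arange(len(seq))[known], idx[known]] = 1.0
    return encoded

def build_input(seq: str, ctcf_signal) -> np.ndarray:
    """Stack sequence channels with a per-base CTCF ChIP-seq signal into an (L, 5) array."""
    ctcf = np.asarray(ctcf_signal, dtype=np.float32)
    if ctcf.shape != (len(seq),):
        raise ValueError("CTCF signal must have one value per base")
    return np.concatenate([one_hot(seq), ctcf[:, None]], axis=1)

# Same sequence, two hypothetical cell types with different CTCF occupancy.
seq = "ACGTN"
cell_a = build_input(seq, [0.0, 1.0, 2.0, 0.0, 0.0])
cell_b = build_input(seq, [0.0, 0.0, 0.0, 3.0, 0.0])
print(cell_a.shape, np.array_equal(cell_a[:, :4], cell_b[:, :4]))
```

The sequence channels are identical across cell types, so any difference in the predicted contact map has to come from the CTCF channel.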

Predict Before You Look

Before examining the comparison table, consider the tradeoffs each model makes. Which model would you expect to have the best cross-cell-type performance? Which would be most suitable for predicting the effects of a novel structural variant in a cell type with no available Hi-C data?

The following table compares these three foundational 3D prediction models:

Table 21.3: Comparison of sequence-based 3D structure prediction models. Each addresses different aspects of the prediction challenge.

| Feature | Akita | Orca | C.Origami |
|---|---|---|---|
| Input | Sequence only | Sequence only | Sequence + CTCF ChIP-seq |
| Context length | ~1 Mb | ~1 Mb | ~1 Mb |
| Output resolution | 2 kb | 4–256 kb (multiscale) | Variable |
| Architecture | Dilated CNN | Multiscale CNN | CNN with auxiliary input |
| Cell-type transfer | Limited | Limited | Enabled via CTCF data |
| Key strength | Established paradigm | Multiscale prediction | Cross-cell-type transfer |
| Key limitation | Single cell type | Requires Hi-C training | Requires CTCF ChIP-seq |

Checkpoint: 3D Prediction Approaches

Before proceeding, ensure you can explain:

  1. The tradeoff between sequence-only models (Akita) and models requiring auxiliary data (C.Origami)
  2. Why multiscale prediction (Orca) captures different structural features than fixed-resolution models
  3. When cross-cell-type transfer is possible and what data it requires

If these distinctions are unclear, re-read the model descriptions and table above.

Stop and Think

Consider the tradeoff between Akita’s sequence-only approach and C.Origami’s requirement for CTCF ChIP-seq data. In what scenarios would each approach be preferred? Think about (1) predicting structural variant effects, (2) predicting contacts in a novel cell type, and (3) understanding what sequence features determine 3D structure.

21.3.4 Learned Sequence Determinants

Retrieval Practice

Before reading about what these 3D prediction models learn, recall the loop extrusion mechanism from earlier in this chapter. What sequence feature would you expect to be the strongest predictor of chromatin loop formation? Why would the orientation of this feature matter?

Interpretability analysis reveals what these models learn about sequence determinants of 3D structure. Attribution methods (discussed more fully in Chapter 25) consistently identify CTCF motifs as the strongest predictors of contact patterns, with convergent CTCF pairs (motifs oriented toward each other) most strongly associated with loop anchors. Transcription start sites contribute to boundary predictions, consistent with the observation that active promoters often coincide with domain edges. GC content correlates with compartment identity (GC-rich regions tend toward A compartment), and repetitive element composition shows systematic associations (LINE elements with B compartment; Alu elements with A compartment). The orientation rule for CTCF emerges naturally from training: models learn that CTCF motif orientation, not just presence, predicts which sites will anchor loops. This learned relationship matches the mechanistic understanding from the loop extrusion model, providing validation that models capture biologically meaningful features.

Despite these advances, significant limitations remain. Resolution is constrained by training data: predicting nucleosome-level contacts requires Micro-C training data that exists for few cell types. Hi-C resolution is bounded by restriction site spacing and sequencing depth, whereas Micro-C digests chromatin between nucleosomes with micrococcal nuclease and reaches approximately 100 base pair resolution, roughly 10 to 500 times finer than the 1 to 50 kilobase bins of standard Hi-C (Table 21.2). This resolution comes at a cost: Micro-C requires substantially deeper sequencing to achieve comparable genome-wide coverage, and far fewer cell types have been profiled at the depth needed for training contact prediction models. The single-cell variation problem is fundamental: models trained on bulk Hi-C predict population averages, but gene regulation may depend on the stochastic 3D configurations in individual cells. Causality cannot be established from prediction alone; a model may correctly predict that two regions contact each other without revealing whether that contact causes any functional consequence. Generalization to cell types distant from training data remains uncertain, and the computational cost of processing megabase sequences limits practical applications for genome-wide analysis.

Figure 21.4: Sequence-based 3D structure prediction models. (A) Akita architecture: dilated convolutions expand the receptive field to process ~1 Mb sequences and predict Hi-C at 2 kb resolution. (B) Orca multi-scale approach: parallel pathways predict contacts from 4 kb to 256 kb resolution simultaneously. (C) C.Origami incorporates CTCF ChIP-seq alongside sequence, enabling cross-cell-type transfer. (D) Prediction versus ground truth comparison showing that sequence-based models capture TAD boundaries and major loop anchors with correlations of 0.6-0.8.

21.4 3D Structure and Gene Regulation

The ultimate purpose of 3D genome prediction is understanding gene regulation. Contact maps matter because they reveal which enhancers can reach which genes. Integrating 3D structure with expression prediction addresses limitations that purely one-dimensional models cannot overcome.

21.4.1 Beyond One-Dimensional Models

Enformer (Section 17.2) predicts gene expression from sequence within a 200 kilobase window, sufficient to capture many enhancer-promoter relationships but fundamentally limited by its treatment of the genome as a one-dimensional string. This representation cannot distinguish an enhancer that loops to a distant gene from one blocked by a TAD boundary, nor can it explain cell-type-specific contacts that activate different genes from the same enhancer in different contexts. The 3D genome provides this missing context: physical proximity through chromatin loops determines which regulatory elements can communicate.

Consider an enhancer located 300 kilobases from two genes, one upstream and one downstream. Linear models would predict similar regulatory influence on both genes based on comparable distances. But if a TAD boundary lies between the enhancer and the upstream gene, 3D structure predicts that only the downstream gene receives regulatory input. The boundary insulates the upstream gene from enhancer activity regardless of linear proximity. This insulation function explains why TAD boundaries show such strong evolutionary conservation: disrupting boundaries allows regulatory crosstalk that can dysregulate gene expression with pathogenic consequences.
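The scenario above can be sketched in a few lines. The code assigns each position to a TAD using a sorted list of boundary coordinates and checks whether an enhancer and a gene share a domain. All coordinates are hypothetical, and treating boundaries as absolute barriers is a simplifying assumption, since real boundaries vary in insulation strength.

```python
from bisect import bisect_right


def tad_index(position: int, boundaries: list[int]) -> int:
    """Index of the TAD containing `position`, given sorted boundary coordinates."""
    return bisect_right(boundaries, position)


def same_tad(a: int, b: int, boundaries: list[int]) -> bool:
    """True if no boundary separates positions a and b."""
    return tad_index(a, boundaries) == tad_index(b, boundaries)


boundaries = [1_000_000, 1_850_000, 2_600_000]  # hypothetical boundary positions
enhancer = 2_000_000
upstream_gene = 1_700_000    # 300 kb upstream, across the 1.85 Mb boundary
downstream_gene = 2_300_000  # 300 kb downstream, same TAD as the enhancer

for name, gene in [("upstream", upstream_gene), ("downstream", downstream_gene)]:
    distance_kb = abs(gene - enhancer) // 1000
    print(name, distance_kb, "kb", "same TAD:", same_tad(enhancer, gene, boundaries))
```

Both genes sit at the same linear distance, yet only the downstream gene shares a domain with the enhancer.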

Key Insight

Linear distance is not regulatory distance. An enhancer 500 kb away within the same TAD may have stronger regulatory influence than an enhancer 50 kb away across a TAD boundary. This principle explains why structural variants can be pathogenic even when they do not disrupt any coding sequence: they rewire the regulatory topology.

21.4.2 Structural Variant Interpretation

The clinical significance is clearest in structural variant interpretation. Structural variants that remove or reposition TAD boundaries cause enhancer hijacking, where regulatory elements gain access to genes in adjacent domains. The EPHA4 locus provides the canonical example: limb enhancers normally activate EPHA4 expression in developing limbs. When deletions remove the TAD boundary separating EPHA4 from the adjacent PAX3 domain, these enhancers ectopically activate PAX3, causing brachydactyly; inversions and duplications at the same locus instead expose WNT6 or IHH to the same enhancers, producing syndactyly and polydactyly, respectively (Lupiáñez et al. 2015). Different rearrangements thus produce different phenotypes depending on which boundaries are disrupted and which new enhancer-gene contacts form. Similar mechanisms operate in cancer, where structural variants create novel enhancer-oncogene contacts that drive tumor growth. The diagnostic challenge is substantial: predicting pathogenicity of structural variants requires understanding which 3D contacts will be disrupted and what new contacts will form, predictions that sequence-only models cannot provide. This challenge intersects with the variant prioritization pipelines discussed in Chapter 29, where 3D genome effects represent a systematic blind spot in current foundation model approaches to variant effect prediction (Chapter 18).

Practical Guidance: Structural Variant Analysis

When analyzing structural variants for potential pathogenicity:

  1. Identify nearby TAD boundaries using available Hi-C data or boundary predictions from Akita/Orca
  2. Check for boundary disruption: Does the structural variant delete, invert, or translocate a boundary?
  3. Inventory regulatory elements: What enhancers exist in the affected region? What genes lie in adjacent TADs?
  4. Consider tissue specificity: The consequence depends on which cell types express the relevant enhancers and target genes
  5. Compare to known cases: Databases of pathogenic structural variants with 3D mechanism provide precedent

Structural variants that disrupt boundaries are more likely to be pathogenic than those preserving domain structure, even if the latter remove more sequence.
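The first three steps of this checklist can be sketched for the simplest case, a deletion. The code below reports which boundaries a deletion removes and which surviving enhancers and genes end up in the merged domain. The `Interval` record, the function names, and all coordinates are hypothetical; inversions, translocations, tissue specificity (step 4), and database lookup (step 5) are outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    name: str
    start: int
    end: int


def disrupted_boundaries(deletion: Interval, boundaries: list[int]) -> list[int]:
    """Boundaries whose coordinate falls inside the deleted interval."""
    return [b for b in boundaries if deletion.start <= b < deletion.end]


def triage_deletion(deletion, boundaries, enhancers, genes):
    """Steps 1-3 of the checklist for a deletion."""
    hit = disrupted_boundaries(deletion, boundaries)
    if not hit:
        return {"boundaries_removed": [], "enhancers_in_merged_domain": [],
                "genes_in_merged_domain": []}
    # The merged domain spans from the nearest intact boundary on each side.
    left = max((b for b in boundaries if b < deletion.start), default=0)
    right = min((b for b in boundaries if b >= deletion.end), default=float("inf"))

    def survives(x: Interval) -> bool:
        inside = left <= x.start and x.end <= right
        not_deleted = x.end <= deletion.start or x.start >= deletion.end
        return inside and not_deleted

    return {
        "boundaries_removed": hit,
        "enhancers_in_merged_domain": [e.name for e in enhancers if survives(e)],
        "genes_in_merged_domain": [g.name for g in genes if survives(g)],
    }


# Hypothetical locus: the deletion removes the boundary at 2.0 Mb.
deletion = Interval("del1", 1_950_000, 2_050_000)
boundaries = [1_000_000, 2_000_000, 3_000_000]
enhancers = [Interval("E1", 1_500_000, 1_501_000), Interval("E2", 3_500_000, 3_501_000)]
genes = [Interval("G1", 2_400_000, 2_450_000), Interval("G2", 500_000, 550_000)]

report = triage_deletion(deletion, boundaries, enhancers, genes)
print(report)
```

Here enhancer E1 and gene G1, previously in separate domains, now share one, which is the configuration that warrants a closer look for enhancer hijacking.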

Integrating 3D predictions with expression models remains technically challenging. Hybrid approaches use predicted contacts to weight enhancer contributions: rather than treating all enhancers within a window equally, weights reflect predicted contact frequency with the target promoter. This activity-by-contact framework (expression proportional to the sum of enhancer activities weighted by contact frequencies) captures some of the regulatory logic that 1D models miss. Graph-based representations (Chapter 22) can encode genes and enhancers as nodes with contacts as edges, enabling graph neural networks to reason about regulatory relationships in 3D space. Attribution methods for understanding which contacts drive expression predictions are examined in Section 25.1. End-to-end training of combined 3D and expression models remains difficult; most current approaches train the components separately and combine predictions post hoc.
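The activity-by-contact idea can be written down compactly. For a target gene $G$, the regulatory input is the contact-weighted sum of enhancer activities, and each enhancer's score is its share of that sum, $\mathrm{ABC}_{E,G} = A_E C_{E,G} / \sum_e A_e C_{e,G}$. The sketch below computes both quantities; the enhancer names, activity values, and contact frequencies are made-up numbers in arbitrary units, chosen only to show how contact weighting reorders enhancers.

```python
def abc_scores(activities: dict[str, float], contacts: dict[str, float]) -> dict[str, float]:
    """Share of each enhancer's activity-times-contact in the promoter's total."""
    weighted = {name: activities[name] * contacts[name] for name in activities}
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    return {name: value / total for name, value in weighted.items()}


# Hypothetical enhancers for one target promoter (arbitrary units).
activities = {"E1": 8.0, "E2": 8.0, "E3": 2.0}    # enhancer activity
contacts = {"E1": 0.02, "E2": 0.002, "E3": 0.04}  # contact frequency with the promoter

scores = abc_scores(activities, contacts)
predicted_input = sum(activities[e] * contacts[e] for e in activities)

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

E1 and E2 have identical activity, but E2 contributes little because it rarely contacts the promoter, while the weak enhancer E3 outranks it through frequent contact.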

21.4.3 Causality and Permissive Architecture

The causality question complicates interpretation. Do enhancer-promoter contacts cause gene activation, or does gene activation cause contacts? Transcription itself can influence chromatin organization: active transcription may stabilize enhancer-promoter contacts that would otherwise be transient. Perturbation experiments provide cleaner causal tests than correlational analysis. Acute degradation of cohesin eliminates TADs within hours, yet most genes show minimal expression changes, suggesting that many TAD structures are permissive rather than deterministic for gene regulation. CRISPR-based deletion of specific TAD boundaries similarly produces more modest effects than the structural disruption would suggest. The emerging view is nuanced: 3D structure constrains which enhancer-promoter interactions are possible, but whether those interactions occur depends on additional factors including transcription factor availability and chromatin state. This distinction between correlation and causation echoes the confounding challenges discussed in Chapter 13 and the causal inference principles explored in Chapter 26.

Stop and Think

A recent paper reports that deleting a TAD boundary experimentally results in new chromatin contacts between an enhancer and a previously insulated gene, but the gene’s expression does not change. How would you interpret this result? What additional experiments would help distinguish between (a) the contact is non-functional, (b) the contact is functional but opposed by other regulatory mechanisms, and (c) the contact requires additional factors not present in this experimental system?

21.5 Spatial Transcriptomics

Single-cell RNA sequencing (Chapter 20) reveals cellular heterogeneity but discards spatial information: we learn which genes each cell expresses but not where that cell sits within the tissue. For understanding tumor microenvironments, developmental gradients, or tissue architecture, spatial context is essential. A T cell adjacent to a tumor cell experiences a different microenvironment than one in the surrounding stroma, and this spatial context shapes gene expression programs in ways that dissociated single-cell data cannot capture.

21.5.1 Measurement Technologies

Spatial transcriptomics technologies fall into two broad categories with complementary strengths. Spot-based methods like Visium (10x Genomics) capture polyadenylated RNA at arrayed positions on a slide, providing transcriptome-wide measurement at approximately 55 micrometer resolution (typically 1 to 10 cells per spot). These methods offer comprehensive gene coverage but limited spatial resolution. Imaging-based methods like MERFISH use sequential rounds of fluorescent hybridization to identify RNA molecules in situ, achieving subcellular resolution but limited to pre-selected gene panels (hundreds to thousands of genes rather than transcriptome-wide). Newer technologies like Stereo-seq achieve near-cellular resolution with transcriptome-wide coverage through spatial barcoding, though they remain less validated than established methods.

Predict Before You Look

Consider the fundamental tradeoff between spatial resolution and gene coverage. Why might this tradeoff exist at a technical level? Which approach would you choose for (1) discovering novel spatial patterns in a tissue, versus (2) mapping known cell-cell interactions at high resolution?

Table 21.4: Comparison of spatial transcriptomics platforms. The tradeoff between spatial resolution and gene coverage drives technology selection.
| Approach | Resolution | Gene Coverage | Example Technologies | Best For |
|---|---|---|---|---|
| Spot-based | ~55 μm (multi-cell) | Transcriptome-wide | Visium, Slide-seq | Discovery, whole-transcriptome |
| Imaging-based | Subcellular | 100–10,000 genes | MERFISH, Xenium | Targeted, single-cell resolution |
| Next-generation | Near-cellular | Transcriptome-wide | Stereo-seq, Seq-Scope | Emerging applications |

21.5.2 Computational Challenges

Computational challenges in spatial transcriptomics mirror and extend those in single-cell analysis (Chapter 20). Spot deconvolution addresses the multiple-cells-per-spot problem in Visium data: inferring the cell type composition within each spot by comparing spot expression profiles to reference single-cell atlases. Imputation methods predict expression of genes not measured in imaging-based assays, leveraging correlations learned from reference datasets. Integration aligns spatial data with single-cell references, mapping reference cell types onto spatial coordinates. Domain correction handles batch effects that manifest in spatial patterns as well as expression levels. The sparsity problem is even more severe than in standard single-cell RNA sequencing; gene detection rates in spatial methods often fall below 10 percent. The missing modality strategies developed for multi-omics integration (Section 23.6) become essential when spatial methods fail to detect genes that single-cell RNA-seq measures reliably.
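Spot deconvolution can be illustrated as a non-negative least-squares problem: find cell-type weights whose mixture of reference signatures best matches the spot's expression profile. The sketch below uses multiplicative updates to keep weights non-negative. It is a toy formulation under stated assumptions (noise-free data, a made-up signature matrix of four genes by three cell types), not the statistical model used by any published deconvolution tool.

```python
def deconvolve_spot(spot, signatures, n_iter=200):
    """Estimate cell-type proportions w such that signatures @ w ~ spot.

    spot:        expression values, one per gene
    signatures:  one row per gene, each with one value per cell type
    Multiplicative updates keep every weight non-negative.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    w = [1.0 / n_types] * n_types
    # Numerator S^T y is constant across iterations.
    numer = [sum(signatures[g][k] * spot[g] for g in range(n_genes))
             for k in range(n_types)]
    for _ in range(n_iter):
        fitted = [sum(signatures[g][k] * w[k] for k in range(n_types))
                  for g in range(n_genes)]
        denom = [sum(signatures[g][k] * fitted[g] for g in range(n_genes))
                 for k in range(n_types)]
        w = [w[k] * numer[k] / denom[k] if denom[k] > 0 else 0.0
             for k in range(n_types)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


# Hypothetical reference signatures: rows are genes, columns are cell types.
signatures = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]
true_mix = [0.5, 0.3, 0.2]
# Simulated spot: an exact mixture of the three signatures.
spot = [sum(row[k] * true_mix[k] for k in range(3)) for row in signatures]

estimate = deconvolve_spot(spot, signatures)
print([round(x, 3) for x in estimate])
```

With real data, dropout and platform differences between the spatial assay and the single-cell reference make the problem considerably harder than this noise-free example suggests.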

21.5.3 Spatial Foundation Models

Spatial foundation models remain much less mature than sequence-based models (Chapter 15, Chapter 16). The fundamental challenge is the lack of an equivalent to evolutionary pretraining: DNA and protein models learn from billions of years of evolutionary experiments encoded in sequence databases, but no comparable natural augmentation exists for spatial organization. Current approaches include graph neural networks that encode spatial relationships as edges between neighboring cells or spots, transformer architectures that treat spatial positions as tokens with positional encodings derived from coordinates, and generative models that learn spatial patterns from atlases of reference tissues. Models like Nicheformer apply transformer architectures to spatial niches (local cellular neighborhoods), learning representations that capture cell-cell communication patterns and tissue microenvironment signatures. SpaGCN uses graph convolutional networks with spatial graphs, propagating information between spatially adjacent regions to identify spatial domains with coherent expression patterns.

Other approaches address different aspects of the spatial modeling problem. CellPLM pretrains on millions of spatial transcriptomics cells, learning representations that transfer across tissue types and experimental platforms. STACI combines spatial coordinates with morphological features from histology images, enabling joint reasoning about molecular and visual tissue properties. GraphST uses graph attention networks to propagate expression signals across spatial neighborhoods while preserving local heterogeneity. HEIST (Madhu et al. 2025) employs a hierarchical graph transformer architecture that models tissue organization at multiple scales, capturing both local cell-cell communication patterns and broader tissue structure; whether such multi-scale spatial patterns improve downstream predictions remains an active area of validation. These methods remain early in development compared to sequence foundation models; no spatial equivalent of DNABERT or ESM-2 has achieved broad adoption, and benchmark comparisons across methods remain limited by the diversity of spatial platforms and tissue types.
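The graph-based approaches share a common first step: turning spatial coordinates into a neighborhood graph and passing information along its edges. The sketch below builds a radius graph and performs one round of untrained mean aggregation. The coordinates, the radius, and the expression values are hypothetical, and actual graph neural network layers apply learned weights and nonlinearities rather than a plain average.

```python
from math import dist


def radius_graph(coords, radius):
    """Adjacency lists connecting points within `radius` of each other (no self-loops)."""
    neighbors = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def mean_aggregate(values, neighbors):
    """One round of neighborhood averaging: each node mixes itself with its neighbors."""
    out = []
    for i, v in enumerate(values):
        group = [v] + [values[j] for j in neighbors[i]]
        out.append(sum(group) / len(group))
    return out


# Hypothetical spot centers (arbitrary units) and expression of one gene.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
expression = [4.0, 0.0, 2.0, 8.0]

graph = radius_graph(coords, radius=1.5)
smoothed = mean_aggregate(expression, graph)
print(graph)
print(smoothed)
```

The isolated spot keeps its own value, while connected spots move toward their neighborhood mean, which is the basic mechanism by which spatial context enters the representation.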

Figure 21.5: Spatial transcriptomics technologies and computational approaches. (A) Spot-based methods (Visium): transcriptome-wide measurement at ~55 μm resolution, capturing 1-10 cells per spot. (B) Imaging-based methods (MERFISH, Xenium): subcellular resolution but limited to pre-selected gene panels. (C) The deconvolution challenge: inferring cell type composition within each spot using single-cell reference atlases. (D) Spatial foundation models: graph neural networks over spatial tissue graphs enable modeling of cell-cell communication and tissue microenvironment.

The clinical applications motivating spatial foundation model development center on tumor microenvironment characterization. The spatial organization of immune cells relative to tumor cells predicts treatment response: tumors with immune cells infiltrating the tumor core respond better to immunotherapy than those with immune exclusion at the tumor periphery. Spatial models aim to learn these prognostic patterns from training data, enabling prediction of treatment response from spatial organization alone. Similar applications exist in developmental biology (understanding morphogen gradients and cell fate decisions), neuroscience (mapping brain region organization), and pathology (characterizing disease architecture in tissue sections).

21.6 Limitations and Open Questions

Current 3D genome and spatial models face limitations that constrain their utility for clinical and research applications. Resolution remains a fundamental constraint: most Hi-C prediction models operate at 2 to 10 kilobase resolution, while functionally relevant enhancer-promoter contacts involve specific sequences within those bins. Predicting which specific kilobases within a TAD contact each other requires resolution that exceeds current training data in most cell types. The resolution needed for accurate prediction may exceed the resolution achievable from bulk Hi-C, creating a data ceiling that computational methods cannot overcome.

The population averaging problem is more fundamental than a mere technical limitation. Bulk Hi-C measurements average over millions of cells, each with a different 3D configuration. Any two loci contact each other in only a minority of cells at any given time, yet the averaged contact frequency appears as a single value in the training data. Single-cell Hi-C reveals this heterogeneity but produces extremely sparse data (most possible contacts unmeasured in each cell). Models trained on population averages cannot predict single-cell behavior, yet gene regulation may depend on the stochastic dynamics of contact formation in individual cells. Whether the population average or the single-cell distribution matters more for predicting gene expression remains unclear.
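A toy example shows what averaging discards. The sketch below constructs two hypothetical cell populations in which an enhancer contacts two promoters: in one population the contacts occur together in the same cells, in the other they are mutually exclusive. The contact names and cell counts are invented for illustration.

```python
def bulk_frequencies(cells):
    """Average per-cell binary contact indicators into bulk contact frequencies."""
    n = len(cells)
    return {k: sum(c[k] for c in cells) / n for k in cells[0]}


def co_occurrence(cells, a, b):
    """Fraction of cells in which both contacts are present at once."""
    return sum(1 for c in cells if c[a] and c[b]) / len(cells)


# Population 1: both contacts form together in the same 2 of 10 cells.
together = [{"E-P1": 1, "E-P2": 1}] * 2 + [{"E-P1": 0, "E-P2": 0}] * 8
# Population 2: each contact forms in 2 of 10 cells, never in the same cell.
exclusive = ([{"E-P1": 1, "E-P2": 0}] * 2
             + [{"E-P1": 0, "E-P2": 1}] * 2
             + [{"E-P1": 0, "E-P2": 0}] * 6)

bulk_together = bulk_frequencies(together)
bulk_exclusive = bulk_frequencies(exclusive)
print(bulk_together, bulk_exclusive)
print(co_occurrence(together, "E-P1", "E-P2"),
      co_occurrence(exclusive, "E-P1", "E-P2"))
```

Both populations yield identical bulk contact frequencies, so a model trained on the averaged map has no way to distinguish coordinated contacts from competing ones.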

Causality represents the deepest conceptual challenge. Predicting that two regions contact each other does not establish that the contact causes any biological consequence. Many TAD disruptions produce minimal expression changes; many enhancer-promoter contacts may be bystanders rather than drivers of transcription. The loop extrusion machinery that creates TADs operates continuously, but the transcriptional machinery that reads out enhancer-promoter communication operates on different timescales and with different requirements. Computational predictions of 3D structure are correlational; establishing which predicted contacts matter functionally requires experimental validation that computational methods cannot replace.

For clinical applications, the sparse training data creates systematic blind spots. Models trained on GM12878 and K562 may not transfer to the primary cells, developmental stages, or disease states where predictions matter most. A structural variant affecting 3D organization in neural progenitor cells cannot be reliably interpreted using models trained only on lymphoblastoid cells. The cell types most relevant for clinical interpretation are often those with the least 3D characterization data available. This challenge parallels the transferability concerns discussed throughout Chapter 11 and Chapter 13.

Knowledge Check

Summarize three distinct limitations of current 3D genome prediction models. For each limitation, identify whether it is primarily (a) a data limitation that more sequencing could address, (b) a fundamental biological challenge, or (c) a computational/algorithmic limitation. How do these limitations affect the clinical utility of 3D structure predictions?

Three key limitations:

  1. Resolution gap (data): models predict at 2-10 kb but enhancer-promoter contacts require finer resolution; more sequencing could help but may hit biological ceilings.

  2. Population averaging (biological): bulk Hi-C averages over millions of cells with different configurations; single-cell approaches are fundamentally sparse due to physical constraints.

  3. Causality (biological): predicting contacts does not establish functional consequences; many contacts are permissive rather than deterministic.

These severely limit clinical utility because disease-relevant cells lack training data and predicted contacts may not indicate functional effects.

21.7 Structure as Context, Not Cause

The genome’s three-dimensional organization provides context that one-dimensional sequence models cannot capture. Enhancer-promoter contacts explain regulatory relationships spanning hundreds of kilobases; TAD boundaries constrain which elements can interact; tissue architecture determines the cellular neighborhoods where gene expression programs execute. Models like Akita, Orca, and C.Origami demonstrate that sequence contains substantial information about chromatin folding, predicting contact maps from DNA sequence with accuracy sufficient to flag structural variants and other disease-associated changes that are likely to alter folding.

Yet the functional role of 3D structure remains more modest than early enthusiasm implied. Experimental perturbation studies show that TAD boundary disruption often has limited expression consequences. Many chromatin contacts appear permissive rather than instructive: they establish the possibility of regulatory communication without determining whether that communication occurs. A predicted enhancer-promoter contact indicates that interaction could happen, not that it does happen or that it matters when it does. The 3D genome may constrain the regulatory landscape without specifying regulatory outcomes.

Key Insight

3D structure is permissive, not deterministic. Think of TADs as enabling rather than commanding: they create the possibility of enhancer-promoter communication, but the actual communication requires transcription factors, chromatin accessibility, and other regulatory inputs. This is why disrupting a TAD boundary can cause disease (by enabling pathogenic new contacts) without boundary integrity being necessary for normal gene expression in most cases.

This distinction shapes how 3D structure should be integrated with other modalities. Chromatin contacts become edges in gene regulatory networks (Chapter 22), providing structural priors for graph-based reasoning. Spatial expression patterns integrate with multi-omics approaches (Chapter 23), adding tissue architecture alongside genomics and transcriptomics. For interpretability (Chapter 25), 3D structure offers mechanistic hypotheses that require experimental validation. Whether a predicted regulatory effect operates through chromatin proximity, or whether proximity merely correlates with regulation through shared causes, remains a question that computational models can motivate but not answer. The integration of 3D information into genomic AI proceeds with appropriate uncertainty about what that information contributes.

Test Yourself

Before reviewing the summary, test your recall:

  1. What is the “orientation rule” for CTCF binding sites, and why does it determine which sites will anchor chromatin loops?
  2. Explain why “linear distance is not regulatory distance” in the context of enhancer-gene regulation.
  3. How does the loop extrusion mechanism create TAD boundaries?
  4. Why is 3D genome structure described as “permissive rather than deterministic” for gene regulation?
  5. What is the population averaging problem in Hi-C data, and why does it limit our understanding of gene regulation at the single-cell level?

  1. CTCF orientation rule: Convergent CTCF pairs (→←) form stable loop anchors because the loop extrusion mechanism halts when cohesin encounters CTCF sites oriented toward each other. Divergent (←→) and tandem (→→) orientations do not block extrusion, so no stable loop forms.

  2. Linear vs. regulatory distance: An enhancer 500 kb away within the same TAD may regulate a gene more strongly than an enhancer 50 kb away across a TAD boundary. The 3D folding determines regulatory proximity; boundaries insulate genes from enhancers despite short linear distances, while loops bring distant elements into contact.

  3. Loop extrusion and TAD boundaries: Cohesin loads onto chromatin and extrudes DNA bidirectionally, enlarging a loop until encountering convergent CTCF sites. These CTCF-anchored loops define TAD boundaries. Multiple adjacent loops create the triangular TAD structure visible in Hi-C.

  4. Permissive vs. deterministic: 3D structure establishes which enhancer-promoter interactions can occur, but whether they do occur depends on additional factors (transcription factors, chromatin accessibility). Many TAD disruptions produce minimal expression changes, showing that contacts enable but do not command gene regulation.

  5. Population averaging problem: Bulk Hi-C averages over millions of cells, each with a different 3D configuration. Any two loci contact each other in only 5-15% of cells at any time, but Hi-C reports the average contact frequency. This obscures single-cell stochastic dynamics that may be critical for gene regulation but cannot be recovered from bulk data.

Chapter Summary

Key Concepts Covered:

  • Chromatin organization hierarchy: Chromosome territories, A/B compartments, TADs, and fine-scale loops represent nested organizational levels with distinct mechanisms
  • Loop extrusion model: Cohesin extrudes DNA until blocked by convergent CTCF sites, explaining TAD boundary formation
  • Hi-C and contact matrices: Chromosome conformation capture methods measure 3D contacts; resolution depends on sequencing depth
  • Sequence-based prediction: Akita, Orca, and C.Origami predict contact maps from sequence, achieving ~0.6-0.8 correlation with experimental data
  • Structural variant interpretation: Boundary disruption causes enhancer hijacking with pathogenic consequences
  • Spatial transcriptomics: Extends single-cell analysis to include tissue location, enabling microenvironment characterization

Core Takeaways:

  1. The 3D genome provides regulatory context that 1D models cannot capture; linear distance is not regulatory distance
  2. CTCF motif orientation is a powerful predictor of loop anchors: convergent pairs form loops, divergent pairs do not
  3. Sequence contains substantial 3D information, but prediction accuracy varies by organizational level (best for TAD boundaries, worst for compartments)
  4. 3D contacts are permissive rather than deterministic; contact predicts that regulation could occur, not that it does
  5. Clinical application to structural variants is limited by training data bias toward a few cell lines

Connections to Other Chapters: