3D reconstruction from Hi-C data
U-Chrom provides several 3D reconstruction algorithms:
| Module | Best for | Notes |
|---|---|---|
| uchrom.recon.sc.nucdyn | Single-cell Hi-C / Dip-C | Molecular dynamics, Taichi GPU |
| uchrom.recon.sc.gem | Single-cell, energy minimisation | Taichi autodiff |
| uchrom.recon.bulk.mds | Bulk Hi-C | SMACOF MDS, PyTorch GPU |
This notebook walks through a short single-cell reconstruction using Nuc Dynamics on the example data shipped with the repo.
Locate the example data
The notebook needs example-data/cell1.pairs from the repo. It walks up the directory tree from the current working directory until it finds that file, so it runs from any directory inside the repo.
from pathlib import Path

def find_repo_root(start: Path = Path.cwd()) -> Path:
    for p in [start, *start.parents]:
        if (p / 'example-data' / 'cell1.pairs').exists():
            return p
    raise FileNotFoundError('Could not locate example-data/cell1.pairs')

repo = find_repo_root()
pairs = repo / 'example-data' / 'cell1.pairs'
out_path = repo / 'tutorials' / '_out_tutorial3.h5cd'
print('pairs:', pairs)
print('out :', out_path)
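Before running a reconstruction it can help to check what the input contains. The sketch below is not part of U-Chrom: it assumes the file follows the 4DN pairs text format (header lines starting with `#`, then tab-separated columns beginning with readID, chr1, pos1, chr2, pos2) and counts intra- versus inter-chromosomal contacts. `summarise_pairs` and the in-memory `SAMPLE` are hypothetical names used only for illustration; the real cell1.pairs may carry extra columns.

```python
import io
from collections import Counter

# Tiny in-memory stand-in for a .pairs file (assumed 4DN pairs layout).
SAMPLE = """\
## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
r1\tchr1\t10500\tchr1\t250000\t+\t-
r2\tchr1\t98000\tchr2\t41000\t+\t+
r3\tchr2\t5000\tchr2\t7300\t-\t+
"""

def summarise_pairs(handle):
    """Count intra- and inter-chromosomal contacts in a pairs-format stream."""
    counts = Counter()
    for line in handle:
        # Skip header lines and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        fields = line.rstrip('\n').split('\t')
        chr1, chr2 = fields[1], fields[3]
        counts['intra' if chr1 == chr2 else 'inter'] += 1
    return counts

summary = summarise_pairs(io.StringIO(SAMPLE))
print(dict(summary))
```

For a real, uncompressed file the same function could be called as `summarise_pairs(open(pairs))`; a gzipped file would need `gzip.open(pairs, 'rt')` instead.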
Run Nuc Dynamics
The CLI is the simplest way; the main function is also importable.
# From the shell (uncomment to run):
# !python -m uchrom.recon.sc.nucdyn example-data/cell1.pairs out.h5cd \
#     --arch=cpu --dyns=30 --size_steps='[2, 0.4]'

from uchrom.recon.sc.nucdyn import main as nucdyn_main

nucdyn_main(
    str(pairs),
    str(out_path),
    arch='cpu',           # 'cuda' if you have an NVIDIA GPU
    dyns=30,              # fewer steps for the tutorial
    size_steps=[2, 0.4],  # only two resolution stages
)
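If you would rather drive the CLI from a script than from a notebook cell, the command line can be assembled as an argument list and handed to `subprocess`. The sketch below only builds and prints the command. The flags (`--arch`, `--dyns`, `--size_steps`) are copied from the commented shell line above, and `build_nucdyn_cmd` is a hypothetical helper, not a U-Chrom function.

```python
import shlex
import sys

def build_nucdyn_cmd(pairs, out, arch='cpu', dyns=30, size_steps=(2, 0.4)):
    """Assemble the argv list for the nucdyn CLI (flags as shown in the tutorial)."""
    # Render the stages as a bracketed list, e.g. "[2, 0.4]".
    steps = '[' + ', '.join(str(s) for s in size_steps) + ']'
    return [
        sys.executable, '-m', 'uchrom.recon.sc.nucdyn',
        str(pairs), str(out),
        f'--arch={arch}', f'--dyns={dyns}', f'--size_steps={steps}',
    ]

cmd = build_nucdyn_cmd('example-data/cell1.pairs', 'out.h5cd')
print(shlex.join(cmd))
# To actually run it (requires U-Chrom to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with the bracketed `--size_steps` value.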
Load the reconstructed structure
The .h5cd output carries a cell_id derived from the filename and one trace per chromosome.
from uchrom import ChromData
cd = ChromData.read(str(out_path))
cd
print('chroms :', cd.chroms)
print('n_spots :', cd.n_spots)
cd.spots.head()
Compute a distance matrix for one chromosome
import matplotlib.pyplot as plt
trace_id = cd.spots['trace_id'].iloc[0]
D = cd.compute_distances(trace_id=trace_id)
fig, ax = plt.subplots(figsize=(5, 5))
im = ax.imshow(D, cmap='viridis_r', aspect='equal')
ax.set_title(f'Distance matrix — trace {trace_id}')
fig.colorbar(im, ax=ax, label='distance')
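To make clear what such a matrix contains, here is a minimal NumPy sketch of a pairwise Euclidean distance matrix on toy coordinates. It illustrates the concept only and makes no claim about how `compute_distances` is implemented. `pairwise_distances` and the toy array are hypothetical.

```python
import numpy as np

def pairwise_distances(xyz):
    """Return the (n, n) Euclidean distance matrix for n points in 3D."""
    xyz = np.asarray(xyz, dtype=float)
    # Broadcasting gives an (n, n, 3) array of coordinate differences.
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy trace: four beads on a straight line, one unit apart.
xyz = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], dtype=float)
D_toy = pairwise_distances(xyz)
print(D_toy)
```

The result is symmetric with a zero diagonal, and entry (i, j) is the spatial distance between beads i and j, which is what the heatmap above displays for a real trace.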
Using other reconstruction backends
python -m uchrom.recon.sc.gem <in> <out.h5cd>: the energy-minimisation variant (smoother, slower).
python -m uchrom.recon.bulk.mds <contact.mcool> <out.h5cd> --resolution=100000: PyTorch SMACOF MDS on bulk Hi-C data. For an inter-chromosomal whole-genome reconstruction, pass --inter together with the additional --resolution_inter flag.
All three reconstructors write the same .h5cd schema, so downstream tools work the same way regardless of which one produced the file.
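For readers unfamiliar with SMACOF, the following is a minimal NumPy sketch of the algorithm with unit weights: starting from random coordinates, it repeatedly applies the Guttman transform, which never increases the stress between embedded and target distances. This is a textbook illustration on synthetic data, not U-Chrom's implementation (which, per the table above, runs in PyTorch). All names here are hypothetical, and the step that turns Hi-C contact counts into target distances is omitted.

```python
import numpy as np

def pairwise(X):
    """Euclidean distance matrix of the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def stress(X, delta):
    """Raw stress: squared mismatch summed over unordered pairs."""
    return 0.5 * np.sum((pairwise(X) - delta) ** 2)

def smacof(delta, n_dims=3, n_iter=200, seed=0):
    """Unweighted SMACOF: iterate the Guttman transform X <- B(X) X / n."""
    n = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, n_dims))
    history = [stress(X, delta)]
    for _ in range(n_iter):
        dist = pairwise(X)
        # Off-diagonal ratio delta_ij / d_ij; zero where points coincide.
        with np.errstate(divide='ignore', invalid='ignore'):
            ratio = np.where(dist > 0, delta / dist, 0.0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)  # rows of B sum to zero
        X = B @ X / n
        history.append(stress(X, delta))
    return X, history

# Synthetic target: distances measured on a 40-bead helix.
t = np.linspace(0, 4 * np.pi, 40)
truth = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])
delta = pairwise(truth)

X_fit, history = smacof(delta)
print(f'stress: {history[0]:.2f} -> {history[-1]:.4f}')
```

The recovered coordinates match the target only up to rotation, translation, and reflection, since distances alone do not fix an orientation.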