Importing PyHiM chromatin trace tables

PyHiM (Devos et al. 2024, Genome Biology) is a Python toolkit for multiplexed DNA-FISH image analysis. It outputs chromatin trace tables as Astropy ECSV files with columns:

Spot_ID, Trace_ID, x, y, z, Chrom, Chrom_Start, Chrom_End,
ROI #, Mask_id, Barcode #, label

U-Chrom’s ChromData.from_pyhim_trace() reads these ECSV files directly into the standard ChromData container, so you can use the browser, structure callers, and embedding tools on PyHiM outputs without manual conversion.

This notebook demonstrates the reader on Bintu et al. 2018 IMR90 chr21 chromatin tracing data (1,277 traces × 66 loci, 30 kb spacing), converted to PyHiM ECSV format.

1. Locate the demo ECSV

We use a PyHiM-format ECSV derived from Bintu et al. 2018 (Science 362:eaau1783): IMR90 cells, chr21:18.6–20.6 Mb, 66 segments × 30 kb. The conversion script is example-data/bintu_to_pyhim_ecsv.py.

from pathlib import Path
import numpy as np
import pandas as pd
from uchrom import ChromData

def _repo_root():
    for p in [Path.cwd(), *Path.cwd().parents]:
        if (p / "pyproject.toml").exists():
            return p
    return Path.cwd()

root = _repo_root()
ecsv_path = root / "example-data" / "IMR90_chr21_pyhim.ecsv"

if not ecsv_path.exists():
    # Generate it from the Bintu CSV, using the same interpreter as this notebook
    import subprocess
    import sys
    script = root / "example-data" / "bintu_to_pyhim_ecsv.py"
    subprocess.run([sys.executable, str(script)], check=True)

print(f"ECSV: {ecsv_path}")
print(f"Size: {ecsv_path.stat().st_size / 1e6:.1f} MB")

2. Read the ECSV into ChromData

ChromData.from_pyhim_trace() parses the ECSV and maps PyHiM columns to ChromData’s schema:

  • x, y, z → cd.coords

  • Chrom, Chrom_Start, Chrom_End, Trace_ID → cd.spots

  • Mask_id → cell_id (PyHiM convention: each trace = one “cell”)

  • ECSV meta['comments'] → cd.uns['xyz_unit'], cd.uns['genome_assembly']
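
The column correspondence above can be sketched with plain pandas on a toy table. This only illustrates the mapping and is not U-Chrom's actual implementation: the values are made up, the lowercase target names follow those used later in this notebook, and placing cell_id in the same per-spot table is an assumption.

```python
import numpy as np
import pandas as pd

# Toy PyHiM-style trace table (values are made up for illustration).
pyhim = pd.DataFrame({
    "Spot_ID": ["s1", "s2", "s3"],
    "Trace_ID": ["1", "1", "2"],
    "x": [0.10, 0.25, 1.00],
    "y": [0.20, 0.30, 1.10],
    "z": [0.05, 0.07, 0.50],
    "Chrom": ["chr21", "chr21", "chr21"],
    "Chrom_Start": [18_600_000, 18_630_000, 18_600_000],
    "Chrom_End": [18_630_000, 18_660_000, 18_630_000],
    "Mask_id": [1, 1, 2],
})

# Coordinates become an (n_spots, 3) float array.
coords = pyhim[["x", "y", "z"]].to_numpy(dtype=float)

# Genomic annotation and IDs become a per-spot table with renamed columns.
# Assumption: cell_id sits alongside the other per-spot columns.
rename = {
    "Chrom": "chrom",
    "Chrom_Start": "start",
    "Chrom_End": "end",
    "Trace_ID": "trace_id",
    "Mask_id": "cell_id",
}
spots = pyhim[list(rename)].rename(columns=rename)

print(coords.shape)
print(spots.columns.tolist())
```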

cd = ChromData.from_pyhim_trace(ecsv_path)
cd
print(f"Traces: {cd.spots['trace_id'].nunique()}")
print(f"Spots per trace: {len(cd.spots) // cd.spots['trace_id'].nunique()}")
print(f"Genomic region: {cd.spots['chrom'].iloc[0]}:{cd.spots['start'].min():,}-{cd.spots['end'].max():,}")
print(f"xyz_unit: {cd.uns.get('xyz_unit')}")
print(f"genome_assembly: {cd.uns.get('genome_assembly')}")
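
The integer division above gives the number of spots per trace only if every trace has the same number of spots. One way to check that assumption is to count spots per trace directly. The sketch below does this on a made-up table with a trace_id column.

```python
import pandas as pd

# Toy spots table (made up): trace "2" is missing one locus.
spots = pd.DataFrame({
    "trace_id": ["1", "1", "1", "2", "2"],
    "start": [0, 30_000, 60_000, 0, 60_000],
})

# Number of spots in each trace.
counts = spots.groupby("trace_id").size()
print(counts.describe()[["min", "max"]])

# Traces with fewer spots than the largest trace.
incomplete = counts[counts < counts.max()]
print(incomplete.index.tolist())
```

With the real data, the same two steps would presumably be applied to cd.spots in place of the toy table.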

3. Inspect a single trace

Each trace represents one chromatin fiber: 66 consecutive loci on chr21.

trace0 = cd.get_trace("1")  # first trace
print(f"Trace '1': {len(trace0.spots)} spots")
print(trace0.spots[["chrom", "start", "end"]].head(10))

4. Compute radius of gyration (Rg)

Rg is the root-mean-square distance of a trace's spots from their centroid; smaller values indicate a more compact trace.

def compute_rg(coords):
    centroid = coords.mean(axis=0)
    return np.sqrt(((coords - centroid) ** 2).sum(axis=1).mean())

rg_per_trace = []
for tid in cd.spots["trace_id"].unique():
    tr = cd.get_trace(tid)
    rg_per_trace.append(compute_rg(tr.coords))

rg_per_trace = np.array(rg_per_trace)
print(f"Rg (microns): mean={rg_per_trace.mean():.2f}, std={rg_per_trace.std():.2f}")
print(f"              min={rg_per_trace.min():.2f}, max={rg_per_trace.max():.2f}")
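
As a sanity check, compute_rg can be run on toy configurations whose answer is known in advance. The coordinates below are made up: two points 2 units apart should give Rg = 1, and the eight corners of a unit cube should give Rg = sqrt(3)/2.

```python
import numpy as np

def compute_rg(coords):
    """Root-mean-square distance of points from their centroid."""
    centroid = coords.mean(axis=0)
    return np.sqrt(((coords - centroid) ** 2).sum(axis=1).mean())

# Two points 2 units apart: each lies 1 unit from the centroid.
pair = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# Corners of a unit cube: each lies sqrt(3)/2 from the cube center.
cube = np.array(
    [[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
    dtype=float,
)

print(compute_rg(pair))
print(compute_rg(cube))
```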

5. Visualize Rg distribution

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(rg_per_trace, bins=40, edgecolor="black", alpha=0.7)
ax.set_xlabel("Radius of gyration (μm)")
ax.set_ylabel("Number of traces")
ax.set_title(f"Bintu 2018 IMR90 chr21 (n={len(rg_per_trace)} traces)")
plt.tight_layout()

6. Plot a few example traces in 3D

from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (only needed to register the 3D projection on older Matplotlib)

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection="3d")

# Plot the first 5 traces
for tid in cd.spots["trace_id"].unique()[:5]:
    tr = cd.get_trace(tid)
    ax.plot(tr.coords[:, 0], tr.coords[:, 1], tr.coords[:, 2],
            marker="o", markersize=3, alpha=0.7, label=f"Trace {tid}")

ax.set_xlabel("x (μm)")
ax.set_ylabel("y (μm)")
ax.set_zlabel("z (μm)")
ax.set_title("5 example chromatin traces")
ax.legend()
plt.tight_layout()

7. Compute pairwise distance matrix for one trace

Entry (i, j) of the distance matrix is the spatial distance between loci i and j of the trace; low values mark pairs of loci that are close together in 3D.

trace0 = cd.get_trace("1")
dist_mat = trace0.compute_distances()

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(dist_mat, cmap="viridis", origin="lower")
ax.set_xlabel("Locus index")
ax.set_ylabel("Locus index")
ax.set_title("Pairwise distance matrix (trace '1')")
plt.colorbar(im, ax=ax, label="Distance (μm)")
plt.tight_layout()
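
For reference, a pairwise Euclidean distance matrix can be computed from an (n, 3) coordinate array with NumPy broadcasting. This is a generic sketch on made-up coordinates and is not necessarily how compute_distances() is implemented.

```python
import numpy as np

# Toy trace (made up): 4 loci on a line along x, 0.1 units apart.
coords = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.2, 0.0, 0.0],
    [0.3, 0.0, 0.0],
])

# diff[i, j] = coords[i] - coords[j], shape (n, n, 3).
diff = coords[:, None, :] - coords[None, :, :]

# Euclidean norm over the last axis gives the (n, n) distance matrix.
dist = np.sqrt((diff ** 2).sum(axis=-1))

print(dist.shape)
print(np.round(dist[0], 2))
```

The result is symmetric with a zero diagonal, which is a quick property to verify on any distance matrix before plotting it.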

Next steps

  • Save as .h5cd: cd.write("imr90_chr21.h5cd") for fast reload

  • 3D browser: python -m uchrom.browser imr90_chr21.h5cd (interactive PyVista viewer)

  • Structure calling: TAD/loop/compartment callers work on imaging data too

  • Embedding: If you have multiple cells, run FastHigashi on the population

For PyHiM outputs with empty Chrom columns, pass a barcode_dict:

cd = ChromData.from_pyhim_trace(
    "Trace_3D.ecsv",
    barcode_dict={1: ("chr1", 10_000_000, 10_050_000), ...}
)

See ChromData.from_pyhim_trace docstring for details.
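
If the barcode coordinates are kept in a table, a dictionary of the shape shown above can be built with pandas. The sketch below assumes a hypothetical table with columns barcode, chrom, start, and end. These column names and the second row are made up; only the first entry comes from the example above.

```python
import pandas as pd

# Hypothetical barcode annotation table (column names are assumptions).
barcodes = pd.DataFrame({
    "barcode": [1, 2],
    "chrom": ["chr1", "chr1"],
    "start": [10_000_000, 10_050_000],
    "end": [10_050_000, 10_100_000],
})

# Map each barcode number to a (chrom, start, end) tuple of plain Python types.
barcode_dict = {
    int(row.barcode): (row.chrom, int(row.start), int(row.end))
    for row in barcodes.itertuples(index=False)
}
print(barcode_dict[1])
```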