ChromData¶
uchrom.ChromData is the central container. It is conceptually similar
to AnnData but organises the data
around the chromatin-tracing hierarchy Cell → Trace → Spot.
Construction¶
import numpy as np
import pandas as pd
from uchrom import ChromData
# One (x, y, z) row per spot.
coords = np.asarray([[1.0, 2.0, 3.0], [1.5, 2.3, 2.9]])
# One metadata row per spot.
spots = pd.DataFrame({
    "chrom": ["chr1", "chr1"],
    "start": [0, 100_000],
    "end": [100_000, 200_000],
    "trace_id": [0, 0],
})
cd = ChromData(coords, spots, uns={"genome_assembly": "GRCh38"})
Required spot columns¶
| Column | Type | Required | FOF-CT field |
|---|---|---|---|
| | str (category) | yes | |
| | int | yes | |
| | int | yes | |
| | int/str (category) | yes | |
| | int/str (category) | no | |
| | int/str | no | |
String columns are auto-converted to pd.Categorical for ~10× memory
savings on large datasets.
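The actual saving depends on how repetitive a column is. The following is a minimal sketch in plain pandas (it does not call uchrom, and the column contents and row count are made up) that compares the footprint of a string column before and after conversion:

```python
import pandas as pd

# Hypothetical column: 300,000 spots drawn from only three chromosome names.
chrom = pd.Series(["chr1", "chr2", "chrX"] * 100_000, dtype=object, name="chrom")
chrom_cat = chrom.astype("category")

# deep=True counts the Python string objects, not just the pointers to them.
bytes_object = chrom.memory_usage(index=False, deep=True)
bytes_category = chrom_cat.memory_usage(index=False, deep=True)

print(f"object:   {bytes_object:,} bytes")
print(f"category: {bytes_category:,} bytes")
print(f"ratio:    {bytes_object / bytes_category:.1f}x")
```

A categorical column stores each distinct string once plus a small integer code per row, so the benefit is largest for columns with few distinct values and shrinks as the number of distinct values approaches the number of rows.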
Attributes¶
| Attribute | Shape | Purpose |
|---|---|---|
| | | x, y, z for each spot |
| | | per-spot metadata |
| | | per-cell metadata |
| | dict | multi-dim cell annotations (embeddings, UMAP) |
| | | epigenomic signals aligned to spots |
| | | per-trace metadata |
| | dict | alternative coordinate sets |
| | dict | analysis outputs (loops, TADs, …) |
| | dict | unstructured metadata |
Properties: cd.n_spots, cd.n_traces, cd.n_cells, cd.chroms.
Subsetting¶
Subset operations return new ChromData instances — they never
mutate the original.
cd.get_chrom("chr1") # all spots on chr1
cd.get_trace(5) # only trace 5
cd.get_cell("cell_0") # only cell "cell_0"
cd[cd.spots["chrom"] == "chr1"] # boolean mask
cd[:100] # first 100 spots
All children (coords, spots, tracks, layers, cellm) are filtered consistently; traces and cells that become empty are dropped.
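To illustrate what "filtered consistently" means, here is a rough sketch of the idea using only NumPy and pandas. It is not the uchrom implementation: the per-trace table, its `label` column, and the variable names are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the spot-aligned children of a container.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 3.0, 3.0]])
spots = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2", "chr2"],
    "trace_id": [0, 0, 1, 2],
})
# Hypothetical per-trace table, indexed by trace_id.
traces = pd.DataFrame({"label": ["a", "b", "c"]}, index=pd.Index([0, 1, 2], name="trace_id"))

# One boolean mask over spots drives every spot-aligned child.
mask = (spots["chrom"] == "chr1").to_numpy()
sub_coords = coords[mask]                       # boolean indexing returns a copy
sub_spots = spots.loc[mask].reset_index(drop=True)

# Traces that no longer own any spot are dropped.
sub_traces = traces.loc[traces.index.isin(sub_spots["trace_id"])]

print(sub_coords.shape, sub_traces.index.tolist())
```

The originals (`coords`, `spots`, `traces`) are left untouched, which mirrors the non-mutating behaviour described above.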
On-demand distance matrix¶
Global (n_spots, n_spots) matrices are not stored: distances between
spots in different cells are meaningless, and memory grows quadratically
with n_spots. Compute per-trace on demand:
D = cd.compute_distances(trace_id=0) # (n_spots_in_trace, n_spots_in_trace)
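For intuition, a per-trace matrix of this shape can be sketched with NumPy alone. This assumes plain Euclidean distance between spot coordinates; whether compute_distances uses exactly this metric, or treats missing spots specially, is not stated here. The helper name `pairwise_distances` and the `trace_id` array are illustrative only.

```python
import numpy as np

def pairwise_distances(xyz: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows in an (n, 3) array."""
    diff = xyz[:, None, :] - xyz[None, :, :]   # (n, n, 3) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))   # (n, n)

# Three spots: the first two belong to trace 0, the last to trace 1.
coords = np.array([[1.0, 2.0, 3.0], [1.5, 2.3, 2.9], [9.0, 9.0, 9.0]])
trace_id = np.array([0, 0, 1])

# Restrict to one trace first, then build the small square matrix.
D = pairwise_distances(coords[trace_id == 0])
print(D.shape)
```

Selecting the trace before computing keeps the intermediate array at (n, n, 3) for that trace only, instead of for every spot in the dataset.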
I/O¶
| Input | Method |
|---|---|
| Reconstruction CSV (chrom, start, end, x, y, z) | |
| 4DN FOF-CT core table | |
| | |
cd.write("data.h5cd")
cd2 = ChromData.read("data.h5cd")
The .h5cd format has a version stamp and gracefully handles legacy
files. See Concepts — On-disk format for details.
Why not AnnData?¶
AnnData observations are flat (one row = one cell). Chromatin tracing
has an extra hierarchy (Cell → Trace → Spot), and the natural “sample
axis” differs per operation — a distance matrix is per-trace, cell-type
embeddings are per-cell, TAD calls are per-locus. ChromData exposes
that hierarchy directly while reusing the AnnData-like obsm/uns
conventions where they apply (cellm, uns).
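The point about the sample axis can be made concrete with a small pandas sketch. It does not use uchrom, and the `cell_id` column name is an assumption made for illustration; the same spot table is grouped at three different levels depending on the question being asked.

```python
import pandas as pd

# Hypothetical spot table: two cells, three traces, two genomic loci.
spots = pd.DataFrame({
    "cell_id": ["cell_0", "cell_0", "cell_0", "cell_1", "cell_1"],
    "trace_id": [0, 0, 1, 2, 2],
    "start": [0, 100_000, 0, 0, 100_000],
})

# Per-trace view: how many spots each trace contains.
spots_per_trace = spots.groupby("trace_id").size()

# Per-cell view: how many distinct traces each cell contains.
traces_per_cell = spots.groupby("cell_id")["trace_id"].nunique()

# Per-locus view: how many times each locus was observed across all traces.
spots_per_locus = spots.groupby("start").size()

print(spots_per_trace.to_dict())
print(traces_per_cell.to_dict())
print(spots_per_locus.to_dict())
```

In a flat one-row-per-cell layout, only the per-cell view has a natural home; the other two would need to be packed into side structures.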