uchrom.core
ChromData
- class uchrom.core.ChromData(coords: ndarray, spots: DataFrame, *, cells: DataFrame | None = None, cellm: Dict[str, ndarray] | None = None, tracks: DataFrame | None = None, traces: DataFrame | None = None, layers: Dict[str, ndarray] | None = None, results: dict | None = None, uns: dict | None = None, linked_adata=None, validate: bool = True)
Bases: object
Chromatin Data: the core container for U-Chrom.
See the module docstring of uchrom.core.cdata for the full purpose, hierarchy, FOF-CT mapping, and on-disk format contract. Summary:
The central abstraction is a “structure table”: genomic bins mapped to 3D coordinates. Each row of spots (with the corresponding row of coords) is one Spot.
Spots are grouped hierarchically as Cell → Trace → Spot. A Trace is an ordered chromatin-fibre polymer; a Cell contains one or more traces.
All analysis in U-Chrom consumes or produces ChromData. Reconstruction modules (uchrom.recon) emit it, structure callers (uchrom.strc) decorate cd.results[...] with TADs / loops / compartments, and the browser / plotters render it.
- Parameters:
coords (ndarray, shape (n_spots, 3)) – 3-D coordinates (x, y, z) per spot.
spots (DataFrame, shape (n_spots, ≥4)) – Per-spot metadata. Required columns: chrom (str, converted to categorical), start (int, 0-based BED-style), end (int, non-inclusive), trace_id (int or str, converted to categorical). Optional: cell_id (int or str), spot_id, FOF-CT sub_cell_roi_id / extra_cell_roi_id, and any experiment-specific annotation column (carried through verbatim).
cells (DataFrame, optional) – Per-cell metadata indexed by cell_id.
cellm (dict[str, ndarray], optional) – Per-cell multi-dimensional annotations (embeddings, UMAP, …). Each array’s first axis must have length n_cells.
tracks (DataFrame, optional) – Epigenomic signals (ATAC, ChIP-seq, …) row-aligned to spots. Length must equal n_spots.
traces (DataFrame, optional) – Per-trace metadata indexed by trace_id.
layers (dict[str, ndarray], optional) – Alternative coordinate sets, each with shape (n_spots, 3), e.g. raw / drift-corrected / aligned.
results (dict, optional) – Analysis outputs. Conventional keys: 'loops' → DataFrame (chrom1, start1, end1, …), 'tads' → DataFrame (chrom, start, end, …), 'compartments' → ndarray or DataFrame per bin.
uns (dict, optional) – Unstructured metadata preserved on disk. Conventional keys: 'genome_assembly', 'xyz_unit', 'fofct_header'.
validate (bool) – If True (default), validate internal consistency on construction (coords shape, spots required columns, tracks / layers alignment).
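The constructor inputs and the consistency rules listed for validate can be sketched with plain NumPy and pandas. The helper check_inputs below is hypothetical: it only mirrors the documented rules (coords shape, required spots columns, tracks / layers alignment) and is not U-Chrom's actual validation code.

```python
import numpy as np
import pandas as pd

REQUIRED_SPOT_COLUMNS = ("chrom", "start", "end", "trace_id")


def check_inputs(coords, spots, tracks=None, layers=None):
    """Hypothetical stand-in for the documented validate=True checks."""
    n_spots = len(spots)
    if coords.shape != (n_spots, 3):
        raise ValueError(f"coords must have shape ({n_spots}, 3), got {coords.shape}")
    missing = [c for c in REQUIRED_SPOT_COLUMNS if c not in spots.columns]
    if missing:
        raise ValueError(f"spots is missing required columns: {missing}")
    if tracks is not None and len(tracks) != n_spots:
        raise ValueError("tracks must have one row per spot")
    for name, arr in (layers or {}).items():
        if arr.shape != (n_spots, 3):
            raise ValueError(f"layer {name!r} must have shape ({n_spots}, 3)")
    return n_spots


# Four spots on one trace, using 0-based, half-open BED-style bins.
spots = pd.DataFrame({
    "chrom": ["chr1"] * 4,
    "start": [0, 10_000, 20_000, 30_000],
    "end": [10_000, 20_000, 30_000, 40_000],
    "trace_id": [0, 0, 0, 0],
    "cell_id": ["cell_a"] * 4,
})
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0]])
layers = {"raw": coords + 0.1}   # an alternative coordinate set, same shape

n_spots = check_inputs(coords, spots, layers=layers)
```

Per the signature above, the same objects would then be passed as ChromData(coords, spots, layers=layers).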
Derived accessors
n_spots, n_traces, n_cells, chroms
Key methods
from_dataframe, from_fofct, read, write, to_dataframe, get_cell, get_trace, get_chrom, compute_distances
On-disk format (.h5cd)
Versioned HDF5. See the uchrom.core.cdata module docstring for the full layout and the read / write round-trip contract.
Notes
Subsetting (cd[mask], get_chrom, etc.) always returns a new ChromData; the source is not mutated.
Global pairwise distance matrices are intentionally not stored: they are biologically meaningful per trace, not across cells, and would need O(n²) memory. Compute them on demand via compute_distances(trace_id=...).
String columns (chrom, trace_id, cell_id) are auto-converted to pd.Categorical for ~10× memory savings.
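The effect of the categorical conversion can be checked directly in pandas. This is a small illustration with synthetic labels, not a benchmark of U-Chrom itself; the actual saving depends on string length and on how many distinct values a column holds.

```python
import pandas as pd

# 100,000 spots whose chromosome label repeats over 23 distinct values.
labels = [f"chr{i % 23 + 1}" for i in range(100_000)]
as_object = pd.Series(labels, dtype=object)
as_category = as_object.astype("category")

# deep=True counts the Python string objects, not just the pointers.
bytes_object = as_object.memory_usage(deep=True)
bytes_category = as_category.memory_usage(deep=True)

print(f"object:   {bytes_object:,} bytes")
print(f"category: {bytes_category:,} bytes")
```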
- property auto_discovery_schema: dict
Alias for discovery_schema.
- build_discovery_schema(*, store: bool = True, **kwargs) → dict
Build the auto-discovery schema, optionally storing it in uns.
The stored representation is an HDF5-friendly JSON payload under uns['auto_discovery_schema'], so it round-trips with .h5cd.
- compute_distances(trace_id=None) → ndarray
Compute the pairwise Euclidean distance matrix.
- Parameters:
trace_id (optional) – If given, compute only for spots in that trace. If None, compute for all spots (use with caution on large data).
- Return type:
np.ndarray, shape (n, n)
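As a rough illustration of what a per-trace distance matrix involves, the following NumPy sketch computes it by broadcasting. The function pairwise_distances and the trace_id array are illustrative names, not part of the U-Chrom API, and the library may compute distances differently.

```python
import numpy as np


def pairwise_distances(xyz):
    """Euclidean distance matrix for an (n, 3) coordinate array."""
    diff = xyz[:, None, :] - xyz[None, :, :]   # (n, n, 3) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))   # (n, n)


coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0],
                   [0.0, 0.0, 5.0],
                   [1.0, 1.0, 1.0]])
trace_id = np.array([0, 0, 1, 1])

# Select one trace first, so the matrix is n_trace x n_trace rather than
# n_spots x n_spots.
dist_trace0 = pairwise_distances(coords[trace_id == 0])
```

Restricting to a single trace keeps memory proportional to the square of that trace's length, which is why the all-spots case comes with a caution.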
- describe_for_agent(*, max_items: int = 40) → str
Return a compact prompt-ready description of available data.
- property discovery_schema: dict
Agent-readable auto-discovery schema for this ChromData.
If a schema is stored in uns['auto_discovery_schema'], it is parsed and returned. Otherwise a fresh in-memory schema is built without mutating uns.
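The stored-or-fresh behaviour described here can be sketched with the standard json module. Everything except the uns key name is an assumption: build_schema, get_schema and store_schema are hypothetical helpers, and the schema fields shown are placeholders, since the real schema content is not documented on this page.

```python
import json

SCHEMA_KEY = "auto_discovery_schema"


def build_schema(n_spots):
    """Hypothetical schema builder with placeholder fields."""
    return {"n_spots": n_spots, "tables": ["spots", "coords"]}


def get_schema(uns, n_spots):
    """Return the stored schema if present, else build one without touching uns."""
    if SCHEMA_KEY in uns:
        return json.loads(uns[SCHEMA_KEY])
    return build_schema(n_spots)


def store_schema(uns, schema):
    """Store the schema as a JSON string, which HDF5 can hold as a plain string."""
    uns[SCHEMA_KEY] = json.dumps(schema)
    return schema


uns = {}
fresh = get_schema(uns, n_spots=4)       # built in memory; uns stays empty
uns_untouched = SCHEMA_KEY not in uns
store_schema(uns, fresh)
restored = get_schema(uns, n_spots=0)    # parsed from the stored JSON string
```

Serialising to a single JSON string avoids mapping nested dicts onto HDF5 groups, which is one plausible reading of "HDF5-friendly".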
- classmethod from_dataframe(df: DataFrame, *, cell_id=None, **kwargs) → ChromData
Create from a reconstruction output DataFrame.
Expects columns: chrom, start, end, x, y, z. Each chromosome becomes one trace.
- Parameters:
df (DataFrame) – Table with columns chrom, start, end, x, y, z.
cell_id (hashable, optional) – If given, tag every spot with this cell identifier (e.g. derived from the output filename for single-cell reconstruction). The DataFrame’s own cell_id column, if any, takes precedence.
**kwargs – Forwarded to the ChromData constructor.
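The documented behaviour (x, y, z become coordinates, each chromosome becomes one trace, and an existing cell_id column wins over the argument) might look roughly like this in pandas. The helper split_structure_table is hypothetical and only approximates what from_dataframe is described as doing.

```python
import pandas as pd

df = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2"],
    "start": [0, 10_000, 0],
    "end": [10_000, 20_000, 10_000],
    "x": [0.0, 1.0, 5.0],
    "y": [0.0, 0.5, 5.0],
    "z": [0.0, 0.2, 5.0],
})


def split_structure_table(df, cell_id=None):
    """Hypothetical helper mirroring the documented from_dataframe behaviour."""
    coords = df[["x", "y", "z"]].to_numpy(dtype=float)
    spots = df.drop(columns=["x", "y", "z"]).copy()
    spots["trace_id"] = spots["chrom"]        # one trace per chromosome
    if cell_id is not None and "cell_id" not in spots.columns:
        spots["cell_id"] = cell_id            # the table's own column wins
    return coords, spots


coords, spots = split_structure_table(df, cell_id="cell_001")
```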
- classmethod from_fofct(core_path: str | Path, **kwargs) → ChromData
Read from a FOF-CT core table file.
- Parameters:
core_path (path) – Path to the FOF-CT core table (CSV/TSV/TXT).
**kwargs – Additional keyword arguments passed to ChromData constructor (e.g. cells, tracks, uns).
- classmethod from_pyhim_trace(ecsv_path: str | Path, barcode_dict: dict | DataFrame | None = None, **kwargs) → ChromData
Read a PyHiM chromatin-trace ECSV table into a ChromData.
PyHiM (Devos et al. 2024) emits one ECSV file per trace-building run. Schema (from chromatin_trace_table.py upstream):
Spot_ID, Trace_ID, x, y, z, Chrom, Chrom_Start, Chrom_End, ROI #, Mask_id, Barcode #, label
meta['comments'] carries xyz_unit=... and genome_assembly=....
- Parameters:
ecsv_path (path) – Path to the ECSV file written by PyHiM.
barcode_dict (dict[int, (chrom, start, end)] or DataFrame, optional) – Required when Chrom / Chrom_Start / Chrom_End are empty in the ECSV (PyHiM does not always populate them). As a DataFrame, expects columns barcode, chrom, start, end. If Chrom is populated, barcode_dict is ignored.
**kwargs – Additional keyword arguments passed to the ChromData constructor (cells, tracks, uns, …).
Notes
Mask_id becomes cell_id (PyHiM convention).
ECSV header comments are captured in cd.uns['pyhim']['ecsv_comments'], and any xyz_unit / genome_assembly entries are also promoted to cd.uns directly (matching from_fofct()).
- classmethod from_seqfish_multiomics(spot_glob, **kwargs) → ChromData
Load Takei 2025 DNA seqFISH+ cerebellum data.
Thin shim around uchrom.io.seqfish_multiomics.read_seqfish_multiomics(). See that function for the full parameter list.
- classmethod from_seqfish_multiomics_linked(spot_glob, **kwargs)
Load linked Takei 2025 DNA tracing + RNA AnnData artifacts.
Thin shim around uchrom.io.seqfish_multiomics.load_seqfish_multiomics_linked(). Returns a ChromData with RNA expression available at cd.linked_adata and can write paired .h5cd / .h5ad files.
- classmethod from_takei2025_cerebellum(**kwargs)
Load linked Takei 2025 cerebellum data.
Thin shim around uchrom.io.seqfish_multiomics.load_takei2025_cerebellum(). Returns a ChromData with RNA expression available at cd.linked_adata.
- link_anndata(adata, *, cell_id_col: str | None = None, copy_obs: bool = True, copy_obsm: bool = True) → int
Import cell-level metadata from an AnnData into this ChromData.
Matches cells by cell_id: each unique value in spots['cell_id'] is looked up in adata.obs (by index, or by the column cell_id_col if given). Matched cells get their adata.obs columns merged into self.cells and their adata.obsm arrays copied into self.cellm.
If self.cells already exists, its row order is preserved and AnnData rows are aligned onto that cell axis. This is important for multi-omics loaders such as Takei 2025, where chromatin tracing coordinates live in coords / spots, RNA/IF signals live in spot-level tracks, and mRNA clustering/UMAP already live in cells / cellm.
- Parameters:
adata (anndata.AnnData) – The single-cell dataset to link (e.g. scRNA-seq).
cell_id_col (str, optional) – Column in adata.obs that holds cell identifiers matching spots['cell_id']. If None, adata.obs.index is used as the key.
copy_obs (bool) – If True (default), copy adata.obs columns into self.cells.
copy_obsm (bool) – If True (default), copy adata.obsm arrays into self.cellm.
- Returns:
Number of cells matched.
- Return type:
int
- Raises:
KeyError – If spots has no cell_id column.
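The alignment rule (keep the existing cell order, bring the AnnData rows onto that axis) can be illustrated with pandas alone. Here obs is a plain DataFrame standing in for adata.obs, and the column names are made up for the example; this is a sketch of the idea, not the method's implementation.

```python
import pandas as pd

# Existing per-cell table; its row order must be preserved.
cells = pd.DataFrame({"cluster": ["A", "B", "C"]},
                     index=pd.Index(["c3", "c1", "c2"], name="cell_id"))

# Stand-in for adata.obs: different order, one unknown cell, one cell absent.
obs = pd.DataFrame({"n_genes": [900, 1200, 700]},
                   index=["c1", "c3", "c9"])

aligned = obs.reindex(cells.index)                  # rows follow the cells order
n_matched = int(cells.index.isin(obs.index).sum())  # cells found in obs
merged = cells.join(aligned)                        # unmatched cells get NaN
```

Cells present only in obs (c9 here) are dropped by the reindex, and cells without a match keep their row with missing values, so the cell axis never changes length or order.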
- property linked_adata
Linked AnnData object, loaded lazily from uns metadata if possible.
- load_linked_anndata(path: str | Path | None = None)
Load, cache, and return the linked AnnData object.
- classmethod read(path: str | Path) → ChromData
Read from an HDF5 (.h5cd) file.
Dispatches to a version-specific reader based on f.attrs['uchrom_format_version']. Files written before versioning was introduced are read with a warning using the v1.0 reader (the on-disk layout has been stable from the start).
Forward compatibility:
Same MAJOR, higher MINOR → read with a warning; unknown fields are ignored silently by the lower-level helpers.
Different MAJOR → raise ValueError with guidance.
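The compatibility rules above amount to a small decision function. The sketch below assumes a "MAJOR.MINOR" version string and a reader that supports 1.0; choose_reader, SUPPORTED and the message texts are illustrative and not taken from U-Chrom.

```python
import warnings

SUPPORTED = (1, 0)   # assumed MAJOR, MINOR understood by this reader


def choose_reader(version):
    """Pick a reader label for a stored format version string (or None)."""
    if version is None:
        # File predates versioning: fall back to the v1.0 layout with a warning.
        warnings.warn("no format version found; assuming the v1.0 layout")
        return "v1.0"
    major, minor = (int(part) for part in version.split(".")[:2])
    if major != SUPPORTED[0]:
        raise ValueError(
            f"file format {version} has a different MAJOR version than "
            f"the supported {SUPPORTED[0]}.{SUPPORTED[1]}"
        )
    if minor > SUPPORTED[1]:
        warnings.warn(f"file format {version} is newer than the reader; "
                      "unknown fields will be ignored")
    return f"v{SUPPORTED[0]}.{SUPPORTED[1]}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    same = choose_reader("1.0")          # exact match, no warning
    newer_minor = choose_reader("1.3")   # readable, with a warning
    legacy = choose_reader(None)         # unversioned file, with a warning
n_warnings = len(caught)
```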
- to_anndata()
Export cell-level data as an AnnData object.
Creates an AnnData where each observation is a cell, obs is self.cells, and obsm is self.cellm. The X matrix is left empty (zeros) because ChromData has no cell-by-feature expression matrix. Spot-level RNA-FISH / IF / epigenomic signals, such as Takei 2025 tracks, remain in self.tracks and are not flattened into AnnData.X.
- Return type:
anndata.AnnData
- Raises:
ImportError – If anndata is not installed.
- update_discovery_schema(schema: dict | None = None, **kwargs) → dict
Store a supplied or newly built auto-discovery schema in uns.