uchrom.fea
Geometric / statistical features over chromatin traces.
- uchrom.fea.axis_variance_cube(cd, chrom: str, device: str = 'auto') -> dict
Compute per-axis pairwise variance + sample-count cubes.
Returns a dict with var, count, mean (all (3, B, B)) plus bin_ids, n_traces, chrom, and, for downstream filter_normalize(), the full (3, T, B, B) pairwise diff tensor on the GPU device under key "diff".
- uchrom.fea.axis_weight(cd, chrom: str | None = None, device: str = 'auto') -> ndarray
Compute per-axis weights w ∝ 1 / median(trace_variance).
For each axis we centre every trace at its own mean (bins with NaN excluded) and take the median across traces of each trace’s variance. The inverse of that median is the axis weight, normalised to sum 1. Consistent with ArcFISH’s axis_weight routine.
- uchrom.fea.contact_frequency(df: DataFrame, threshold: float, chrom=None)
Fraction of traces in which a pair of bins is within threshold.
NaN distances (missing spots) are excluded from both numerator and denominator: each bin pair’s frequency is computed over the set of traces that have both endpoints detected.
- Parameters:
df (DataFrame with spots + coords.)
threshold (distance threshold in the same units as x/y/z.)
chrom (optional chromosome filter.)
- Returns:
frequency (ndarray (n_bins, n_bins) in [0, 1]) – NaN where no trace had both endpoints detected.
bin_ids (list of (start, end))
n_traces (int)
- uchrom.fea.filter_normalize(cube: dict, k_sigma: float = 4.0, frac: float = 0.1) -> dict
ArcFISH-style per-trace LOWESS filter + normalise.
Operates on the full (3, n_traces, n_bins, n_bins) pairwise-diff tensor kept on the GPU (under cube['diff']). Two passes:
1. Per-pair raw_var = nanmedian(trace_diff²) → LOWESS over log(genomic_distance) → strata_std. Individual trace observations where |diff - median(diff)| > k_sigma × strata_std are NaN’d in-place in the 4D tensor.
2. After filtering, per-pair filtered_var = nanmean((diff - mean)²) and per-pair count = n_valid are recomputed. LOWESS again over log(1D genomic distance) → expected; normalised variance = filtered_var / expected.
Output (numpy, on CPU): var, count (refreshed after filter), norm_var, expected, raw_var, genomic_distance. The original 4D tensor under "diff" is consumed (may be modified).
- uchrom.fea.mean_distance_matrix(df: DataFrame, chrom=None, reduce: str = 'median')
Population-level mean/median pairwise distance matrix.
For each pair of genomic bins (i, j), the distance is computed per-trace and then reduced across traces with np.nanmedian (the Bintu 2018 convention) or np.nanmean.
- Parameters:
df (DataFrame with spots + coords.)
chrom (optional chromosome filter.)
reduce ('median' (default) or 'mean'.)
- Returns:
matrix (ndarray (n_bins, n_bins))
bin_ids (list of (start, end))
n_traces (int)
- uchrom.fea.radius_of_gyration(df: DataFrame, chrom=None) -> Series
Per-trace radius of gyration.
Rg = sqrt(mean over spots of ||r - centroid||²). Traces with fewer than 2 spots contribute NaN.
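The Rg formula above can be illustrated for a single trace with plain NumPy. This is a minimal sketch, not the library's implementation: the function name radius_of_gyration_sketch and the (n_spots, 3) array input are assumptions made for illustration, whereas the real function takes the flat DataFrame and returns one value per trace.

```python
import numpy as np

def radius_of_gyration_sketch(coords):
    """Rg for one trace. coords: (n_spots, 3), NaN rows mark missing spots.

    Hypothetical helper for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    pts = coords[~np.isnan(coords).any(axis=1)]   # keep detected spots only
    if len(pts) < 2:                              # fewer than 2 spots -> NaN
        return np.nan
    centroid = pts.mean(axis=0)
    sq_dist = ((pts - centroid) ** 2).sum(axis=1) # ||r - centroid||² per spot
    return float(np.sqrt(sq_dist.mean()))

# Two detected spots 2 units apart, one missing spot:
# centroid is (1, 0, 0), each spot lies at distance 1, so Rg = 1.0
trace = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [np.nan, np.nan, np.nan]])
rg = radius_of_gyration_sketch(trace)
rg_single = radius_of_gyration_sketch(np.array([[0.0, 0.0, 0.0]]))  # NaN
```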
Distance-based aggregates
Distance-based aggregate statistics over a population of traces.
Input convention: a flat DataFrame with columns
chrom, start, end, x, y, z, trace_id (what ChromData.to_dataframe()
produces, or what the browser’s ChromatinLayer.df stores).
The core helper _bin_coord_cube() pivots the flat table into a
(n_traces, n_bins, 3) array with NaN for missing spots, which lets
every aggregate statistic be computed as a straightforward NaN-aware
reduction.
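The pivot described above might look roughly like the following. The internals of _bin_coord_cube() are not shown in this page, so this is only a sketch under assumptions: the name bin_coord_cube_sketch is hypothetical, bins are identified by their start coordinate alone, and the inputs are plain arrays (for example df["trace_id"].to_numpy()) rather than the DataFrame itself.

```python
import numpy as np

def bin_coord_cube_sketch(trace_id, start, xyz):
    """Pivot flat per-spot rows into a (n_traces, n_bins, 3) cube.

    Hypothetical stand-in for the private helper. Any (trace, bin)
    combination without a spot stays NaN.
    """
    traces, t_idx = np.unique(np.asarray(trace_id), return_inverse=True)
    bins, b_idx = np.unique(np.asarray(start), return_inverse=True)
    cube = np.full((len(traces), len(bins), 3), np.nan)
    cube[t_idx, b_idx] = np.asarray(xyz, dtype=float)  # scatter rows into the cube
    return cube, traces, bins

# Trace 0 has both bins, trace 1 is missing the bin starting at 200.
trace_id = [0, 0, 1]
start = [100, 200, 100]
xyz = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
cube, traces, bins = bin_coord_cube_sketch(trace_id, start, xyz)
```

With the cube in this shape, a missing spot propagates as NaN through any pairwise difference, which is what makes the NaN-aware reductions straightforward.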
- uchrom.fea.distance.contact_frequency(df: DataFrame, threshold: float, chrom=None)
Fraction of traces in which a pair of bins is within threshold.
NaN distances (missing spots) are excluded from both numerator and denominator: each bin pair’s frequency is computed over the set of traces that have both endpoints detected.
- Parameters:
df (DataFrame with spots + coords.)
threshold (distance threshold in the same units as x/y/z.)
chrom (optional chromosome filter.)
- Returns:
frequency (ndarray (n_bins, n_bins) in [0, 1]) – NaN where no trace had both endpoints detected.
bin_ids (list of (start, end))
n_traces (int)
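The NaN handling described for contact_frequency can be sketched on a (n_traces, n_bins, 3) coordinate cube. This is an illustration under assumptions, not the library code: the name contact_frequency_sketch is hypothetical, and treating "within threshold" as an inclusive comparison (<=) is a guess.

```python
import numpy as np

def contact_frequency_sketch(cube, threshold):
    """cube: (n_traces, n_bins, 3) with NaN for missing spots.

    Hypothetical helper; the inclusive <= comparison is an assumption.
    """
    diff = cube[:, :, None, :] - cube[:, None, :, :]   # (T, B, B, 3)
    dist = np.sqrt((diff ** 2).sum(axis=-1))           # NaN if either endpoint is missing
    valid = ~np.isnan(dist)
    with np.errstate(invalid="ignore", divide="ignore"):
        in_contact = valid & (dist <= threshold)
        n_valid = valid.sum(axis=0)                    # denominator: traces with both spots
        freq = np.where(n_valid > 0, in_contact.sum(axis=0) / n_valid, np.nan)
    return freq

# Three traces, two bins; bin 1 sits at x = 1, x = 3, and is missing in trace 2.
cube = np.zeros((3, 2, 3))
cube[:, 1, 0] = [1.0, 3.0, 0.0]
cube[2, 1, :] = np.nan
freq = contact_frequency_sketch(cube, threshold=2.0)
# freq[0, 1] is 0.5: one of the two traces with both endpoints is in contact.
```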
- uchrom.fea.distance.mean_distance_matrix(df: DataFrame, chrom=None, reduce: str = 'median')
Population-level mean/median pairwise distance matrix.
For each pair of genomic bins (i, j), the distance is computed per-trace and then reduced across traces with np.nanmedian (the Bintu 2018 convention) or np.nanmean.
- Parameters:
df (DataFrame with spots + coords.)
chrom (optional chromosome filter.)
reduce ('median' (default) or 'mean'.)
- Returns:
matrix (ndarray (n_bins, n_bins))
bin_ids (list of (start, end))
n_traces (int)
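The per-trace distance followed by a NaN-aware reduction can be sketched as below. The function name mean_distance_matrix_sketch and the cube input are assumptions for illustration; only np.nanmedian and np.nanmean are taken from the description above.

```python
import warnings
import numpy as np

def mean_distance_matrix_sketch(cube, reduce="median"):
    """cube: (n_traces, n_bins, 3) with NaN for missing spots.

    Hypothetical helper for illustration only.
    """
    diff = cube[:, :, None, :] - cube[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))           # (T, B, B) per-trace distances
    reducer = {"median": np.nanmedian, "mean": np.nanmean}[reduce]
    with warnings.catch_warnings():
        # Bin pairs never observed together yield an all-NaN slice warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return reducer(dist, axis=0)

# Four traces; the bin-0 to bin-1 distance is 1, 3, 8, and missing.
cube = np.zeros((4, 2, 3))
cube[:, 1, 0] = [1.0, 3.0, 8.0, 0.0]
cube[3, 1, :] = np.nan
median_mat = mean_distance_matrix_sketch(cube, reduce="median")  # [0, 1] entry is 3.0
mean_mat = mean_distance_matrix_sketch(cube, reduce="mean")      # [0, 1] entry is 4.0
```

The example shows why the choice of reducer matters: a single long-distance trace pulls the mean up to 4.0 while the median stays at 3.0.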
Axis-wise preprocessing
ArcFISH-style axis-wise preprocessing for chromatin tracing data.
References
Yu H. et al. Accurate and robust 3D genome feature discovery from multiplexed DNA FISH, bioRxiv 2025.11.26.690837v1.
Independent implementation in uchrom — not derived from the GPL-3.0
ArcFISH source.
Pipeline (per chromosome)
- axis_variance_cube: Builds (3, n_bins, n_bins) per-axis pairwise variance + count cubes from ChromData spots. Each trace contributes a rank-1 outer difference for each axis; aggregation is NaN-aware.
- filter_normalize: Two-pass LOWESS stratification on log(1D genomic distance):
  - first pass: flag entries whose per-pair squared deviation is more than k_sigma × stratified std as outliers and NaN them;
  - second pass: refit LOWESS on the cleaned variances to give each entry a genome-distance-matched expectation, then normalise.
- axis_weight: Returns a 3-vector of weights (sum 1) inversely proportional to the per-axis trace-variance median, the exact weighting used by the ACAT combination step in the loop / tad / comp callers.
All tensor-heavy computation runs on a user-selected torch device
('auto' | 'cpu' | 'cuda' | 'mps'). LOWESS stays on CPU via
statsmodels because it’s a non-vectorised kernel smoother whose
input size is O(n_bins²) (typically ≤ 10 k).
- uchrom.fea.arc.axis_variance_cube(cd, chrom: str, device: str = 'auto') -> dict
Compute per-axis pairwise variance + sample-count cubes.
Returns a dict with var, count, mean (all (3, B, B)) plus bin_ids, n_traces, chrom, and, for downstream filter_normalize(), the full (3, T, B, B) pairwise diff tensor on the GPU device under key "diff".
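To make the tensor shapes concrete, here is a NumPy-only sketch of the per-axis outer difference and its NaN-aware aggregation. It is an approximation under assumptions: the documented function runs on a torch device and takes a ChromData object, while axis_variance_cube_sketch (a hypothetical name) takes a (T, B, 3) coordinate cube and returns only a subset of the documented keys.

```python
import warnings
import numpy as np

def axis_variance_cube_sketch(cube):
    """cube: (T, B, 3) coordinates with NaN for missing spots.

    Hypothetical NumPy stand-in; the real function uses torch.
    """
    xyz = np.moveaxis(cube, -1, 0)                     # (3, T, B): one plane per axis
    diff = xyz[:, :, :, None] - xyz[:, :, None, :]     # (3, T, B, B) outer difference
    count = (~np.isnan(diff)).sum(axis=1)              # traces observed per bin pair
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN pairs
        mean = np.nanmean(diff, axis=1)                            # (3, B, B)
        var = np.nanmean((diff - mean[:, None]) ** 2, axis=1)      # (3, B, B)
    return {"var": var, "count": count, "mean": mean, "diff": diff}

# Two traces, two bins: bin 1 is displaced along x by 1 and by 3.
cube = np.zeros((2, 2, 3))
cube[:, 1, 0] = [1.0, 3.0]
out = axis_variance_cube_sketch(cube)
# x-axis differences for pair (0, 1) are -1 and -3: mean -2, variance 1.
```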
- uchrom.fea.arc.axis_weight(cd, chrom: str | None = None, device: str = 'auto') -> ndarray
Compute per-axis weights w ∝ 1 / median(trace_variance).
For each axis we centre every trace at its own mean (bins with NaN excluded) and take the median across traces of each trace’s variance. The inverse of that median is the axis weight, normalised to sum 1. Consistent with ArcFISH’s axis_weight routine.
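The weighting rule can be sketched in a few lines of NumPy. This is an illustration, assuming a (T, B, 3) coordinate cube as input and population variance (ddof = 0); the name axis_weight_sketch is hypothetical and the real function operates on a ChromData object.

```python
import numpy as np

def axis_weight_sketch(cube):
    """cube: (T, B, 3) coordinates with NaN for missing spots.

    Hypothetical helper; ddof=0 is an assumption.
    """
    # np.nanvar centres each trace at its own mean, skipping NaN bins.
    trace_var = np.nanvar(cube, axis=1)        # (T, 3)
    med = np.nanmedian(trace_var, axis=0)      # (3,) median across traces
    w = 1.0 / med
    return w / w.sum()                         # normalise to sum 1

# Three identical traces with spread 2 in x and y but 4 in z:
# per-trace variances are (1, 1, 4), so weights are proportional to (1, 1, 1/4).
cube = np.array([[[0.0, 0.0, 0.0], [2.0, 2.0, 4.0]]] * 3)
w = axis_weight_sketch(cube)   # approximately [4/9, 4/9, 1/9]
```

The noisier axis (often z in imaging data) receives the smallest weight, which is the point of the inverse-variance scheme.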
- uchrom.fea.arc.filter_normalize(cube: dict, k_sigma: float = 4.0, frac: float = 0.1) -> dict
ArcFISH-style per-trace LOWESS filter + normalise.
Operates on the full (3, n_traces, n_bins, n_bins) pairwise-diff tensor kept on the GPU (under cube['diff']). Two passes:
1. Per-pair raw_var = nanmedian(trace_diff²) → LOWESS over log(genomic_distance) → strata_std. Individual trace observations where |diff - median(diff)| > k_sigma × strata_std are NaN’d in-place in the 4D tensor.
2. After filtering, per-pair filtered_var = nanmean((diff - mean)²) and per-pair count = n_valid are recomputed. LOWESS again over log(1D genomic distance) → expected; normalised variance = filtered_var / expected.
Output (numpy, on CPU): var, count (refreshed after filter), norm_var, expected, raw_var, genomic_distance. The original 4D tensor under "diff" is consumed (may be modified).
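The first (outlier-flagging) pass can be sketched without statsmodels. Note the deliberate substitution: instead of a LOWESS fit over log(genomic_distance), this sketch uses the median of raw_var within each exact genomic-distance stratum, purely to stay dependency-free. The name filter_pass_sketch and its return values are hypothetical, and it works on a copy rather than in place.

```python
import warnings
import numpy as np

def filter_pass_sketch(diff, genomic_distance, k_sigma=4.0):
    """First-pass outlier filter on a (3, T, B, B) pairwise-diff tensor.

    Hypothetical helper. LOWESS is replaced by a per-stratum median here.
    """
    diff = diff.copy()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        raw_var = np.nanmedian(diff ** 2, axis=1)      # (3, B, B)
        centre = np.nanmedian(diff, axis=1)            # (3, B, B)
    strata_std = np.full_like(raw_var, np.nan)
    for d in np.unique(genomic_distance):
        mask = genomic_distance == d                   # bin pairs at this distance
        for a in range(diff.shape[0]):
            vals = raw_var[a][mask]
            if np.isfinite(vals).any():
                strata_std[a][mask] = np.sqrt(np.nanmedian(vals))
    with np.errstate(invalid="ignore"):
        outlier = np.abs(diff - centre[:, None]) > k_sigma * strata_std[:, None]
    diff[outlier] = np.nan                             # drop flagged observations
    return diff, int(outlier.sum())

# Nine traces, two bins; the last trace has a gross x-axis outlier.
x = np.zeros((9, 2))
x[:, 1] = [1, -1, 1, -1, 1, -1, 1, -1, 100]
diff = np.zeros((3, 9, 2, 2))
diff[0] = x[:, :, None] - x[:, None, :]
genomic_distance = np.array([[0, 1], [1, 0]])
filtered, n_flagged = filter_pass_sketch(diff, genomic_distance)
# Only the outlier is flagged, once for pair (0, 1) and once for (1, 0).
```

A second pass along the lines described above would recompute the per-pair variance and count on the filtered tensor and divide by a distance-matched expectation.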