uchrom.im
Image-pipeline utilities for DNA-FISH data.
Spot-to-fiber tracing
Spot → fiber tracing algorithms for multiplexed DNA-FISH data.
Takes detected-and-decoded spots with ambiguous trace assignments
(multiple candidate spots per genomic locus per cell) and decides which
spots belong to the same chromatin fiber. Output is a
uchrom.core.ChromData with a populated trace_id column.
The first backend, uchrom.im.trace.jie, is an independent
reimplementation of Jia & Ren 2022 (Nature Biotechnology,
doi:10.1038/s41587-022-01568-9): a spatial genome aligner that uses
a Gaussian-chain polymer model to score candidate spot-to-fiber
assignments, then picks fibers by iterative shortest-path search.
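The "score, then iterative shortest-path" strategy can be illustrated with a small pure-Python sketch. Everything here is an assumption for illustration, not the jie implementation: spots are (locus index, x, y, z) tuples, edges run only forward in locus order and skip at most max_skip loci, a virtual source and sink charge gap_penalty for loci missing at either end, and each round removes the spots of the cheapest path before searching again. The names cheapest_fiber, pick_fibers and toy_edge_cost are hypothetical. The defaults mirror SpotAlignerParams (max_skip=3, gap_penalty=3.0, max_fibers_per_chrom=8, min_spots_per_fiber=5).

```python
import math
from typing import Callable, List, Sequence, Tuple

Spot = Tuple[int, float, float, float]  # (locus index, x_nm, y_nm, z_nm)


def cheapest_fiber(
    spots: Sequence[Spot],
    n_loci: int,
    edge_cost: Callable[[Spot, Spot], float],
    max_skip: int = 3,
    gap_penalty: float = 3.0,
) -> Tuple[float, List[int]]:
    """Hypothetical helper: cheapest source-to-sink path through a locus-ordered DAG."""
    order = sorted(range(len(spots)), key=lambda i: spots[i][0])
    best = {i: math.inf for i in order}
    prev = {i: None for i in order}
    # Virtual source: entering the fiber at locus k skips loci 0..k-1.
    for i in order:
        if spots[i][0] <= max_skip:
            best[i] = spots[i][0] * gap_penalty
    # Edges only point to higher loci, so one pass in locus order relaxes the DAG.
    for i in order:
        if math.isinf(best[i]):
            continue
        for j in order:
            step = spots[j][0] - spots[i][0]
            if 1 <= step <= max_skip + 1:
                cost = best[i] + edge_cost(spots[i], spots[j]) + (step - 1) * gap_penalty
                if cost < best[j]:
                    best[j], prev[j] = cost, i
    # Virtual sink: leaving the fiber after locus k skips loci k+1..n_loci-1.
    total, end = math.inf, None
    for i in order:
        tail = n_loci - 1 - spots[i][0]
        if tail <= max_skip and best[i] + tail * gap_penalty < total:
            total, end = best[i] + tail * gap_penalty, i
    path: List[int] = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return total, path[::-1]


def pick_fibers(spots, n_loci, edge_cost, max_fibers=8, min_spots=5, **kwargs):
    """Hypothetical helper: repeatedly extract the cheapest fiber and drop its spots."""
    remaining = list(range(len(spots)))
    fibers: List[List[int]] = []
    while remaining and len(fibers) < max_fibers:
        cost, path = cheapest_fiber([spots[k] for k in remaining], n_loci, edge_cost, **kwargs)
        if math.isinf(cost) or len(path) < min_spots:
            break
        chosen = [remaining[p] for p in path]
        fibers.append(chosen)
        taken = set(chosen)
        remaining = [k for k in remaining if k not in taken]
    return fibers


def toy_edge_cost(u: Spot, v: Spot) -> float:
    # Illustrative Mahalanobis-style cost R^2 / (2 S^2); toy S^2 grows with the locus gap.
    step = v[0] - u[0]
    R2 = sum((a - b) ** 2 for a, b in zip(u[1:], v[1:]))
    return R2 / (2 * (2 * 50.0**2 + 3000.0 * step))


# Two well-separated toy fibers over 6 loci: spots 0-5 are fiber A, spots 6-11 fiber B.
toy = [(k, 100.0 * k, 0.0, 0.0) for k in range(6)] + [(k, 100.0 * k, 5000.0, 0.0) for k in range(6)]
fibers = pick_fibers(toy, n_loci=6, edge_cost=toy_edge_cost)
```

The virtual source and sink matter in this sketch: with only non-negative edge weights and no end penalties, the cheapest "fiber" would be a single spot.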
- class uchrom.im.trace.SpotAlignerParams(persistence_length_bp: float = 150.0, tau_nm_per_bp: float | None = None, scaling_exponent_fixed: bool = True, default_sigma_nm: float = 50.0, max_skip: int = 3, gap_penalty: float = 3.0, max_fibers_per_chrom: int = 8, min_spots_per_fiber: int = 5, mean_edge_weight_cutoff: float = inf, detect_sisters: bool = True, sister_pair_radius_nm: float = 300.0, verbose: bool = False)
Bases: object
Runtime parameters for align_spots().
- default_sigma_nm: float = 50.0
Fallback axis-wise localisation σ, in nm, used when the sigma_* columns are absent from spots_df.
- gap_penalty: float = 3.0
Additive per-skip penalty in the Mahalanobis-cost edge weight. A skip-0 edge (adjacent loci) adds nothing; a skip-2 edge adds 2 * gap_penalty.
Calibrate it against the typical value of R² / (2·S²), which is around 1.5 for on-chain matches.
- scaling_exponent_fixed: bool = True
If True, force α = 0.5 (ideal Gaussian chain) in the τ fit. Set False to fit α jointly, which is useful for crumpled-globule chromatin.
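The "around 1.5" reference value for gap_penalty comes straight from the model: a true on-chain displacement has axis variance S² on each of three axes, so E[R²] = 3·S² and E[R² / (2·S²)] = 1.5. A quick numpy Monte Carlo check (the S² value is arbitrary), plus the resulting expected weight of a true edge per number of skipped loci at the default gap_penalty = 3.0:

```python
import numpy as np

rng = np.random.default_rng(0)
S2 = 120.0**2          # arbitrary axis variance, nm^2
gap_penalty = 3.0      # SpotAlignerParams default

# True on-chain displacements: isotropic 3-D Gaussian with axis variance S2.
d = rng.normal(scale=np.sqrt(S2), size=(200_000, 3))
cost = (d**2).sum(axis=1) / (2 * S2)     # R^2 / (2 S^2) for each simulated edge
mean_cost = float(cost.mean())           # theory: E[R^2] = 3 S^2, so 1.5

# Expected weight of a true edge, keyed by the number of skipped loci.
expected_weight = {skip: 1.5 + skip * gap_penalty for skip in range(4)}
```

At the default, each skipped locus adds twice the cost of a typical on-chain match.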
- uchrom.im.trace.align_spots(spots_df: pd.DataFrame, params: SpotAlignerParams | None = None, tau_by_chrom: Dict[str, float] | None = None) → ChromData
Assign detected FISH spots to chromatin fibers.
See the module docstring for the input-DataFrame contract.
jie: spatial genome aligner
Independent reimplementation of Jia & Ren 2022 (Nature Biotechnology, doi:10.1038/s41587-022-01568-9).
Gaussian-chain polymer primitives
Gaussian-chain polymer primitives for the spatial genome aligner.
The end-to-end distance of an ideal Gaussian chain of contour
length L, made of N = L / b Kuhn segments of length b = 2 · l_p
(l_p is the persistence length), is isotropic Gaussian with variance
N·b² / 3 per axis; i.e. <R²> = N·b² = 2 · l_p · L for the 3-D distance,
and <R²> / 3 per axis.
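A freely jointed chain of N = L / b segments, each of length b = 2·l_p pointing in an independent random direction, reproduces these second moments exactly: <R²> = N·b² = 2·l_p·L, with a third of it on each axis (the end-to-end distribution itself is only approximately Gaussian, improving as N grows). A minimal numpy check; all sizes are arbitrary illustrative choices, lengths in nm:

```python
import numpy as np

rng = np.random.default_rng(1)
l_p = 50.0                 # persistence length, nm (illustrative)
b = 2 * l_p                # Kuhn segment length
n_seg = 200                # Kuhn segments per chain, N
L = n_seg * b              # contour length, nm
n_chains = 5_000

# Freely jointed chain: N segments of length b with isotropic random directions.
u = rng.normal(size=(n_chains, n_seg, 3))
u /= np.linalg.norm(u, axis=2, keepdims=True)   # normalise to unit vectors
R = b * u.sum(axis=1)                           # end-to-end vectors, shape (n_chains, 3)

R2_ratio = float((R**2).sum(axis=1).mean() / (2 * l_p * L))   # <R^2> / (2 l_p L), near 1
axis_ratio = float(R.var(axis=0).mean() / (2 * l_p * L / 3))   # per-axis variance / (<R^2> / 3), near 1
```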
Adding independent Gaussian localisation noise with axis variance
σ_i² at spot i and σ_j² at spot j yields
S²(i, j) = σ_i² + σ_j² + (2/3) · l_p · τ · L_ij
which is the expected axis-wise variance of the observed displacement
between two FISH spots connected by a contour length of L_ij bp, under
a scale factor τ (nm/bp). S² sets the width of the Gaussian-chain bond
probability used by the aligner.
All lengths here are in nm and all genomic distances in bp.
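Because the chain displacement and the two localisation errors are independent, their axis variances simply add. The numpy sketch below simulates one axis directly, treating the chain term (2/3) · l_p · τ · L_ij as a given number; the parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
l_p_bp, tau, L_ij = 150.0, 0.01, 20_000.0   # illustrative values
sigma_i, sigma_j = 40.0, 60.0               # per-axis localisation sigma, nm
n = 400_000

chain_var = (2 / 3) * l_p_bp * tau * L_ij   # chain term of S^2
S2_model = sigma_i**2 + sigma_j**2 + chain_var

# One axis: true displacement plus an independent localisation error at each end.
true_disp = rng.normal(0.0, np.sqrt(chain_var), n)
observed = true_disp + rng.normal(0.0, sigma_j, n) - rng.normal(0.0, sigma_i, n)
S2_empirical = float(observed.var())
```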
- uchrom.im.trace._polymer.PERSISTENCE_LENGTH_BP_DEFAULT: float = 150.0
Persistence length of bare B-form dsDNA, in bp.
One persistence length is ≈ 50 nm; at the canonical rise of 0.34 nm/bp that is ~150 bp (50 / 0.34 ≈ 147). For chromatin the effective persistence length is much larger, but that difference is absorbed into the fitted scale factor
τ, so the bp value is kept constant.
- uchrom.im.trace._polymer.bond_log_probability(R_nm: ndarray | float, S2_nm2: ndarray | float) → ndarray
Log probability density of an observed distance R under an isotropic 3-D Gaussian with axis variance S²:
log p = -1.5 · log(2π·S²) - R² / (2·S²)
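Read literally, this formula is the log density of a 3-D displacement vector with independent N(0, S²) components, evaluated at a vector of length R (the density of the scalar distance R itself would carry an extra 4π·R² factor). It therefore equals the sum of three 1-D normal log densities. A minimal numpy sketch that mirrors the documented function name but is not the uchrom source:

```python
import numpy as np

def bond_log_probability(R_nm, S2_nm2):
    """log p = -1.5 * log(2*pi*S2) - R^2 / (2*S2), elementwise."""
    R = np.asarray(R_nm, dtype=float)
    S2 = np.asarray(S2_nm2, dtype=float)
    return -1.5 * np.log(2 * np.pi * S2) - R**2 / (2 * S2)

def normal_logpdf_1d(x, var):
    """Log density of a 1-D N(0, var) variable."""
    return -0.5 * np.log(2 * np.pi * var) - x**2 / (2 * var)

d = np.array([30.0, -80.0, 55.0])     # displacement between two spots, nm
S2 = 90.0**2                          # axis variance, nm^2
lhs = float(bond_log_probability(np.linalg.norm(d), S2))
rhs = float(normal_logpdf_1d(d, S2).sum())   # product of three axis densities, in log space
```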
- uchrom.im.trace._polymer.expected_distance(L_bp: ndarray | float, l_p_bp: float = 150.0, tau_nm_per_bp: float = 0.01) → ndarray
Gaussian-chain mean 3-D end-to-end distance, <R> ≈ √(2·l_p·τ·L).
Strictly, <R> = √(8/(3π)) · √(<R²>) ≈ 0.92 · √(<R²>) for a 3-D Gaussian variable, but the root-mean-square form √(<R²>) is what the literature calls the "Gaussian chain" expected distance, and it is what jie uses.
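The two conventions differ by the constant factor √(8/(3π)) ≈ 0.921. A short numpy Monte Carlo check; `expected_distance` here is a sketch that follows the documented formula and defaults, not the uchrom source:

```python
import numpy as np

def expected_distance(L_bp, l_p_bp=150.0, tau_nm_per_bp=0.01):
    """Scaling-law form sqrt(<R^2>) = sqrt(2 * l_p * tau * L), in nm."""
    return np.sqrt(2 * l_p_bp * tau_nm_per_bp * np.asarray(L_bp, dtype=float))

L = 50_000.0
R_rms = float(expected_distance(L))          # sqrt(<R^2>)
# Draw end-to-end vectors with <R^2> = R_rms**2, i.e. axis variance R_rms**2 / 3.
rng = np.random.default_rng(3)
vecs = rng.normal(scale=R_rms / np.sqrt(3), size=(300_000, 3))
R_mean = float(np.linalg.norm(vecs, axis=1).mean())
ratio = R_mean / R_rms                       # Monte Carlo estimate of <R> / sqrt(<R^2>)
exact = float(np.sqrt(8 / (3 * np.pi)))
```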
- uchrom.im.trace._polymer.expected_distance_sq(L_bp: ndarray | float, l_p_bp: float = 150.0, tau_nm_per_bp: float = 0.01, sigma_nm: ndarray | float = 0.0) → ndarray
Gaussian-chain axis variance of the observed end-to-end distance:
S²(L) = 2·σ² + (2/3)·l_p·τ·L
With distinct endpoint uncertainties, 2·σ² becomes σ_i² + σ_j².
- Parameters:
L_bp (ndarray or float) – Genomic distance(s) in bp between the two loci.
l_p_bp (float) – Persistence length in bp.
tau_nm_per_bp (float) – Genomic-to-spatial scale factor (nm / bp).
sigma_nm (ndarray or float) – Axis-wise localisation uncertainty of each endpoint, in nm. A scalar is broadcast to both endpoints; an array of shape (2,) or an array matching L_bp is also accepted.
- Returns:
S2 – Axis-wise variance of the observed distance, in nm².
- Return type:
ndarray or float
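A minimal numpy sketch of the formula above. The docstring does not say how a shape-(2,) sigma_nm is told apart from a per-pair array matching L_bp; this sketch assumes a shape-(2,) array means (σ_i, σ_j) unless L_bp itself has shape (2,), in which case it is read per pair. That rule is a hypothetical choice, not the documented uchrom behaviour.

```python
import numpy as np

def expected_distance_sq(L_bp, l_p_bp=150.0, tau_nm_per_bp=0.01, sigma_nm=0.0):
    """S^2(L) = sigma_i^2 + sigma_j^2 + (2/3) * l_p * tau * L, in nm^2."""
    L = np.asarray(L_bp, dtype=float)
    sigma = np.asarray(sigma_nm, dtype=float)
    if sigma.shape == (2,) and L.shape != (2,):
        # Assumed convention: (sigma_i, sigma_j) for the two endpoints.
        noise = sigma[0] ** 2 + sigma[1] ** 2
    else:
        # Scalar or per-pair sigma: the same uncertainty at both endpoints.
        noise = 2 * sigma**2
    return noise + (2 / 3) * l_p_bp * tau_nm_per_bp * L

same = expected_distance_sq(30_000, sigma_nm=50.0)           # 5000 + 30000
ends = expected_distance_sq(30_000, sigma_nm=[40.0, 60.0])   # 1600 + 3600 + 30000
pairs = expected_distance_sq([1_000, 2_000], sigma_nm=[10.0, 20.0])
```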
- uchrom.im.trace._polymer.fit_scale_factor(observed_dist_nm: ndarray, genomic_dist_bp: ndarray, l_p_bp: float = 150.0, fix_exponent: bool = True) → Tuple[float, float]
Fit the Gaussian-chain scale factor τ to observed trace distances.
Performs a log-log regression of R on L:
log R = 0.5 · log(2·l_p·τ) + α · log L
with α = 0.5 fixed for an ideal Gaussian chain (fix_exponent=True) or fitted jointly. Returns (τ, α).
- Parameters:
observed_dist_nm (ndarray) – Observed 3-D distances (nm). NaN and non-positive values are dropped.
genomic_dist_bp (ndarray) – Genomic separations (bp), with the same shape as observed_dist_nm.
l_p_bp (float) – Persistence length (bp).
fix_exponent (bool) – If True, force α = 0.5 (ideal Gaussian chain). If False, fit it jointly, which is useful for chromatin, where α is often ≈ 0.33 (crumpled globule).
- Returns:
tau_nm_per_bp (float)
scaling_exponent (float) – 0.5 when fix_exponent=True; otherwise the fitted value.
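A sketch of the log-log fit, assuming numpy's `polyfit` for the free-slope regression; it mirrors the documented behaviour but is not the uchrom source. The intercept c gives τ through 0.5 · log(2·l_p·τ) = c, i.e. τ = exp(2c) / (2·l_p). With fix_exponent=True the slope is pinned to 0.5 and the least-squares intercept is just the mean residual. Dropping non-positive genomic separations as well is an extra assumption, since log L needs L > 0.

```python
import numpy as np

def fit_scale_factor(observed_dist_nm, genomic_dist_bp, l_p_bp=150.0, fix_exponent=True):
    """Return (tau_nm_per_bp, alpha) from log R = 0.5*log(2*l_p*tau) + alpha*log L."""
    R = np.asarray(observed_dist_nm, dtype=float).ravel()
    L = np.asarray(genomic_dist_bp, dtype=float).ravel()
    # Drop NaN / non-positive distances (documented) and non-positive L (assumed).
    keep = np.isfinite(R) & (R > 0) & np.isfinite(L) & (L > 0)
    x, y = np.log(L[keep]), np.log(R[keep])
    if fix_exponent:
        alpha = 0.5
        intercept = float(np.mean(y - alpha * x))   # least-squares intercept for a fixed slope
    else:
        alpha, intercept = (float(v) for v in np.polyfit(x, y, 1))  # [slope, intercept]
    tau = float(np.exp(2 * intercept) / (2 * l_p_bp))
    return tau, alpha

L = np.logspace(4, 6, 50)                      # genomic separations, bp
R = np.sqrt(2 * 150.0 * 0.02 * L)              # noise-free ideal chain with tau = 0.02
R[3], R[7] = np.nan, -1.0                      # invalid entries are dropped
tau_fixed, alpha_fixed = fit_scale_factor(R, L)
tau_free, alpha_free = fit_scale_factor(R, L, fix_exponent=False)
alpha_cg = fit_scale_factor(3.0 * L ** (1 / 3), L, fix_exponent=False)[1]  # crumpled-globule-like scaling
```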
- uchrom.im.trace._polymer.fit_scale_factor_from_matrix(dist_matrix_nm: ndarray, bin_genomic_bp: ndarray, l_p_bp: float = 150.0, aggregator: str = 'median', fix_exponent: bool = True) → Tuple[float, float]
Fit τ from an (n_bins, n_bins) distance matrix.
For each unique off-diagonal genomic separation L, aggregates the observed distances (median or mean, per aggregator) across all bin pairs with that separation, then runs fit_scale_factor() on the result.
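The aggregation step can be sketched as follows, assuming `bin_genomic_bp` holds each bin's genomic coordinate (so bins a and b are |g_a − g_b| bp apart) and NaN marks missing matrix entries. The helper name `aggregate_by_separation` is hypothetical; its (R, L) output is what would then be passed to fit_scale_factor().

```python
import numpy as np

def aggregate_by_separation(dist_matrix_nm, bin_genomic_bp, aggregator="median"):
    """Return (aggregated distances, separations), one entry per unique off-diagonal separation."""
    D = np.asarray(dist_matrix_nm, dtype=float)
    g = np.asarray(bin_genomic_bp, dtype=float)
    sep = np.abs(g[:, None] - g[None, :])        # pairwise genomic separation, bp
    iu = np.triu_indices(len(g), k=1)            # off-diagonal, each pair counted once
    seps, dists = sep[iu], D[iu]
    agg = np.nanmedian if aggregator == "median" else np.nanmean
    R_out, L_out = [], []
    for L in np.unique(seps):                    # sorted ascending
        vals = dists[seps == L]
        if np.any(np.isfinite(vals)):            # skip separations with no valid pairs
            R_out.append(agg(vals))
            L_out.append(L)
    return np.array(R_out), np.array(L_out)

g = np.array([0.0, 10_000.0, 20_000.0, 30_000.0])
D = np.array([[0, 100, 200, 300],
              [100, 0, 110, 220],
              [200, 110, 0, 120],
              [300, 220, 120, 0]], dtype=float)
R_med, L_sep = aggregate_by_separation(D, g)
R_mean, _ = aggregate_by_separation(D, g, aggregator="mean")
```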
- uchrom.im.trace._polymer.mahalanobis_cost(R_nm: ndarray | float, S2_nm2: ndarray | float) → ndarray
Mahalanobis-style squared cost R² / (2·S²).
This is the data-dependent part of the negative log Gaussian density (the exponent, without the 1.5·log(2π·S²) normalisation term). For shortest-path aligners it has a useful property that bond_log_probability() lacks: its expected value for a true on-chain pair (about 1.5) does not depend on how many loci lie between the endpoints, whereas the normalisation term grows with S² and hence with genomic separation. Paths with different numbers of skipped loci can therefore be compared fairly.
Together with a per-skip gap_penalty it gives well-calibrated edge weights:
w_ij = R²_ij / (2·S²_ij) + (c - 1) · gap_penalty
where c is the locus-index separation of the two endpoints (c = 1 for adjacent loci), so c - 1 is the number of skipped loci.
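To see why the exponent alone makes a fairer edge cost, compare it with the full negative log density for a typical true pair (R² = 3·S²) as S² grows with genomic separation. Below, `mahalanobis_cost` follows the documented formula, while `neg_log_prob` and `edge_weight` are hypothetical helpers; c is the locus-index separation of the endpoints (c = 1 for adjacent loci). The S² values are arbitrary.

```python
import numpy as np

def mahalanobis_cost(R_nm, S2_nm2):
    """R^2 / (2 S^2): the exponent of the negative log Gaussian density."""
    return np.asarray(R_nm, dtype=float) ** 2 / (2 * np.asarray(S2_nm2, dtype=float))

def neg_log_prob(R_nm, S2_nm2):
    """-log p for an isotropic 3-D Gaussian with axis variance S2 (hypothetical helper)."""
    return mahalanobis_cost(R_nm, S2_nm2) + 1.5 * np.log(2 * np.pi * np.asarray(S2_nm2, dtype=float))

def edge_weight(R_nm, S2_nm2, c, gap_penalty=3.0):
    """w_ij = R^2 / (2 S^2) + (c - 1) * gap_penalty (hypothetical helper)."""
    return mahalanobis_cost(R_nm, S2_nm2) + (np.asarray(c) - 1) * gap_penalty

S2 = np.array([100.0, 400.0, 1600.0]) ** 2   # axis variance growing with genomic gap, nm^2
R_typical = np.sqrt(3 * S2)                  # E[R^2] = 3 S^2 for a true on-chain match
maha = mahalanobis_cost(R_typical, S2)       # stays at 1.5
nlp = neg_log_prob(R_typical, S2)            # grows with S^2
w = edge_weight(R_typical, S2, c=np.array([1, 2, 3]))
```

Here maha stays at 1.5 for every S², while nlp rises by 1.5 · ln 16 ≈ 4.16 at each step, so a log-probability score would penalise an edge merely for spanning more genome.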