uchrom.im

Image-pipeline utilities for DNA-FISH data.

Spot-to-fiber tracing

Spot → fiber tracing algorithms for multiplexed DNA-FISH data.

Takes detected-and-decoded spots with ambiguous trace assignments (multiple candidate spots per genomic locus per cell) and decides which spots belong to the same chromatin fiber. Output is a uchrom.core.ChromData with a populated trace_id column.

The first backend, uchrom.im.trace.jie, is an independent reimplementation of Jia & Ren 2022 (Nature Biotechnology, doi:10.1038/s41587-022-01568-9) — a spatial genome aligner that uses a Gaussian-chain polymer model to score candidate spot-to-fiber assignments, then picks fibers via iterative shortest-path.

class uchrom.im.trace.SpotAlignerParams(persistence_length_bp: float = 150.0, tau_nm_per_bp: float | None = None, scaling_exponent_fixed: bool = True, default_sigma_nm: float = 50.0, max_skip: int = 3, gap_penalty: float = 3.0, max_fibers_per_chrom: int = 8, min_spots_per_fiber: int = 5, mean_edge_weight_cutoff: float = inf, detect_sisters: bool = True, sister_pair_radius_nm: float = 300.0, verbose: bool = False)

Bases: object

Runtime parameters for align_spots().

default_sigma_nm: float = 50.0

Fallback localisation σ used when the sigma_* columns are absent from spots_df (axis-wise, nm).

detect_sisters: bool = True
gap_penalty: float = 3.0

Additive per-skip penalty in the Mahalanobis-cost edge weight. A skip-0 edge (adjacent locus) adds nothing; a skip-2 edge adds 2 * gap_penalty. Calibrate against typical R² / (2·S²) values (around 1.5 for on-chain matches).

max_fibers_per_chrom: int = 8
max_skip: int = 3
mean_edge_weight_cutoff: float = inf
min_spots_per_fiber: int = 5
persistence_length_bp: float = 150.0
scaling_exponent_fixed: bool = True

If True, force α=0.5 (ideal Gaussian chain) in the τ fit. Set False to fit α jointly — useful for crumpled-globule chromatin.

sister_pair_radius_nm: float = 300.0
tau_nm_per_bp: float | None = None

Genomic-to-spatial scale. If None, fit per-chromosome from the aggregate observed-distance-vs-genomic-distance curve.

verbose: bool = False
uchrom.im.trace.align_spots(spots_df: pd.DataFrame, params: SpotAlignerParams | None = None, tau_by_chrom: Dict[str, float] | None = None) → ChromData

Assign detected FISH spots to chromatin fibers.

See module docstring for the input-DataFrame contract.

jie — spatial genome aligner

Independent reimplementation of Jia & Ren 2022 (Nature Biotechnology, doi:10.1038/s41587-022-01568-9).

Gaussian-chain polymer primitives

Gaussian-chain polymer primitives for the spatial genome aligner.

The end-to-end distance distribution of an ideal Gaussian chain of contour length L with Kuhn segment b = 2 * l_p (l_p is the persistence length) is an isotropic Gaussian with variance N·b² / 3 per axis, where N = L / b is the number of Kuhn segments. That is, <R²> = N·b² = 2 * l_p * L for the 3-D distance, and <R²> / 3 per axis.

Adding independent Gaussian localisation noise with axis variance σ² at each end yields

S²(i, j) = σ_i² + σ_j² + (2/3) · l_p · τ · L_ij

which is the expected axis-wise variance of the observed distance between two FISH spots connected by a contour length L_ij bp under a scale factor τ (nm/bp). S² scales the Gaussian-chain bond probability used by the aligner.

All lengths here are in nm and all genomic distances in bp.
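The expression for S² above follows from adding independent variances along a single axis. The short derivation below is a sketch using the symbols already defined here; the observed coordinates x^obs and the localisation errors ε are notation introduced only for this illustration.

```latex
\begin{aligned}
x_j^{\mathrm{obs}} - x_i^{\mathrm{obs}}
  &= (x_j - x_i) + \varepsilon_j - \varepsilon_i, \\
\operatorname{Var}\!\left(x_j^{\mathrm{obs}} - x_i^{\mathrm{obs}}\right)
  &= \underbrace{\tfrac{1}{3}\,\langle R^2 \rangle}_{\text{chain, per axis}}
     + \sigma_i^2 + \sigma_j^2, \\
\langle R^2 \rangle = 2\, l_p\, \tau\, L_{ij}
  \;&\Longrightarrow\;
  S^2(i, j) = \sigma_i^2 + \sigma_j^2 + \tfrac{2}{3}\, l_p\, \tau\, L_{ij}.
\end{aligned}
```

The cross terms vanish because the chain displacement and the two localisation errors are assumed independent and zero-mean.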

uchrom.im.trace._polymer.PERSISTENCE_LENGTH_BP_DEFAULT: float = 150.0

Persistence length of bare B-form dsDNA, in bp.

One persistence length ≈ 50 nm at the canonical 0.34 nm/bp rise, which gives ~150 bp. For chromatin the effective persistence length is much larger, but is folded into the fitted scale factor τ so we keep the bp value constant.

uchrom.im.trace._polymer.bond_log_probability(R_nm: ndarray | float, S2_nm2: ndarray | float) → ndarray

Log probability density of an observed distance R under an isotropic 3-D Gaussian with axis variance S².

log p = -1.5 · log(2π·S²) - R² / (2·S²)
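As an illustration of the formula above, here is a minimal scalar sketch using only the standard library. It is not the library's implementation (which accepts ndarrays); the function name bond_log_probability_sketch and the example numbers are hypothetical.

```python
import math


def bond_log_probability_sketch(r_nm: float, s2_nm2: float) -> float:
    """Scalar version of log p = -1.5*log(2*pi*S^2) - R^2 / (2*S^2)."""
    if s2_nm2 <= 0:
        raise ValueError("axis variance S^2 must be positive")
    normalisation = -1.5 * math.log(2.0 * math.pi * s2_nm2)
    exponent = -(r_nm ** 2) / (2.0 * s2_nm2)
    return normalisation + exponent


# Illustrative values only: the same axis variance, two candidate distances.
S2 = 200.0 ** 2  # nm^2
near = bond_log_probability_sketch(100.0, S2)
far = bond_log_probability_sketch(600.0, S2)
print(near > far)  # the closer candidate has the higher log density
```

For a fixed S², the log density decreases monotonically with R, so the comparison above always favours the shorter distance.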

uchrom.im.trace._polymer.expected_distance(L_bp: ndarray | float, l_p_bp: float = 150.0, tau_nm_per_bp: float = 0.01) → ndarray

Gaussian-chain mean 3-D end-to-end distance <R> ≈ √(2·l_p·τ·L).

Strictly <R> = √(8/(3π)) · √(<R²>) for a 3-D Gaussian variable, but the scaling-law form √(<R²>) is what the literature calls the “Gaussian chain” expected distance and what jie uses.
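To make the distinction above concrete, the sketch below computes the scaling-law value √(<R²>) and the strict Gaussian mean for one genomic separation. The helper name rms_distance_nm and the 100 kb separation are hypothetical choices for illustration; the defaults mirror the documented signature.

```python
import math


def rms_distance_nm(l_bp: float, l_p_bp: float = 150.0,
                    tau_nm_per_bp: float = 0.01) -> float:
    """Scaling-law 'expected distance' sqrt(<R^2>) = sqrt(2 * l_p * tau * L)."""
    return math.sqrt(2.0 * l_p_bp * tau_nm_per_bp * l_bp)


# Ratio between the true mean of a 3-D Gaussian distance and its RMS value.
MEAN_OVER_RMS = math.sqrt(8.0 / (3.0 * math.pi))

rms = rms_distance_nm(100_000)   # sqrt(300000) nm, roughly 548 nm
mean = MEAN_OVER_RMS * rms       # about 8 % smaller than the RMS value
print(f"rms = {rms:.1f} nm, strict mean = {mean:.1f} nm")
```

The two values differ by a constant factor, so using √(<R²>) changes the fitted τ by a constant but not the scaling with L.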

uchrom.im.trace._polymer.expected_distance_sq(L_bp: ndarray | float, l_p_bp: float = 150.0, tau_nm_per_bp: float = 0.01, sigma_nm: ndarray | float = 0.0) → ndarray

Gaussian-chain axis variance of the end-to-end distance.

S²(L) = 2·σ² + (2/3)·l_p·τ·L

Parameters:
  • L_bp (ndarray or float) – Genomic distance(s) in bp between the two loci.

  • l_p_bp (float) – Persistence length in bp.

  • tau_nm_per_bp (float) – Genomic-to-spatial scale factor (nm / bp).

  • sigma_nm (ndarray or float) – Axis-wise localisation uncertainty of each endpoint, in nm. A scalar is broadcast to both endpoints; an array of shape (2,) or matching L_bp is also accepted.

Returns:

S2 – Axis-wise variance of the observed distance, in nm².

Return type:

ndarray or float
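A worked example of S²(L) = 2·σ² + (2/3)·l_p·τ·L may help with choosing units. The sketch below handles only the scalar-σ case described above (one σ shared by both endpoints); axis_variance_nm2 is a hypothetical stand-in, not the library function.

```python
def axis_variance_nm2(l_bp: float, l_p_bp: float = 150.0,
                      tau_nm_per_bp: float = 0.01,
                      sigma_nm: float = 0.0) -> float:
    """Axis-wise variance S^2(L) with the same sigma at both endpoints."""
    localisation = 2.0 * sigma_nm ** 2                          # sigma^2 per endpoint
    chain = (2.0 / 3.0) * l_p_bp * tau_nm_per_bp * l_bp         # polymer term
    return localisation + chain


# Illustrative numbers: loci 30 kb apart, 50 nm localisation sigma.
s2 = axis_variance_nm2(30_000, sigma_nm=50.0)
print(s2)  # 5000 from localisation + 30000 from the chain term
```

With these numbers the chain term dominates; for very close loci the localisation term 2·σ² sets a floor on S².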

uchrom.im.trace._polymer.fit_scale_factor(observed_dist_nm: ndarray, genomic_dist_bp: ndarray, l_p_bp: float = 150.0, fix_exponent: bool = True) → Tuple[float, float]

Fit Gaussian-chain scale factor τ to observed trace distances.

Performs a log-log regression of R on L:

log R = 0.5 · log(2·l_p·τ) + α · log L

with α = 0.5 fixed for an ideal Gaussian chain (fix_exponent=True) or fitted jointly. Returns (τ, α).

Parameters:
  • observed_dist_nm (ndarray) – Observed 3-D distances (nm). NaN and non-positive values are dropped.

  • genomic_dist_bp (ndarray) – Genomic separations (bp) with the same shape as observed_dist_nm.

  • l_p_bp (float) – Persistence length (bp).

  • fix_exponent (bool) – If True, force α = 0.5 (ideal Gaussian chain). If False, fit it jointly — useful for chromatin where α often ≈ 0.33 (crumpled globule).

Returns:

  • tau_nm_per_bp (float)

  • scaling_exponent (float) – 0.5 when fix_exponent=True; otherwise the fitted value.
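The regression described above can be sketched with the standard library alone. This is an illustrative reimplementation under stated assumptions, not the library's code: it uses ordinary least squares in log-log space, drops non-positive genomic separations as well (needed for the logarithm), and the name fit_scale_factor_sketch is hypothetical.

```python
import math


def fit_scale_factor_sketch(observed_dist_nm, genomic_dist_bp,
                            l_p_bp=150.0, fix_exponent=True):
    """Fit log R = 0.5*log(2*l_p*tau) + alpha*log L; return (tau, alpha)."""
    # NaN fails every comparison, so `r > 0` also drops NaN distances.
    pairs = [(math.log(sep), math.log(r))
             for r, sep in zip(observed_dist_nm, genomic_dist_bp)
             if r > 0 and sep > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two valid (distance, separation) pairs")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    if fix_exponent:
        alpha = 0.5
    else:
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0.0:
            raise ValueError("all genomic separations are identical")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        alpha = sxy / sxx
    intercept = mean_y - alpha * mean_x          # equals 0.5 * log(2 * l_p * tau)
    tau = math.exp(2.0 * intercept) / (2.0 * l_p_bp)
    return tau, alpha


# Synthetic ideal-chain data generated with tau = 0.02 nm/bp.
seps_bp = [1e4, 5e4, 1e5, 5e5]
dists_nm = [math.sqrt(2.0 * 150.0 * 0.02 * sep) for sep in seps_bp]
tau_fit, alpha_fit = fit_scale_factor_sketch(dists_nm, seps_bp)
```

On noise-free ideal-chain data both modes recover the generating τ; on real traces the free-exponent fit trades a better curve match for a τ that is no longer directly comparable across chromosomes.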

uchrom.im.trace._polymer.fit_scale_factor_from_matrix(dist_matrix_nm: ndarray, bin_genomic_bp: ndarray, l_p_bp: float = 150.0, aggregator: str = 'median', fix_exponent: bool = True) → Tuple[float, float]

Fit τ from a (n_bins, n_bins) distance matrix.

For each unique off-diagonal genomic separation L, aggregates the observed distances (median or mean) across all bin pairs with that separation, then runs fit_scale_factor() on the result.

Parameters:
  • dist_matrix_nm (ndarray (n_bins, n_bins)) – Pairwise distances in nm. NaNs are ignored.

  • bin_genomic_bp (ndarray (n_bins,)) – Genomic midpoint of each bin, in bp.

  • l_p_bp (float)

  • aggregator ('median' | 'mean')

  • fix_exponent (bool)

Returns:

  • tau_nm_per_bp (float)

  • scaling_exponent (float)
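The aggregation step described above can be sketched as follows. This is a minimal illustration, assuming a symmetric matrix (only the upper triangle is read) and the median aggregator; aggregate_by_separation and the toy matrix are hypothetical, and the output would then be passed to the scale-factor fit.

```python
import math
from collections import defaultdict
from statistics import median


def aggregate_by_separation(dist_matrix_nm, bin_genomic_bp):
    """Group upper-triangle distances by genomic separation; take medians."""
    groups = defaultdict(list)
    n_bins = len(bin_genomic_bp)
    for i in range(n_bins):
        for j in range(i + 1, n_bins):          # off-diagonal pairs only
            d = dist_matrix_nm[i][j]
            if not math.isnan(d):               # NaNs are ignored
                sep = abs(bin_genomic_bp[j] - bin_genomic_bp[i])
                groups[sep].append(d)
    separations = sorted(groups)
    return separations, [median(groups[s]) for s in separations]


# Toy 3-bin example with bins 10 kb apart (values in nm).
nan = float("nan")
matrix = [[0.0, 100.0, 200.0],
          [100.0, 0.0, 120.0],
          [200.0, 120.0, 0.0]]
seps, agg = aggregate_by_separation(matrix, [0, 10_000, 20_000])
print(seps, agg)  # [10000, 20000] [110.0, 200.0]
```

Aggregating before fitting gives every genomic separation equal weight in the regression, regardless of how many bin pairs share that separation.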

uchrom.im.trace._polymer.mahalanobis_cost(R_nm: ndarray | float, S2_nm2: ndarray | float) → ndarray

Mahalanobis-style squared cost R² / (2·S²).

This is the data-dependent part of the negative log Gaussian density (the exponent, without the -1.5·log(2π·S²) normalisation constant). For shortest-path aligners this has a useful property that bond_log_probability() does not: the cost of a single edge is independent of how many loci lie between its endpoints, so paths with different numbers of skipped loci can be compared fairly.

Together with a per-skip gap_penalty it gives well-calibrated edge weights:

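w_ij = R²_ij / (2·S²_ij) + (c - 1) · gap_penalty

where c is the number of locus steps spanned by the edge, so c - 1 is the number of skipped loci (0 for adjacent loci).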
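A small numeric sketch of the edge weight follows. It assumes that c counts the locus steps spanned by the edge (c = 1 for adjacent loci), which matches the gap_penalty description of a skip-0 edge adding nothing; the function name edge_weight_sketch and the input values are hypothetical.

```python
def edge_weight_sketch(r_nm: float, s2_nm2: float, locus_step: int,
                       gap_penalty: float = 3.0) -> float:
    """w = R^2 / (2*S^2) + (c - 1) * gap_penalty, with c = locus_step >= 1."""
    if locus_step < 1:
        raise ValueError("locus_step must be at least 1")
    mahalanobis = r_nm ** 2 / (2.0 * s2_nm2)
    return mahalanobis + (locus_step - 1) * gap_penalty


# R^2 = 3 * S^2 is the expected value for an on-chain match, giving cost 1.5.
adjacent = edge_weight_sketch(300.0, 30_000.0, locus_step=1)   # no skip
skip_two = edge_weight_sketch(300.0, 30_000.0, locus_step=3)   # two loci skipped
print(adjacent, skip_two)  # 1.5 7.5
```

With the default gap_penalty of 3.0, skipping a locus costs about as much as two typical on-chain edges, so a skip is preferred only when the intermediate candidates fit the chain model poorly.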