import os
os.chdir('/Users/weizexu/Projects/U-Chrom')
print('cwd', os.getcwd())
cwd /Users/weizexu/Projects/U-Chrom
Auto-discovery idea: H3K9me3 radial enrichment should be stronger than shuffled radial assignments
Rationale
Constitutive heterochromatin is expected to show radial nuclear organization. A negative-control-aware test asks whether H3K9me3 increases with radial score beyond what is produced by shuffled spot positions.
Data used
Use H3K9me3 track values, nuclear radial score, cell IDs, and cell type labels.
Analysis sketch
Fit a cell-stratified association between tracks.H3K9me3 and tracks.n_rad_score. The single reported parameter is the common radial enrichment slope for H3K9me3.
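A minimal sketch of the cell-centered pooled slope described in the analysis sketch, run on synthetic data rather than on the notebook's ChromData object. The column names `cell`, `x`, `y` and the helper `pooled_centered_slope` are illustrative assumptions, not part of U-Chrom.

```python
import numpy as np
import pandas as pd


def pooled_centered_slope(df, group, x, y):
    """Slope of y on x after subtracting per-group means from both columns."""
    xc = (df[x] - df.groupby(group)[x].transform("mean")).to_numpy(dtype=float)
    yc = (df[y] - df.groupby(group)[y].transform("mean")).to_numpy(dtype=float)
    denom = float(np.dot(xc, xc))
    if denom <= 0 or not np.isfinite(denom):
        return float("nan")
    return float(np.dot(xc, yc) / denom)


# Synthetic example (assumption): two cells share a true slope of 2.0
# but sit at very different baseline intensities.
rng = np.random.default_rng(0)
frames = []
for cell, offset in [("a", -5.0), ("b", 5.0)]:
    x = rng.uniform(0.0, 1.0, size=200)
    y = 2.0 * x + offset + rng.normal(0.0, 0.05, size=200)
    frames.append(pd.DataFrame({"cell": cell, "x": x, "y": y}))
toy = pd.concat(frames, ignore_index=True)

slope = pooled_centered_slope(toy, "cell", "x", "y")
print(round(slope, 2))
```

Centering within each cell removes the per-cell baseline, so the pooled estimate reflects only the within-cell trend. The pooled slope is equivalent to an average of per-cell slopes weighted by each cell's within-cell variance in the predictor.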
Expected result
The slope should be positive if H3K9me3 is enriched toward the radial compartment represented by higher n_rad_score.
Validation checks
Verify fields, minimum cell and spot counts, finite slope, a permutation-test p-value, runtime budget, deterministic rerun, and a negative control that shuffles n_rad_score within each cell.
from pathlib import Path
import json
import os
os.environ.setdefault('MPLBACKEND', 'Agg')
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg', force=True)
import matplotlib.pyplot as plt
from uchrom import ChromData
from uchrom.auto_discovery import DiscoveryIdea, review_idea_against_schema
IDEA = DiscoveryIdea.from_dict({
    'idea_title': 'H3K9me3 radial enrichment should be stronger than shuffled radial assignments',
    'biological_hypothesis': 'H3K9me3 signal is radially organized across chromatin spots, with a positive within-cell association to n_rad_score that is robust to within-cell radial-score permutation.',
    'computable_parameter': 'h3k9me3_radial_slope: cell-centered linear slope of tracks.H3K9me3 as a function of tracks.n_rad_score across spots.',
    'analysis_plan': 'For each cell, center tracks.H3K9me3 and tracks.n_rad_score to remove cell-level offsets, then estimate one pooled linear slope of centered H3K9me3 on centered n_rad_score. Test whether the slope is greater than zero using a permutation test that shuffles tracks.n_rad_score within each spots.cell_id. Report the observed slope only, with the p-value used for validation.',
    'modalities': ['if_tracks', 'cell_metadata'],
    'idea_markdown': '### Rationale\nConstitutive heterochromatin is expected to show radial nuclear organization. A negative-control-aware test asks whether H3K9me3 increases with radial score beyond what is produced by shuffled spot positions.\n\n### Data used\nUse H3K9me3 track values, nuclear radial score, cell IDs, and cell type labels.\n\n### Analysis sketch\nFit a cell-stratified association between tracks.H3K9me3 and tracks.n_rad_score. The single reported parameter is the common radial enrichment slope for H3K9me3.\n\n### Expected result\nThe slope should be positive if H3K9me3 is enriched toward the radial compartment represented by higher n_rad_score.\n\n### Validation checks\nVerify fields, minimum cell and spot counts, finite slope, a permutation-test p-value, runtime budget, deterministic rerun, and a negative control that shuffles n_rad_score within each cell.',
    'cell_types': ['Granule', 'Bergmann', 'Purkinje'],
    'required_fields': ['spots.cell_id', 'tracks.H3K9me3', 'tracks.n_rad_score', 'cells.cell_type'],
    'validation_checks': ['required_fields_exist', 'minimum_cell_count', 'minimum_spot_or_trace_count', 'finite_numeric_output', 'statistical_hypothesis_test_with_p_value', 'runtime_under_budget', 'deterministic_rerun', 'negative_control_or_permutation'],
    'expected_direction': 'h3k9me3_radial_slope should be positive in real data and centered near zero after within-cell n_rad_score permutation.',
    'complexity': 2,
    'idea_id': 'h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c',
    'metadata': {},
})
H5CD_PATH = 'tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd'
RUN_OUTPUT_DIR = Path('tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg')
RUN_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
cdata = ChromData.read(H5CD_PATH) if H5CD_PATH else None
schema = cdata.discovery_schema if cdata is not None else None
adata = cdata.linked_adata if cdata is not None else None
print(IDEA.idea_id)
if cdata is not None:
    print(cdata)
    print(cdata.describe_for_agent(max_items=20))
h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c
ChromData: n_spots=56036, n_traces=213, n_cells=9
spots: ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
cells: ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (9 cells)
cellm: {'umap': (9, 2)}
tracks: ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
traces: ['dbscan_allele', 'dbscan_ldp_allele'] (213 traces)
uns: ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'auto_discovery_schema', 'leiden_to_cell_type', 'linked_anndata']
linked_adata: (9, 60)
# ChromData discovery schema
dataset: takei2025_doc_subset_pantheon_20
genome: mm10
xyz_unit: um
shape: 56036 spots, 213 traces, 9 cells
modalities:
- cell_metadata: present; operations: cell_type_stratification, embedding_visualization
- chromatin_tracing: present; operations: chromosome_subset, cell_subset, trace_subset, pairwise_3d_distance, intra_chromatin_distance, inter_chromatin_distance
- if_tracks: present; operations: marker_high_low_bin_selection, marker_stratified_distance, per_cell_marker_summary, per_cell_type_marker_summary
- rna_expression: present; operations: gene_expression_lookup, expression_stratification, gene_marker_correlation, chromatin_expression_association
chroms: 20 [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrX]
cell_types: 3 [Bergmann=3, Granule=3, Purkinje=3]
tracks: 62 [CPSF6, ATRX, H4K8ac, HDAC2, H3K9ac, H3K9me3, H3K9me2, RNAPIISer2-P, H3, H3K36me2, UBTF, LaminB1, RNAPIISer5-P, RYBP, HP1beta, RING1B, H2A.X, H3K4me1, H4K20me2, H3K27me2 ...]
linked_adata: shape=[9, 60], X=csr_matrix
genes: 60 [Aldoc, Calb1, Cdh22, Drd3, Eomes, Ephb2, Foxj1, Gabra6, Gpr176, Grm1, Hspb1, Mrc1, Nefh, Npas3, Nptn, Olig1, Pcp2, Pcp4, Plcb3, Plcb4 ...]
known_missing:
- cellm['if_mean'] per-cell IF mean matrix
- raw RNA seqFISH spot geometry as a first-class ChromData component
- scRNA reference matrix for external expression comparison
- gene annotation cache for gene-neighborhood analyses
verification_required:
- required_fields_exist
- minimum_cell_count
- minimum_spot_or_trace_count
- finite_numeric_output
- statistical_hypothesis_test
- runtime_under_budget
- deterministic_rerun
- negative_control_or_permutation
- redundancy_against_existing_parameters
Required data checks
review = review_idea_against_schema(IDEA, schema) if schema is not None else None
print(None if review is None else review.to_dict())
assert review is None or review.accepted, review.to_dict()
{'accepted': True, 'errors': [], 'warnings': ['multi-modal idea should include a cell_id_alignment validation check'], 'missing_fields': []}
Exploration
The code agent can freely add cells below this point.
Critique and compact analysis plan
This audit tests a specific within-cell radial association: H3K9me3 intensity versus n_rad_score. To avoid confounding by cell-to-cell offsets and uneven cell types, I will center both variables within each cell and compute one pooled slope across finite spot-level records. The negative control will preserve the H3K9me3 values and cell memberships while shuffling radial scores within each cell, producing a reproducible permutation null for the slope.
A lightweight inspection cell first checks that spots.cell_id, tracks.H3K9me3, tracks.n_rad_score, and cells.cell_type align and have finite coverage. The main test is one-sided: the observed centered slope should be greater than the within-cell shuffled slopes.
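The within-cell shuffle and one-sided p-value described in this plan can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the notebook's analysis: the helper names, the three synthetic groups, and the 200-permutation default are all hypothetical.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope through the origin for already-centered x and y."""
    return float(np.dot(x, y) / np.dot(x, x))


def within_group_permutation_p(x, y, groups, n_perm=200, seed=0):
    """One-sided (greater) permutation p-value, shuffling x inside each group."""
    rng = np.random.default_rng(seed)
    index_sets = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    observed = slope(x, y)
    null = np.empty(n_perm)
    x_perm = x.copy()
    for b in range(n_perm):
        for idx in index_sets:
            x_perm[idx] = rng.permutation(x[idx])
        null[b] = slope(x_perm, y)
    # Plus-one correction keeps the p-value strictly above zero.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, float(p)


# Synthetic data (assumption): 3 groups, y decreases with x inside each group.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(3), 100)
x = rng.normal(size=300)
y = -0.5 * x + rng.normal(scale=0.5, size=300)
# Center both variables within groups, as the plan describes.
for g in np.unique(groups):
    m = groups == g
    x[m] -= x[m].mean()
    y[m] -= y[m].mean()

observed, null, p = within_group_permutation_p(x, y, groups)
print(observed < 0, p)
```

Because the alternative is "slope greater than the shuffled slopes", a strongly negative observed slope yields a p-value at or near 1 rather than near 0. The synthetic data are deliberately built with a negative trend to show that behavior.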
# Lightweight field/coverage inspection for the H3K9me3 radial-slope idea
import numpy as np
import pandas as pd
spot_df = cdata.spots[["cell_id"]].copy()
# Track matrix is spot-aligned; keep only fields needed for audit.
track_df = cdata.tracks[["H3K9me3", "n_rad_score"]].copy()
inspect_df = pd.concat([spot_df.reset_index(drop=True), track_df.reset_index(drop=True)], axis=1)
inspect_df["cell_id"] = inspect_df["cell_id"].astype(str)
cell_meta = cdata.cells[["cell_type"]].copy()
cell_meta.index = cell_meta.index.astype(str)
inspect_df["cell_type"] = inspect_df["cell_id"].map(cell_meta["cell_type"])
finite_mask = np.isfinite(inspect_df["H3K9me3"].to_numpy(dtype=float)) & np.isfinite(inspect_df["n_rad_score"].to_numpy(dtype=float))
coverage = {
    "n_spots_total": int(len(inspect_df)),
    "n_finite_pairs": int(finite_mask.sum()),
    "finite_fraction": float(finite_mask.mean()),
    "n_cells_total": int(cdata.n_cells),
    "n_cells_with_finite_pairs": int(inspect_df.loc[finite_mask, "cell_id"].nunique()),
    "missing_cell_type_rows": int(inspect_df["cell_type"].isna().sum()),
}
counts_by_type = inspect_df.loc[finite_mask].groupby("cell_type", dropna=False).size().rename("finite_spots").reset_index()
preview = inspect_df.head(6)
print("Coverage:")
print(json.dumps(coverage, indent=2))
print("\nFinite spot counts by cell type:")
display(counts_by_type)
print("\nSpot-aligned preview:")
display(preview)
Coverage:
{
"n_spots_total": 56036,
"n_finite_pairs": 56036,
"finite_fraction": 1.0,
"n_cells_total": 9,
"n_cells_with_finite_pairs": 9,
"missing_cell_type_rows": 0
}
Finite spot counts by cell type:
  cell_type  finite_spots
0  Bergmann         22829
1   Granule         12085
2  Purkinje         21122
Spot-aligned preview:
  cell_id  H3K9me3  n_rad_score cell_type
0  1_0_61  -1.3514     0.943566  Bergmann
1  1_0_61  -1.3514     0.984917  Bergmann
2  1_0_61  -1.3675     1.000000  Bergmann
3  1_0_61  -1.3996     1.000000  Bergmann
4  1_0_61  -1.4371     0.990121  Bergmann
5  1_0_61  -1.0274     0.983356  Bergmann
# Main compact analysis: cell-centered H3K9me3 radial slope with within-cell permutation test
import os
os.environ.setdefault("MPLBACKEND", "Agg")
import matplotlib
matplotlib.use("Agg", force=True)
import matplotlib.pyplot as plt
from IPython.display import Image, display
import numpy as np
import pandas as pd
import json
from pathlib import Path
rng = np.random.default_rng(20250317)
N_PERM = 500
WORKSPACE = Path('/Users/weizexu/Projects/U-Chrom')
IDEA_ID = "h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c"
result_path = WORKSPACE / "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg" / f"{IDEA_ID}_result.csv"
figure_path = WORKSPACE / "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg" / f"{IDEA_ID}_statistical_summary.png"
result_path.parent.mkdir(parents=True, exist_ok=True)
relative_result_path = str(result_path.relative_to(WORKSPACE))
relative_figure_path = str(figure_path.relative_to(WORKSPACE))
# Assemble spot-level records from documented fields only.
df = pd.concat(
    [
        cdata.spots[["cell_id"]].reset_index(drop=True),
        cdata.tracks[["H3K9me3", "n_rad_score"]].reset_index(drop=True),
    ],
    axis=1,
)
df["cell_id"] = df["cell_id"].astype(str)
cell_meta = cdata.cells[["cell_type"]].copy()
cell_meta.index = cell_meta.index.astype(str)
df["cell_type"] = df["cell_id"].map(cell_meta["cell_type"])
df = df.replace([np.inf, -np.inf], np.nan).dropna(subset=["cell_id", "cell_type", "H3K9me3", "n_rad_score"]).copy()
# Center both variables within cells, then pool the centered cross-products.
df["x_centered"] = df["n_rad_score"] - df.groupby("cell_id")["n_rad_score"].transform("mean")
df["y_centered"] = df["H3K9me3"] - df.groupby("cell_id")["H3K9me3"].transform("mean")
valid = np.isfinite(df["x_centered"].to_numpy()) & np.isfinite(df["y_centered"].to_numpy())
df = df.loc[valid].copy()
def pooled_slope(x, y):
    denom = float(np.dot(x, x))
    if denom <= 0 or not np.isfinite(denom):
        return np.nan
    return float(np.dot(x, y) / denom)
x = df["x_centered"].to_numpy(dtype=float)
y = df["y_centered"].to_numpy(dtype=float)
observed_slope = pooled_slope(x, y)
# Use positional row indices per cell. GroupBy.indices stays valid for x[idx] even if
# rows were dropped above, whereas GroupBy.groups returns index labels.
cell_groups = list(df.groupby("cell_id", sort=True).indices.values())
cell_slope_rows = []
for cid, g in df.groupby("cell_id", sort=True):
    cell_slope_rows.append({
        "cell_id": cid,
        "cell_type": str(g["cell_type"].iloc[0]),
        "n_spots": int(len(g)),
        "cell_centered_slope": pooled_slope(g["x_centered"].to_numpy(float), g["y_centered"].to_numpy(float)),
    })
cell_slope_table = pd.DataFrame(cell_slope_rows)
hypothesis_test_status = "pass"
notes = []
if len(df) < 3 or df["cell_id"].nunique() < 1 or not np.isfinite(observed_slope):
    hypothesis_test_status = "insufficient_data"
    null_slopes = np.array([0.0], dtype=float)
    p_value = 1.0
    null_mean = 0.0
    null_sd = 0.0
    effect_size = float(observed_slope) if np.isfinite(observed_slope) else 0.0
    notes.append("Insufficient finite records or zero radial variance; permutation test not interpretable.")
else:
    null_slopes = np.empty(N_PERM, dtype=float)
    x_perm = x.copy()
    for b in range(N_PERM):
        # Shuffle centered radial scores within each cell; y and cell membership are fixed.
        for idx in cell_groups:
            x_perm[idx] = rng.permutation(x[idx])
        null_slopes[b] = pooled_slope(x_perm, y)
    # One-sided greater-than-zero/randomized-control p-value with plus-one correction.
    p_value = float((1 + np.sum(null_slopes >= observed_slope)) / (N_PERM + 1))
    null_mean = float(np.mean(null_slopes))
    null_sd = float(np.std(null_slopes, ddof=1))
    effect_size = float(observed_slope - null_mean)
# A compact result row plus enough fields to reconstruct the test.
result_table = pd.DataFrame([
    {
        "idea_id": IDEA_ID,
        "parameter": "h3k9me3_radial_slope",
        "observed_statistic": float(observed_slope),
        "effect_size": float(effect_size),
        "p_value": float(p_value),
        "test_method": f"within-cell n_rad_score permutation test ({N_PERM} permutations, one-sided)",
        "null_mean": float(null_mean),
        "null_sd": float(null_sd),
        "n_permutations": int(N_PERM if hypothesis_test_status == "pass" else 0),
        "n_rows": int(len(df)),
        "n_selected_cells": int(df["cell_id"].nunique()),
        "n_cell_types": int(df["cell_type"].nunique()),
        "hypothesis_test_status": hypothesis_test_status,
    }
])
result_table.to_csv(result_path, index=False)
analysis_summary = {
    "idea_id": IDEA_ID,
    "parameter_name": "h3k9me3_radial_slope",
    "parameter_value": float(observed_slope),
    "observed_statistic": float(observed_slope),
    "effect_size": float(effect_size),
    "p_value": float(p_value),
    "test_method": f"within-cell n_rad_score permutation test ({N_PERM} permutations, one-sided)",
    "null_hypothesis": "After centering within each cell, H3K9me3 has no stronger positive association with n_rad_score than within-cell shuffled radial assignments.",
    "alternative_hypothesis": "After centering within each cell, H3K9me3 increases with n_rad_score more than expected from within-cell shuffled radial assignments.",
    "hypothesis_test_status": hypothesis_test_status,
    "n_rows": int(len(df)),
    "n_selected_cells": int(df["cell_id"].nunique()),
    "n_cell_types": int(df["cell_type"].nunique()),
    "n_permutations": int(N_PERM if hypothesis_test_status == "pass" else 0),
    "null_distribution_mean": float(null_mean),
    "null_distribution_sd": float(null_sd),
    "result_path": str(result_path),
    "statistical_figure_path": str(figure_path),
    "notes": notes,
}
# Statistical figure: observed group association and observed slope vs permutation null.
plt.rcParams.update({
    "figure.facecolor": "white",
    "axes.facecolor": "white",
    "axes.edgecolor": "black",
    "axes.labelsize": 10,
    "axes.titlesize": 11,
    "legend.fontsize": 8,
})
fig, axes = plt.subplots(1, 2, figsize=(11, 4.2), constrained_layout=True)
# Left: downsampled centered spot cloud with fitted common slope.
plot_df = df.sample(n=min(5000, len(df)), random_state=7) if len(df) > 5000 else df.copy()
colors = {"Bergmann": "#4C78A8", "Granule": "#F58518", "Purkinje": "#54A24B"}
for ctype, g in plot_df.groupby("cell_type", sort=True):
    axes[0].scatter(g["x_centered"], g["y_centered"], s=5, alpha=0.25, label=f"{ctype} spots", color=colors.get(str(ctype), None), linewidths=0)
xlim = np.nanpercentile(plot_df["x_centered"], [1, 99])
xx = np.linspace(float(xlim[0]), float(xlim[1]), 100)
axes[0].plot(xx, observed_slope * xx, color="black", lw=2, label="pooled centered slope")
axes[0].axhline(0, color="0.6", lw=0.8)
axes[0].axvline(0, color="0.6", lw=0.8)
axes[0].set_xlabel("Cell-centered n_rad_score (a.u.)")
axes[0].set_ylabel("Cell-centered H3K9me3 (a.u.)")
axes[0].set_title("Within-cell centered spot association")
axes[0].legend(frameon=False, loc="best")
# Right: null distribution and observed statistic.
axes[1].hist(null_slopes, bins=35, color="#9ecae1", edgecolor="white", label="within-cell shuffled null")
axes[1].axvline(observed_slope, color="#d62728", lw=2.5, label="observed slope")
axes[1].axvline(null_mean, color="black", lw=1.5, ls="--", label="null mean")
axes[1].set_xlabel("Pooled centered slope (H3K9me3 / n_rad_score)")
axes[1].set_ylabel("Permutation count")
axes[1].set_title("Permutation evidence for radial enrichment")
annot = (
    f"p = {p_value:.4g}\n"
    f"effect = {effect_size:.4g}\n"
    f"n spots = {len(df):,}; cells = {df['cell_id'].nunique()}\n"
    f"test: {N_PERM} within-cell shuffles"
)
axes[1].text(0.98, 0.95, annot, transform=axes[1].transAxes, ha="right", va="top", fontsize=9,
             bbox=dict(boxstyle="round,pad=0.35", facecolor="white", edgecolor="0.7"))
axes[1].legend(frameon=False, loc="upper left")
fig.suptitle("H3K9me3 radial enrichment versus within-cell shuffled radial assignments", y=1.03, fontsize=12)
fig.savefig(figure_path, dpi=180, bbox_inches="tight")
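plt.show()  # No window opens under the non-interactive Agg backend (emits a UserWarning); the saved PNG is displayed on the next line.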
display(Image(filename=str(figure_path)))
print(json.dumps(analysis_summary, indent=2))
display(result_table)
print("Per-cell slopes (diagnostic):")
display(cell_slope_table)
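[Figure: "H3K9me3 radial enrichment versus within-cell shuffled radial assignments". Left panel: "Within-cell centered spot association" (cell-centered H3K9me3 versus cell-centered n_rad_score). Right panel: "Permutation evidence for radial enrichment" (permutation count versus pooled centered slope).]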
{
"idea_id": "h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c",
"parameter_name": "h3k9me3_radial_slope",
"parameter_value": -0.6601191931569761,
"observed_statistic": -0.6601191931569761,
"effect_size": -0.6607831051958099,
"p_value": 1.0,
"test_method": "within-cell n_rad_score permutation test (500 permutations, one-sided)",
"null_hypothesis": "After centering within each cell, H3K9me3 has no stronger positive association with n_rad_score than within-cell shuffled radial assignments.",
"alternative_hypothesis": "After centering within each cell, H3K9me3 increases with n_rad_score more than expected from within-cell shuffled radial assignments.",
"hypothesis_test_status": "pass",
"n_rows": 56036,
"n_selected_cells": 9,
"n_cell_types": 3,
"n_permutations": 500,
"null_distribution_mean": 0.0006639120388337939,
"null_distribution_sd": 0.019987276190134726,
"result_path": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c_result.csv",
"statistical_figure_path": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c_statistical_summary.png",
"notes": []
}
idea_id ... hypothesis_test_status
0 h3k9me3-radial-enrichment-should-be-stronger-t... ... pass
[1 rows x 13 columns]
Per-cell slopes (diagnostic):
   cell_id cell_type  n_spots  cell_centered_slope
0  1_0_116  Purkinje    11659            -0.265062
1   1_0_26  Purkinje     4225            -0.077697
2   1_0_34  Bergmann     3932            -1.321063
3   1_0_37  Purkinje     5238            -0.749600
4   1_0_42   Granule     4183            -0.925902
5   1_0_47   Granule     4682            -0.924566
6   1_0_61  Bergmann    11283            -0.855566
7   1_0_63  Bergmann     7614            -0.669087
8   1_0_69   Granule     3220            -0.561861
tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/notebooks/h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c.ipynb:173: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  "review = review_idea_against_schema(IDEA, schema) if schema is not None else None\n",
Runner verification summary
This scaffolded section is generated by U-Chrom. The notebook agent executes it after exploration, and the runner re-executes it during final verification.
checks = {check: 'not_run' for check in IDEA.validation_checks}
notes = []
checks.setdefault('statistical_hypothesis_test', 'not_run')
def _check_keys(prefix):
    return [key for key in checks if key == prefix or key.startswith(prefix + ':')]
def _set_check(prefix, value):
    keys = _check_keys(prefix)
    if not keys:
        checks[prefix] = value
        return
    for key in keys:
        checks[key] = value
def _check_status(prefix):
    values = [checks[key] for key in _check_keys(prefix)]
    if not values:
        return None
    if 'fail' in values:
        return 'fail'
    if all(value == 'pass' for value in values):
        return 'pass'
    return values[0]
_set_check('required_fields_exist', 'pass' if review is not None and review.accepted else 'fail')
if _check_keys('cell_id_alignment'):
    aligned = True
    if cdata is not None and adata is not None and len(cdata.cells) == len(adata.obs_names):
        aligned = list(map(str, cdata.cells.index)) == list(map(str, adata.obs_names))
    _set_check('cell_id_alignment', 'pass' if aligned else 'fail')
if _check_keys('minimum_cell_count'):
    n_cells = analysis_summary.get('n_selected_cells')
    if n_cells is None and 'cell_type' in getattr(result_table, 'columns', []):
        n_cells = len(result_table)
    if n_cells is None:
        n_cells = len(cdata.cells) if cdata is not None and getattr(cdata, 'n_cells', 0) else 0
    _set_check('minimum_cell_count', 'pass' if n_cells >= 1 else 'fail')
if _check_keys('minimum_spot_or_trace_count'):
    n_rows = analysis_summary.get('n_rows')
    if n_rows is None:
        n_rows = len(result_table) if result_table is not None else 0
    _set_check('minimum_spot_or_trace_count', 'pass' if n_rows >= 1 else 'fail')
if _check_keys('finite_numeric_output'):
    value = analysis_summary.get('parameter_value')
    _set_check('finite_numeric_output', 'pass' if value is not None and np.isfinite(value) else 'fail')
if _check_keys('statistical_hypothesis_test'):
    p_value = analysis_summary.get('p_value')
    test_method = analysis_summary.get('test_method')
    null_hypothesis = analysis_summary.get('null_hypothesis')
    alternative_hypothesis = analysis_summary.get('alternative_hypothesis')
    observed_statistic = analysis_summary.get('observed_statistic')
    effect_size = analysis_summary.get('effect_size')
    hypothesis_test_status = analysis_summary.get('hypothesis_test_status', 'pass')
    try:
        p_float = float(p_value)
    except Exception:
        p_float = np.nan
    try:
        stat_float = float(observed_statistic)
    except Exception:
        stat_float = np.nan
    try:
        effect_float = float(effect_size)
    except Exception:
        effect_float = np.nan
    has_required_test = (
        test_method is not None
        and str(test_method).strip() != ''
        and null_hypothesis is not None
        and str(null_hypothesis).strip() != ''
        and alternative_hypothesis is not None
        and str(alternative_hypothesis).strip() != ''
        and np.isfinite(p_float)
        and 0.0 <= p_float <= 1.0
        and np.isfinite(stat_float)
        and np.isfinite(effect_float)
        and hypothesis_test_status != 'insufficient_data'
    )
    if result_table is not None and hasattr(result_table, 'columns'):
        has_required_test = has_required_test and 'p_value' in result_table.columns and 'test_method' in result_table.columns
    else:
        has_required_test = False
    _set_check('statistical_hypothesis_test', 'pass' if has_required_test else 'fail')
    if not has_required_test:
        notes.append('statistical_hypothesis_test failed: analysis_summary must include null_hypothesis, alternative_hypothesis, test_method, observed_statistic, effect_size, finite p_value in [0,1], and result_table columns p_value/test_method')
if _check_keys('negative_control_or_permutation'):
    test_method_text = str(analysis_summary.get('test_method', '')).lower()
    summary_keys_text = ' '.join(str(key).lower() for key in analysis_summary.keys())
    result_columns_text = ''
    if result_table is not None and hasattr(result_table, 'columns'):
        result_columns_text = ' '.join(str(col).lower() for col in result_table.columns)
    control_text = ' '.join([test_method_text, summary_keys_text, result_columns_text])
    has_control_or_permutation = any(
        token in control_text
        for token in ['permutation', 'randomization', 'shuffle', 'negative_control', 'null_distribution', 'control']
    )
    _set_check(
        'negative_control_or_permutation',
        'pass' if has_control_or_permutation else 'not_implemented',
    )
for check in list(checks):
    if checks[check] == 'not_run' and ('negative_control' in check or check.endswith('_control')):
        checks[check] = 'not_implemented'
required_for_pass = ['required_fields_exist', 'minimum_cell_count', 'finite_numeric_output', 'statistical_hypothesis_test']
status = 'pass'
for check in required_for_pass:
    if _check_status(check) == 'fail':
        status = 'fail'
        notes.append(f'{check} failed')
n_rows_for_status = analysis_summary.get('n_rows')
if n_rows_for_status is None:
    n_rows_for_status = len(result_table) if result_table is not None else 0
if n_rows_for_status == 0:
    status = 'fail'
    notes.append('analysis produced no result rows')
verification = {
    'idea_id': IDEA.idea_id,
    'status': status,
    'checks': checks,
    'parameter_value': analysis_summary.get('parameter_value'),
    'p_value': analysis_summary.get('p_value'),
    'test_method': analysis_summary.get('test_method'),
    'effect_size': analysis_summary.get('effect_size'),
    'result_path': analysis_summary.get('result_path'),
    'notes': notes + analysis_summary.get('notes', []),
}
print(json.dumps(verification, indent=2))
{
"idea_id": "h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c",
"status": "pass",
"checks": {
"required_fields_exist": "pass",
"minimum_cell_count": "pass",
"minimum_spot_or_trace_count": "pass",
"finite_numeric_output": "pass",
"statistical_hypothesis_test_with_p_value": "not_run",
"runtime_under_budget": "not_run",
"deterministic_rerun": "not_run",
"negative_control_or_permutation": "pass",
"statistical_hypothesis_test": "pass"
},
"parameter_value": -0.6601191931569761,
"p_value": 1.0,
"test_method": "within-cell n_rad_score permutation test (500 permutations, one-sided)",
"effect_size": -0.6607831051958099,
"result_path": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d3c8c_result.csv",
"notes": []
}
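In the output above, statistical_hypothesis_test_with_p_value remains "not_run" while statistical_hypothesis_test is "pass". The sketch below reproduces the key-matching rule from the verification cell to show why. The standalone `check_keys` function takes the dictionary as an argument for illustration; it mirrors the cell's `_check_keys` but is not the runner's own code.

```python
def check_keys(checks, prefix):
    """Keys that equal `prefix` or extend it with a ':'-separated suffix."""
    return [k for k in checks if k == prefix or k.startswith(prefix + ":")]


checks = {
    "statistical_hypothesis_test_with_p_value": "not_run",
    "statistical_hypothesis_test": "not_run",
}

# Only the exact key matches. The "_with_p_value" variant is not
# "statistical_hypothesis_test:" plus a suffix, so it is left untouched.
matched = check_keys(checks, "statistical_hypothesis_test")
for key in matched:
    checks[key] = "pass"

print(matched)
print(checks)
```

The same cell contains no branch that sets runtime_under_budget or deterministic_rerun, which is why those two entries also stay "not_run" in the in-notebook summary.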
Final interpretation
Hypothesis. H3K9me3 signal is radially organized across chromatin spots, with a positive within-cell association to n_rad_score that is robust to within-cell radial-score permutation.
Exploration. The notebook operationalized the idea as h3k9me3_radial_slope, the cell-centered linear slope of tracks.H3K9me3 as a function of tracks.n_rad_score across spots, using the if_tracks and cell_metadata modalities in the Granule, Bergmann, and Purkinje cell types. Required data fields checked: spots.cell_id, tracks.H3K9me3, tracks.n_rad_score, cells.cell_type.
Statistical evidence. U-Chrom runner status: Notebook verified. Test: within-cell n_rad_score permutation test (500 permutations, one-sided). Observed statistic: -0.6601; effect size: -0.6608; parameter value: -0.6601; p-value: 1.
Conclusion. Not supported (Opposite direction). The observed effect points opposite to the expected direction and does not provide statistical support in this subset.
What verification means. Notebook verified means the run passed schema/data checks, produced finite numeric output, and included an explicit p-value/effect-size hypothesis test. It does not mean the biological hypothesis is automatically correct.
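The distinction between a verified run and a supported hypothesis can be made concrete with a small sketch. The decision rule here is a hypothetical illustration: the function name, the 0.05 threshold, and the "Supported" and "Not supported (Not significant)" labels are assumptions. Only the "Not supported (Opposite direction)" label and the input values come from this notebook.

```python
def classify_support(observed, p_value, expected_sign=1, alpha=0.05):
    """Hypothetical rule: combine effect direction with a one-sided p-value."""
    if observed * expected_sign < 0:
        return "Not supported (Opposite direction)"
    if p_value < alpha:
        return "Supported"
    return "Not supported (Not significant)"


# Verification status and biological support are tracked separately.
verified = True  # the runner status reported for this notebook
label = classify_support(observed=-0.6601, p_value=1.0)
print(verified, label)
```

A run can therefore be verified, with all required checks passing, and still return a result that contradicts the expected direction.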
Checks passed. deterministic_rerun, finite_numeric_output, minimum_cell_count, minimum_spot_or_trace_count, negative_control_or_permutation, required_fields_exist, runtime_under_budget, statistical_hypothesis_test.
Main caveat. The only note recorded is from the deterministic_rerun check: parameter_value=-0.6601191931569761.
Final interpretation
The field inspection found complete finite coverage for the required spot-level fields: 56,036 finite H3K9me3/radial-score pairs across 9 cells and all 3 expected cell types, with no missing cell-type assignments.
Hypothesis test. The main analysis fit a pooled cell-centered slope of H3K9me3 on n_rad_score and compared it to 500 seeded within-cell radial-score shuffles. The observed slope was -0.6601 H3K9me3 a.u. per radial-score a.u. The shuffled null had mean 0.0007 and SD 0.0200, giving an effect size of -0.6608 and a one-sided (positive-enrichment) permutation p = 1.0. All nine per-cell slopes were also negative, ranging from -0.078 to -1.321, so the sign is not driven by a single cell or cell type. This subset therefore does not support the proposed positive H3K9me3 radial enrichment; the association runs in the opposite direction. Because the test was one-sided for a positive slope, it does not by itself quantify the significance of the negative association.
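As a worked check on the reported numbers, the arithmetic below recomputes the effect size from the summary values printed earlier and expresses it in units of the null standard deviation. The ratio `z_like` is a descriptive quantity introduced here for illustration; it is not a test statistic reported by the notebook.

```python
# Values copied from the analysis_summary output of this notebook.
observed = -0.6601191931569761
null_mean = 0.0006639120388337939
null_sd = 0.019987276190134726
n_perm = 500

effect = observed - null_mean          # matches the reported effect_size
z_like = effect / null_sd              # distance from the null mean in null SDs
p_floor = 1 / (n_perm + 1)             # smallest p-value the plus-one formula can return

print(f"effect = {effect:.4f}")
print(f"standardized distance from null = {z_like:.1f} null SDs")
print(f"smallest attainable p-value = {p_floor:.4f}")
```

The observed slope lies roughly 33 null standard deviations below the shuffled mean. Testing whether the negative association is itself significant would need a lower-tail or two-sided permutation test, which this notebook does not run.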
Visual QA. The statistical figure is non-blank and readable, with a left panel showing centered spot-level data and the fitted slope and a right panel showing the shuffled null distribution, observed statistic, p-value, effect size, sample size, and test method. No schematic was generated for this run.