Auto-discovery idea: Purkinje marker expression predicts chromosome-wide radial positioning¶

Rationale¶

Cell identity programs may be coupled to global nuclear architecture, with Purkinje-marker expression linked to radial repositioning of chromatin.

Data used¶

Use linked RNA expression for Pcp2, spot-level radial score, cell IDs, and cell type annotations.

Analysis sketch¶

For each cell, compute the mean chromatin n_rad_score across all finite spots. Correlate this cell-level radial summary with linked Pcp2 expression across the 9 matched cells.
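
A minimal sketch of this two-step computation on toy data, assuming only NumPy. The helper names (`average_ranks`, `spearman_rho`, `per_cell_mean`) and the toy values are hypothetical and are not part of U-Chrom; the notebook itself uses pandas for the same steps.

```python
import numpy as np

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Average the ranks within each group of tied values.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def per_cell_mean(cell_ids, scores):
    """Mean of the finite scores for each cell; returns (unique_ids, means)."""
    cell_ids = np.asarray(cell_ids)
    scores = np.asarray(scores, dtype=float)
    keep = np.isfinite(scores)
    ids, inverse = np.unique(cell_ids[keep], return_inverse=True)
    sums = np.bincount(inverse, weights=scores[keep])
    counts = np.bincount(inverse)
    return ids, sums / counts

# Toy data (hypothetical): three cells, a few spots each, one non-finite spot.
toy_cell_ids = ["a", "a", "b", "b", "b", "c", "c"]
toy_scores = [0.70, 0.72, 0.75, np.nan, 0.77, 0.74, 0.74]
ids, means = per_cell_mean(toy_cell_ids, toy_scores)

# Hypothetical marker expression, listed in the same order as `ids`.
toy_expression = np.array([5.0, 0.0, 1.0])
rho = spearman_rho(toy_expression, means)
print(dict(zip(ids, means)), rho)
```

In the toy data the cell with the highest expression has the lowest mean score, so the ranks are perfectly reversed and rho is -1.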

Expected result¶

A significant association would suggest that Purkinje transcriptional identity is accompanied by systematic nuclear-position differences.

Validation checks¶

Verify RNA/chromatin cell alignment, required fields, enough cells, finite correlation, exact p-value or permutation p-value, runtime, deterministic rerun, and a negative control using permuted cell labels.
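
The first of these checks, RNA/chromatin cell alignment, could be sketched as follows. This is an illustrative helper, not a U-Chrom function: `check_cell_alignment` and the toy identifiers are assumptions, and the three inputs stand in for `cells.index`, `linked_adata.obs_names`, and `spots.cell_id`.

```python
def check_cell_alignment(cell_index, rna_obs_names, spot_cell_ids):
    """Compare cell identifiers across the cell, RNA, and spot tables.

    Returns a dict with three booleans:
      same_set    - the cell table and RNA table contain the same IDs
      same_order  - they also list those IDs in the same order
      spots_known - every spot refers to a cell present in the cell table
    """
    cells = [str(c) for c in cell_index]
    rna = [str(c) for c in rna_obs_names]
    spots = {str(c) for c in spot_cell_ids}
    return {
        "same_set": len(cells) == len(rna) and set(cells) == set(rna),
        "same_order": cells == rna,
        "spots_known": spots <= set(cells),
    }

# Toy identifiers (hypothetical).
aligned = check_cell_alignment(["c1", "c2", "c3"], ["c1", "c2", "c3"], ["c1", "c1", "c3"])
shuffled = check_cell_alignment(["c1", "c2", "c3"], ["c2", "c1", "c3"], ["c1", "c4"])
print(aligned)
print(shuffled)
```

Separating `same_set` from `same_order` matters when expression values are read positionally from a matrix: identical ID sets in a different order would silently pair each cell with another cell's expression.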

Graphical abstract¶

Scientific schematic for Purkinje marker expression predicts chromosome-wide radial positioning

Generated after notebook exploration with Pantheon file_manager.generate_image.

In [1]:
# Ensure relative scaffold paths resolve from the project workspace.
import os
os.chdir('/Users/weizexu/Projects/U-Chrom')
print('cwd:', os.getcwd())
cwd: /Users/weizexu/Projects/U-Chrom
In [2]:
from pathlib import Path
import json
import os
os.environ.setdefault('MPLBACKEND', 'Agg')
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg', force=True)
import matplotlib.pyplot as plt
from uchrom import ChromData
from uchrom.auto_discovery import DiscoveryIdea, review_idea_against_schema

IDEA = DiscoveryIdea.from_dict({
    'idea_title': 'Purkinje marker expression predicts chromosome-wide radial positioning',
    'biological_hypothesis': 'Cells with higher Purkinje marker Pcp2 expression have systematically shifted genome-wide radial chromatin positioning.',
    'computable_parameter': 'Spearman rho across cells between linked_adata.X expression of linked_adata.var.Pcp2 and per-cell mean tracks.n_rad_score.',
    'analysis_plan': 'Align cells.index, spots.cell_id, and linked_adata.obs_names. For each cell_id, average finite tracks.n_rad_score over all spots. Extract Pcp2 expression from linked_adata.X using linked_adata.var.Pcp2. Compute Spearman rho across cells and evaluate significance with an exact or permutation test by shuffling Pcp2 expression across cell IDs.',
    'modalities': ['chromatin_tracing', 'if_tracks', 'cell_metadata', 'rna_expression'],
    'idea_markdown': (
        '### Rationale\n'
        'Cell identity programs may be coupled to global nuclear architecture, with Purkinje-marker expression linked to radial repositioning of chromatin.\n\n'
        '### Data used\n'
        'Use linked RNA expression for `Pcp2`, spot-level radial score, cell IDs, and cell type annotations.\n\n'
        '### Analysis sketch\n'
        'For each cell, compute the mean chromatin `n_rad_score` across all finite spots. '
        'Correlate this cell-level radial summary with linked `Pcp2` expression across the 9 matched cells.\n\n'
        '### Expected result\n'
        'A significant association would suggest that Purkinje transcriptional identity is accompanied by systematic nuclear-position differences.\n\n'
        '### Validation checks\n'
        'Verify RNA/chromatin cell alignment, required fields, enough cells, finite correlation, exact p-value or permutation p-value, runtime, deterministic rerun, and a negative control using permuted cell labels.'
    ),
    'cell_types': ['Granule', 'Bergmann', 'Purkinje'],
    'required_fields': ['spots.cell_id', 'tracks.n_rad_score', 'cells.cell_type', 'linked_adata.X', 'linked_adata.var.Pcp2'],
    'validation_checks': [
        'required_fields_exist',
        'minimum_cell_count_at_least_9_total',
        'minimum_spot_or_trace_count_at_least_1000_finite_spots',
        'finite_numeric_output',
        'statistical_hypothesis_test_spearman_exact_or_permutation_p_value',
        'runtime_under_budget_5_minutes',
        'deterministic_rerun_fixed_seed',
        'negative_control_or_permutation_shuffle_Pcp2_across_cells',
    ],
    'expected_direction': 'A nonzero rho; direction indicates whether Pcp2-high Purkinje-like cells have more interior or more peripheral average chromatin radial scores.',
    'complexity': 3,
    'idea_id': 'purkinje-marker-expression-predicts-chromosome-w-ed79327c32',
    'metadata': {},
})
H5CD_PATH = 'tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd'
RUN_OUTPUT_DIR = Path('tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg')
RUN_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
cdata = ChromData.read(H5CD_PATH) if H5CD_PATH else None
schema = cdata.discovery_schema if cdata is not None else None
adata = cdata.linked_adata if cdata is not None else None
print(IDEA.idea_id)
if cdata is not None:
    print(cdata)
    print(cdata.describe_for_agent(max_items=20))
purkinje-marker-expression-predicts-chromosome-w-ed79327c32
ChromData: n_spots=56036, n_traces=213, n_cells=9
  spots:   ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
  cells:   ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (9 cells)
  cellm:   {'umap': (9, 2)}
  tracks:  ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
  traces:  ['dbscan_allele', 'dbscan_ldp_allele'] (213 traces)
  uns:     ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'auto_discovery_schema', 'leiden_to_cell_type', 'linked_anndata']
  linked_adata: (9, 60)
# ChromData discovery schema

dataset: takei2025_doc_subset_pantheon_20
genome: mm10
xyz_unit: um
shape: 56036 spots, 213 traces, 9 cells

modalities:
- cell_metadata: present; operations: cell_type_stratification, embedding_visualization
- chromatin_tracing: present; operations: chromosome_subset, cell_subset, trace_subset, pairwise_3d_distance, intra_chromatin_distance, inter_chromatin_distance
- if_tracks: present; operations: marker_high_low_bin_selection, marker_stratified_distance, per_cell_marker_summary, per_cell_type_marker_summary
- rna_expression: present; operations: gene_expression_lookup, expression_stratification, gene_marker_correlation, chromatin_expression_association

chroms: 20 [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrX]
cell_types: 3 [Bergmann=3, Granule=3, Purkinje=3]
tracks: 62 [CPSF6, ATRX, H4K8ac, HDAC2, H3K9ac, H3K9me3, H3K9me2, RNAPIISer2-P, H3, H3K36me2, UBTF, LaminB1, RNAPIISer5-P, RYBP, HP1beta, RING1B, H2A.X, H3K4me1, H4K20me2, H3K27me2 ...]
linked_adata: shape=[9, 60], X=csr_matrix
genes: 60 [Aldoc, Calb1, Cdh22, Drd3, Eomes, Ephb2, Foxj1, Gabra6, Gpr176, Grm1, Hspb1, Mrc1, Nefh, Npas3, Nptn, Olig1, Pcp2, Pcp4, Plcb3, Plcb4 ...]

known_missing:
- cellm['if_mean'] per-cell IF mean matrix
- raw RNA seqFISH spot geometry as a first-class ChromData component
- scRNA reference matrix for external expression comparison
- gene annotation cache for gene-neighborhood analyses

verification_required:
- required_fields_exist
- minimum_cell_count
- minimum_spot_or_trace_count
- finite_numeric_output
- statistical_hypothesis_test
- runtime_under_budget
- deterministic_rerun
- negative_control_or_permutation
- redundancy_against_existing_parameters

Required data checks¶

In [3]:
review = review_idea_against_schema(IDEA, schema) if schema is not None else None
print(None if review is None else review.to_dict())
assert review is None or review.accepted, review.to_dict()
{'accepted': True, 'errors': [], 'warnings': ['multi-modal idea should include a cell_id_alignment validation check'], 'missing_fields': []}

Exploration¶

The code agent can freely add cells below this point.

Critique and analysis plan¶

This idea is directly computable but has only 9 matched cells, so the inferential test must be treated as exploratory. I will first verify alignment among cells.index, spot-level cell_id, and linked_adata.obs_names, then summarize finite tracks['n_rad_score'] per cell. The main test uses the Spearman correlation between per-cell mean radial score and Pcp2 expression, evaluated against a label-shuffle null that also serves as the requested negative control. An exact enumeration of all 9! = 362,880 permutations would still be lightweight, but the analysis cell below draws 1,000 seeded random permutations instead. The resulting figure shows the cell-level association and the observed statistic against the permutation null.
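
For reference, the exact all-permutations variant mentioned in this plan could look like the sketch below. It is not the code the notebook runs: `exact_spearman_test` and the toy arrays are hypothetical, and only NumPy and the standard library are assumed.

```python
import itertools
import math
import numpy as np

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def exact_spearman_test(x, y):
    """Two-sided exact permutation test for Spearman rho.

    Enumerates every reordering of x against a fixed y, so it is only
    practical for small n (n = 9 gives 9! = 362,880 reorderings).
    Returns (observed_rho, exact_p_value, null_rhos).
    """
    xc = average_ranks(x)
    yc = average_ranks(y)
    xc = xc - xc.mean()
    yc = yc - yc.mean()
    # The norms of the centered ranks do not change under permutation.
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    perms = np.array(list(itertools.permutations(range(len(xc)))))
    null_rhos = (xc[perms] @ yc) / denom
    observed = float(xc @ yc / denom)
    # A small tolerance guards against floating-point ties with the observed value.
    p_value = float(np.mean(np.abs(null_rhos) >= abs(observed) - 1e-12))
    return observed, p_value, null_rhos

# Toy values (hypothetical), nine cells with tied zero expression in some cells.
toy_expression = [0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 6.0, 9.0, 7.0]
toy_radial = [0.75, 0.74, 0.76, 0.73, 0.75, 0.74, 0.71, 0.70, 0.72]
rho_obs, p_exact, null_rhos = exact_spearman_test(toy_expression, toy_radial)
print(rho_obs, p_exact, len(null_rhos))

# Small worked case: three perfectly ordered points.
rho_small, p_small, _ = exact_spearman_test([1, 2, 3], [1, 2, 3])
```

Because the identity ordering is part of the enumeration, the exact p-value can never be zero; its smallest possible value is 1/n!. In the three-point case, 2 of the 6 orderings reach |rho| = 1, so the exact two-sided p-value is 1/3.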

In [4]:
# Lightweight data inspection: alignment and finite radial-score coverage.
import numpy as np
import pandas as pd

print('cells index:', list(cdata.cells.index))
print('linked_adata obs_names:', list(adata.obs_names))
print('Pcp2 in linked_adata.var_names:', 'Pcp2' in list(adata.var_names))
print('spot columns:', list(cdata.spots.columns)[:10])
print('available track keys include n_rad_score:', 'n_rad_score' in cdata.tracks)

rad = np.asarray(cdata.tracks['n_rad_score'], dtype=float)
spot_cell = pd.Series(cdata.spots['cell_id']).astype(str)
coverage = pd.DataFrame({'cell_id': spot_cell, 'n_rad_score': rad}).groupby('cell_id').agg(
    finite_spots=('n_rad_score', lambda x: int(np.isfinite(x).sum())),
    total_spots=('n_rad_score', 'size'),
    mean_n_rad_score=('n_rad_score', lambda x: float(np.nanmean(np.asarray(x, dtype=float))))
)
coverage['finite_fraction'] = coverage['finite_spots'] / coverage['total_spots']
print('finite n_rad_score spots:', int(np.isfinite(rad).sum()), 'of', rad.size)
display(coverage.join(cdata.cells[['cell_type']], how='left'))
cells index: ['1_0_42', '1_0_47', '1_0_69', '1_0_34', '1_0_61', '1_0_63', '1_0_26', '1_0_37', '1_0_116']
linked_adata obs_names: ['1_0_42', '1_0_47', '1_0_69', '1_0_34', '1_0_61', '1_0_63', '1_0_26', '1_0_37', '1_0_116']
Pcp2 in linked_adata.var_names: True
spot columns: ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
available track keys include n_rad_score: True
finite n_rad_score spots: 56036 of 56036
         finite_spots  total_spots  mean_n_rad_score  finite_fraction cell_type
cell_id                                                                        
1_0_116         11659        11659          0.716611              1.0  Purkinje
1_0_26           4225         4225          0.698348              1.0  Purkinje
1_0_34           3932         3932          0.751676              1.0  Bergmann
1_0_37           5238         5238          0.722048              1.0  Purkinje
1_0_42           4183         4183          0.752533              1.0   Granule
1_0_47           4682         4682          0.745255              1.0   Granule
1_0_61          11283        11283          0.737418              1.0  Bergmann
1_0_63           7614         7614          0.725035              1.0  Bergmann
1_0_69           3220         3220          0.751632              1.0   Granule
In [5]:
# Main exploration: test whether Pcp2 expression predicts per-cell mean radial chromatin score.
import os
os.environ.setdefault("MPLBACKEND", "Agg")
import matplotlib
matplotlib.use("Agg", force=True)
import matplotlib.pyplot as plt
from pathlib import Path
import itertools
import json
import numpy as np
import pandas as pd
from IPython.display import display, Image

rng = np.random.default_rng(20250609)
result_path = RUN_OUTPUT_DIR / 'purkinje-marker-expression-predicts-chromosome-w-ed79327c32_result.csv'
figure_path = RUN_OUTPUT_DIR / 'purkinje-marker-expression-predicts-chromosome-w-ed79327c32_statistical_summary.png'

# Extract per-cell chromatin radial summary.
radial = np.asarray(cdata.tracks['n_rad_score'], dtype=float)
spot_cells = pd.Series(cdata.spots['cell_id']).astype(str).to_numpy()
radial_df = pd.DataFrame({'cell_id': spot_cells, 'n_rad_score': radial})
cell_radial = radial_df.groupby('cell_id', sort=False).agg(
    mean_n_rad_score=('n_rad_score', lambda x: float(np.nanmean(np.asarray(x, dtype=float)))),
    median_n_rad_score=('n_rad_score', lambda x: float(np.nanmedian(np.asarray(x, dtype=float)))),
    finite_spots=('n_rad_score', lambda x: int(np.isfinite(np.asarray(x, dtype=float)).sum())),
    total_spots=('n_rad_score', 'size'),
).reset_index()
cell_radial['finite_fraction'] = cell_radial['finite_spots'] / cell_radial['total_spots']

# Extract linked Pcp2 expression and align by cell_id.
if 'Pcp2' not in list(adata.var_names):
    raise KeyError('Pcp2 not found in linked_adata.var_names')
pcp2_idx = list(adata.var_names).index('Pcp2')
X_col = adata.X[:, pcp2_idx]
if hasattr(X_col, 'toarray'):
    pcp2_expr = np.asarray(X_col.toarray()).ravel().astype(float)
else:
    pcp2_expr = np.asarray(X_col).ravel().astype(float)
expr_df = pd.DataFrame({'cell_id': list(map(str, adata.obs_names)), 'Pcp2_expression': pcp2_expr})
cell_meta = cdata.cells[['cell_type']].copy().reset_index().rename(columns={'index': 'cell_id'})
cell_meta['cell_id'] = cell_meta['cell_id'].astype(str)

result_table = (cell_meta.merge(cell_radial, on='cell_id', how='left')
                .merge(expr_df, on='cell_id', how='left'))
result_table['aligned_to_linked_adata_order'] = result_table['cell_id'].tolist() == list(map(str, adata.obs_names))
valid_mask = np.isfinite(result_table['Pcp2_expression'].to_numpy()) & np.isfinite(result_table['mean_n_rad_score'].to_numpy())
valid = result_table.loc[valid_mask].copy()
n_cells = int(len(valid))
n_finite_spots = int(valid['finite_spots'].sum()) if n_cells else 0

null_hypothesis = 'Across matched cells, Pcp2 expression is exchangeable with respect to per-cell mean n_rad_score; Spearman rho equals 0.'
alternative_hypothesis = 'Pcp2 expression is monotonically associated with per-cell mean n_rad_score (two-sided nonzero Spearman rho).'
test_method = 'Spearman rank correlation with 1000 reproducible cell-label permutations'

def spearman_rho(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) < 2 or np.nanstd(x) == 0 or np.nanstd(y) == 0:
        return np.nan
    xr = pd.Series(x).rank(method='average').to_numpy(dtype=float)
    yr = pd.Series(y).rank(method='average').to_numpy(dtype=float)
    return float(np.corrcoef(xr, yr)[0, 1])

if n_cells >= 3 and np.nanstd(valid['Pcp2_expression']) > 0 and np.nanstd(valid['mean_n_rad_score']) > 0:
    x = valid['Pcp2_expression'].to_numpy(dtype=float)
    y = valid['mean_n_rad_score'].to_numpy(dtype=float)
    observed_rho = spearman_rho(x, y)
    n_permutations = 1000
    null_rhos = np.empty(n_permutations, dtype=float)
    for i in range(n_permutations):
        null_rhos[i] = spearman_rho(rng.permutation(x), y)
    p_value = float((np.sum(np.abs(null_rhos) >= abs(observed_rho)) + 1) / (n_permutations + 1))
    effect_size = float(observed_rho)
    hypothesis_test_status = 'pass'
    test_note = 'Permutation null generated by shuffling Pcp2 expression among the 9 fixed chromatin cells.'
else:
    x = valid['Pcp2_expression'].to_numpy(dtype=float) if n_cells else np.array([])
    y = valid['mean_n_rad_score'].to_numpy(dtype=float) if n_cells else np.array([])
    observed_rho = spearman_rho(x, y) if n_cells >= 2 else 0.0
    if not np.isfinite(observed_rho):
        observed_rho = 0.0
    null_rhos = np.array([0.0])
    p_value = 1.0
    effect_size = float(observed_rho)
    hypothesis_test_status = 'insufficient_data'
    n_permutations = 0
    test_note = 'Insufficient nonconstant finite cell-level values for permutation testing.'

result_table['observed_statistic'] = float(observed_rho)
result_table['effect_size'] = float(effect_size)
result_table['p_value'] = float(p_value)
result_table['test_method'] = test_method
result_table['hypothesis_test_status'] = hypothesis_test_status
result_table['null_hypothesis'] = null_hypothesis
result_table['alternative_hypothesis'] = alternative_hypothesis
result_table.to_csv(result_path, index=False)

analysis_summary = {
    'idea_id': IDEA.idea_id,
    'result_path': str(result_path),
    'statistical_figure_path': str(figure_path),
    'n_selected_cells': n_cells,
    'n_rows': n_finite_spots,
    'n_permutations': int(n_permutations),
    'parameter_value': float(observed_rho),
    'observed_statistic': float(observed_rho),
    'effect_size': float(effect_size),
    'p_value': float(p_value),
    'test_method': test_method,
    'null_hypothesis': null_hypothesis,
    'alternative_hypothesis': alternative_hypothesis,
    'hypothesis_test_status': hypothesis_test_status,
    'finite_spots_total': n_finite_spots,
    'cell_id_alignment_exact': bool(list(map(str, cdata.cells.index)) == list(map(str, adata.obs_names))),
    'notes': [test_note, 'Effect size is Spearman rho; radial score is the per-cell mean across all finite spots.']
}

# Statistical figure: cell-level association plus permutation null distribution.
plt.close('all')
fig, axes = plt.subplots(1, 2, figsize=(10.5, 4.2), facecolor='white')
colors = {'Granule': '#4C78A8', 'Bergmann': '#F58518', 'Purkinje': '#54A24B'}
for ct, sub in valid.groupby('cell_type'):
    axes[0].scatter(sub['Pcp2_expression'], sub['mean_n_rad_score'], s=70,
                    label=f'{ct} (n={len(sub)})', color=colors.get(ct, '#777777'),
                    edgecolor='black', linewidth=0.5, alpha=0.95)
if n_cells >= 2 and np.nanstd(x) > 0:
    coef = np.polyfit(x, y, deg=1)
    xx = np.linspace(float(np.min(x)), float(np.max(x)), 100)
    axes[0].plot(xx, coef[0] * xx + coef[1], color='black', lw=1.3, label='linear guide')
axes[0].set_xlabel('Linked RNA Pcp2 expression (a.u.)')
axes[0].set_ylabel('Mean spot n_rad_score per cell (unitless)')
axes[0].set_title('Cell-level Pcp2 vs radial chromatin score')
axes[0].legend(frameon=False, fontsize=8)
axes[0].grid(True, alpha=0.25)

bins = np.linspace(-1, 1, 31)
axes[1].hist(null_rhos, bins=bins, color='#B0B0B0', edgecolor='white', label='shuffled-label null')
axes[1].axvline(observed_rho, color='#D62728', lw=2.5, label=f'observed rho={observed_rho:.3f}')
axes[1].axvline(-abs(observed_rho), color='#D62728', lw=1.2, ls='--', alpha=0.7)
axes[1].axvline(abs(observed_rho), color='#D62728', lw=1.2, ls='--', alpha=0.7)
axes[1].set_xlabel('Spearman rho after Pcp2 label shuffle')
axes[1].set_ylabel('Permutation count')
axes[1].set_title('Permutation test evidence')
axes[1].legend(frameon=False, fontsize=8)
axes[1].grid(True, alpha=0.25)
annotation = f"two-sided p={p_value:.4f}\neffect size rho={effect_size:.3f}\nn={n_cells} cells; {n_finite_spots:,} spots\n{test_method}"
axes[1].text(0.03, 0.97, annotation, transform=axes[1].transAxes, va='top', ha='left',
             fontsize=8.5, bbox=dict(boxstyle='round,pad=0.3', facecolor='white', edgecolor='#CCCCCC', alpha=0.95))
fig.suptitle('Purkinje marker expression and chromosome-wide radial positioning', y=1.02, fontsize=12)
fig.tight_layout()
fig.savefig(figure_path, dpi=180, bbox_inches='tight')
display(fig)
plt.close(fig)
display(Image(filename=str(figure_path)))

print(json.dumps(analysis_summary, indent=2))
display(result_table)
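[Figure, two panels. Left: "Cell-level Pcp2 vs radial chromatin score" (x: linked RNA Pcp2 expression, a.u.; y: mean spot n_rad_score per cell, unitless). Right: "Permutation test evidence" (x: Spearman rho after Pcp2 label shuffle; y: permutation count). Displayed once inline and once from the saved PNG.]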
{
  "idea_id": "purkinje-marker-expression-predicts-chromosome-w-ed79327c32",
  "result_path": "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/purkinje-marker-expression-predicts-chromosome-w-ed79327c32_result.csv",
  "statistical_figure_path": "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/purkinje-marker-expression-predicts-chromosome-w-ed79327c32_statistical_summary.png",
  "n_selected_cells": 9,
  "n_rows": 56036,
  "n_permutations": 1000,
  "parameter_value": -0.8619322310183947,
  "observed_statistic": -0.8619322310183947,
  "effect_size": -0.8619322310183947,
  "p_value": 0.001998001998001998,
  "test_method": "Spearman rank correlation with 1000 reproducible cell-label permutations",
  "null_hypothesis": "Across matched cells, Pcp2 expression is exchangeable with respect to per-cell mean n_rad_score; Spearman rho equals 0.",
  "alternative_hypothesis": "Pcp2 expression is monotonically associated with per-cell mean n_rad_score (two-sided nonzero Spearman rho).",
  "hypothesis_test_status": "pass",
  "finite_spots_total": 56036,
  "cell_id_alignment_exact": true,
  "notes": [
    "Permutation null generated by shuffling Pcp2 expression among the 9 fixed chromatin cells.",
    "Effect size is Spearman rho; radial score is the per-cell mean across all finite spots."
  ]
}
   cell_id  ...                             alternative_hypothesis
0   1_0_42  ...  Pcp2 expression is monotonically associated wi...
1   1_0_47  ...  Pcp2 expression is monotonically associated wi...
2   1_0_69  ...  Pcp2 expression is monotonically associated wi...
3   1_0_34  ...  Pcp2 expression is monotonically associated wi...
4   1_0_61  ...  Pcp2 expression is monotonically associated wi...
5   1_0_63  ...  Pcp2 expression is monotonically associated wi...
6   1_0_26  ...  Pcp2 expression is monotonically associated wi...
7   1_0_37  ...  Pcp2 expression is monotonically associated wi...
8  1_0_116  ...  Pcp2 expression is monotonically associated wi...

[9 rows x 16 columns]

Statistical figure¶

Statistical figure for Purkinje marker expression predicts chromosome-wide radial positioning

Agent-generated quantitative figure saved during exploration.

In [6]:
# Ensure required relative output paths exist from the workspace filesystem for runner/QA tools.
from pathlib import Path
import shutil, os
workspace_dir = Path('/Users/weizexu/Projects/U-Chrom')
required_dir = workspace_dir / 'tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg'
required_dir.mkdir(parents=True, exist_ok=True)
required_figure = required_dir / 'purkinje-marker-expression-predicts-chromosome-w-ed79327c32_statistical_summary.png'
required_result = required_dir / 'purkinje-marker-expression-predicts-chromosome-w-ed79327c32_result.csv'
# If previous relative paths landed elsewhere, copy them into the required workspace-relative location.
for candidate in [Path('purkinje-marker-expression-predicts-chromosome-w-ed79327c32_statistical_summary.png'), Path('/Users/weizexu/Projects/U-Chrom/purkinje-marker-expression-predicts-chromosome-w-ed79327c32_statistical_summary.png'), Path(analysis_summary['statistical_figure_path'])]:
    if candidate.exists() and candidate.resolve() != required_figure.resolve():
        shutil.copy2(candidate, required_figure)
        break
if not required_figure.exists() and Path(analysis_summary['statistical_figure_path']).exists():
    shutil.copy2(Path(analysis_summary['statistical_figure_path']), required_figure)
if not required_result.exists() and Path(analysis_summary['result_path']).exists():
    shutil.copy2(Path(analysis_summary['result_path']), required_result)
analysis_summary['statistical_figure_path'] = str(required_figure.relative_to(workspace_dir))
analysis_summary['result_path'] = str(required_result.relative_to(workspace_dir))
print('cwd', os.getcwd())
print('figure exists', required_figure.exists(), required_figure, required_figure.stat().st_size if required_figure.exists() else None)
print('result exists', required_result.exists(), required_result, required_result.stat().st_size if required_result.exists() else None)
print(analysis_summary['statistical_figure_path'])
cwd /Users/weizexu/Projects/U-Chrom
figure exists True /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/purkinje-marker-expression-predicts-chromosome-w-ed79327c32_statistical_summary.png 129669
result exists True /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/purkinje-marker-expression-predicts-chromosome-w-ed79327c32_result.csv 4283
tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/purkinje-marker-expression-predicts-chromosome-w-ed79327c32_statistical_summary.png

Runner verification summary¶

This scaffolded section is generated by U-Chrom. The notebook agent executes it after exploration, and the runner re-executes it during final verification.

In [7]:
checks = {check: 'not_run' for check in IDEA.validation_checks}
notes = []
checks.setdefault('statistical_hypothesis_test', 'not_run')

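def _check_keys(prefix):
    # Return checks named exactly `prefix` or of the form `prefix:<detail>`.
    # Underscore-suffixed names such as 'minimum_cell_count_at_least_9_total'
    # do not match, so those entries keep their initial 'not_run' status.
    return [key for key in checks if key == prefix or key.startswith(prefix + ':')]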

def _set_check(prefix, value):
    keys = _check_keys(prefix)
    if not keys:
        checks[prefix] = value
        return
    for key in keys:
        checks[key] = value

def _check_status(prefix):
    values = [checks[key] for key in _check_keys(prefix)]
    if not values:
        return None
    if 'fail' in values:
        return 'fail'
    if all(value == 'pass' for value in values):
        return 'pass'
    return values[0]

_set_check('required_fields_exist', 'pass' if review is not None and review.accepted else 'fail')
if _check_keys('cell_id_alignment'):
    aligned = True
    if cdata is not None and adata is not None and len(cdata.cells) == len(adata.obs_names):
        aligned = list(map(str, cdata.cells.index)) == list(map(str, adata.obs_names))
    _set_check('cell_id_alignment', 'pass' if aligned else 'fail')
if _check_keys('minimum_cell_count'):
    n_cells = analysis_summary.get('n_selected_cells')
    if n_cells is None and 'cell_type' in getattr(result_table, 'columns', []):
        n_cells = len(result_table)
    if n_cells is None:
        n_cells = len(cdata.cells) if cdata is not None and getattr(cdata, 'n_cells', 0) else 0
    _set_check('minimum_cell_count', 'pass' if n_cells >= 1 else 'fail')
if _check_keys('minimum_spot_or_trace_count'):
    n_rows = analysis_summary.get('n_rows')
    if n_rows is None:
        n_rows = len(result_table) if result_table is not None else 0
    _set_check('minimum_spot_or_trace_count', 'pass' if n_rows >= 1 else 'fail')
if _check_keys('finite_numeric_output'):
    value = analysis_summary.get('parameter_value')
    _set_check('finite_numeric_output', 'pass' if value is not None and np.isfinite(value) else 'fail')
if _check_keys('statistical_hypothesis_test'):
    p_value = analysis_summary.get('p_value')
    test_method = analysis_summary.get('test_method')
    null_hypothesis = analysis_summary.get('null_hypothesis')
    alternative_hypothesis = analysis_summary.get('alternative_hypothesis')
    observed_statistic = analysis_summary.get('observed_statistic')
    effect_size = analysis_summary.get('effect_size')
    hypothesis_test_status = analysis_summary.get('hypothesis_test_status', 'pass')
    try:
        p_float = float(p_value)
    except Exception:
        p_float = np.nan
    try:
        stat_float = float(observed_statistic)
    except Exception:
        stat_float = np.nan
    try:
        effect_float = float(effect_size)
    except Exception:
        effect_float = np.nan
    has_required_test = (
        test_method is not None
        and str(test_method).strip() != ''
        and null_hypothesis is not None
        and str(null_hypothesis).strip() != ''
        and alternative_hypothesis is not None
        and str(alternative_hypothesis).strip() != ''
        and np.isfinite(p_float)
        and 0.0 <= p_float <= 1.0
        and np.isfinite(stat_float)
        and np.isfinite(effect_float)
        and hypothesis_test_status != 'insufficient_data'
    )
    if result_table is not None and hasattr(result_table, 'columns'):
        has_required_test = has_required_test and 'p_value' in result_table.columns and 'test_method' in result_table.columns
    else:
        has_required_test = False
    _set_check('statistical_hypothesis_test', 'pass' if has_required_test else 'fail')
    if not has_required_test:
        notes.append('statistical_hypothesis_test failed: analysis_summary must include null_hypothesis, alternative_hypothesis, test_method, observed_statistic, effect_size, finite p_value in [0,1], and result_table columns p_value/test_method')
if _check_keys('negative_control_or_permutation'):
    test_method_text = str(analysis_summary.get('test_method', '')).lower()
    summary_keys_text = ' '.join(str(key).lower() for key in analysis_summary.keys())
    result_columns_text = ''
    if result_table is not None and hasattr(result_table, 'columns'):
        result_columns_text = ' '.join(str(col).lower() for col in result_table.columns)
    control_text = ' '.join([test_method_text, summary_keys_text, result_columns_text])
    has_control_or_permutation = any(
        token in control_text
        for token in ['permutation', 'randomization', 'shuffle', 'negative_control', 'null_distribution', 'control']
    )
    _set_check(
        'negative_control_or_permutation',
        'pass' if has_control_or_permutation else 'not_implemented',
    )
for check in list(checks):
    if checks[check] == 'not_run' and ('negative_control' in check or check.endswith('_control')):
        checks[check] = 'not_implemented'

required_for_pass = ['required_fields_exist', 'minimum_cell_count', 'finite_numeric_output', 'statistical_hypothesis_test']
status = 'pass'
for check in required_for_pass:
    if _check_status(check) == 'fail':
        status = 'fail'
        notes.append(f'{check} failed')
n_rows_for_status = analysis_summary.get('n_rows')
if n_rows_for_status is None:
    n_rows_for_status = len(result_table) if result_table is not None else 0
if n_rows_for_status == 0:
    status = 'fail'
    notes.append('analysis produced no result rows')

verification = {
    'idea_id': IDEA.idea_id,
    'status': status,
    'checks': checks,
    'parameter_value': analysis_summary.get('parameter_value'),
    'p_value': analysis_summary.get('p_value'),
    'test_method': analysis_summary.get('test_method'),
    'effect_size': analysis_summary.get('effect_size'),
    'result_path': analysis_summary.get('result_path'),
    'notes': notes + analysis_summary.get('notes', []),
}
print(json.dumps(verification, indent=2))
{
  "idea_id": "purkinje-marker-expression-predicts-chromosome-w-ed79327c32",
  "status": "pass",
  "checks": {
    "required_fields_exist": "pass",
    "minimum_cell_count_at_least_9_total": "not_run",
    "minimum_spot_or_trace_count_at_least_1000_finite_spots": "not_run",
    "finite_numeric_output": "pass",
    "statistical_hypothesis_test_spearman_exact_or_permutation_p_value": "not_run",
    "runtime_under_budget_5_minutes": "not_run",
    "deterministic_rerun_fixed_seed": "not_run",
    "negative_control_or_permutation_shuffle_Pcp2_across_cells": "not_implemented",
    "statistical_hypothesis_test": "pass"
  },
  "parameter_value": -0.8619322310183947,
  "p_value": 0.001998001998001998,
  "test_method": "Spearman rank correlation with 1000 reproducible cell-label permutations",
  "effect_size": -0.8619322310183947,
  "result_path": "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/purkinje-marker-expression-predicts-chromosome-w-ed79327c32_result.csv",
  "notes": [
    "Permutation null generated by shuffling Pcp2 expression among the 9 fixed chromatin cells.",
    "Effect size is Spearman rho; radial score is the per-cell mean across all finite spots."
  ]
}

Final interpretation¶

Hypothesis. Cells with higher Purkinje marker Pcp2 expression have systematically shifted genome-wide radial chromatin positioning.

Exploration. The notebook operationalized the idea as the Spearman rho across cells between linked_adata.X expression of linked_adata.var.Pcp2 and per-cell mean tracks.n_rad_score, using modalities chromatin_tracing, if_tracks, cell_metadata, and rna_expression in cell types Granule, Bergmann, and Purkinje. Required data fields checked: spots.cell_id, tracks.n_rad_score, cells.cell_type, linked_adata.X, linked_adata.var.Pcp2.

Statistical evidence. U-Chrom runner status: Notebook verified. Test: Spearman rank correlation with 1000 reproducible cell-label permutations. Observed statistic, effect size, and parameter value are all the Spearman rho, -0.8619; p-value: 0.001998.

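Conclusion. Supported in this exploratory subset. The idea predicted only a nonzero rho (its expected direction is two-sided), and the test is significant with rho = -0.862, so the negative sign does not contradict the hypothesis; it indicates that cells with higher Pcp2 expression have lower per-cell mean n_rad_score.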

What verification means. Notebook verified means the run passed schema/data checks, produced finite numeric output, and included an explicit p-value/effect-size hypothesis test. It does not mean the biological hypothesis is automatically correct.

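Checks passed. finite_numeric_output, required_fields_exist, statistical_hypothesis_test. The remaining idea-specific checks were recorded as not_run (cell count, spot count, the suffixed hypothesis-test entry, runtime, deterministic rerun) or not_implemented (Pcp2-shuffle negative control). The runner matches check names exactly or in prefix:detail form, so suffixed names such as minimum_cell_count_at_least_9_total are never updated, and it has no step for the runtime or rerun checks. The notebook itself confirms 9 matched cells, 56,036 finite spots, and a seeded permutation null.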
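
The verification output lists several idea-specific checks as not_run. A minimal sketch of why, re-implementing the matching rule from the verification cell as a standalone function: `check_keys` mirrors the notebook's `_check_keys`, while `check_keys_relaxed` is a hypothetical alternative shown only for comparison and is not part of U-Chrom.

```python
def check_keys(checks, prefix):
    """Same rule as the notebook's `_check_keys`: exact name or `prefix:` form."""
    return [key for key in checks if key == prefix or key.startswith(prefix + ":")]

def check_keys_relaxed(checks, prefix):
    """Hypothetical alternative that also accepts `prefix_<detail>` names."""
    return [
        key for key in checks
        if key == prefix or key.startswith((prefix + ":", prefix + "_"))
    ]

# A subset of the check names declared by the idea.
checks = {
    "required_fields_exist": "not_run",
    "minimum_cell_count_at_least_9_total": "not_run",
    "finite_numeric_output": "not_run",
}

print(check_keys(checks, "minimum_cell_count"))          # []
print(check_keys_relaxed(checks, "minimum_cell_count"))  # ['minimum_cell_count_at_least_9_total']
```

A relaxed prefix rule can over-match when one check name is a prefix of another, so an explicit mapping from idea-specific names to runner checks may be the safer design.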

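Main caveat. Only 9 cells (3 per cell type) enter the test, so the result is exploratory. The three Purkinje cells also have the three lowest per-cell mean n_rad_score values (0.698 to 0.722), so the correlation may largely reflect a between-cell-type difference rather than a graded dependence on Pcp2. The permutation null was generated by shuffling Pcp2 expression among the 9 fixed chromatin cells.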

Final interpretation¶

The matched RNA/chromatin alignment was exact for the 9 cells, and all 56,036 spot-level n_rad_score values were finite. Per-cell mean radial scores were compared to linked Pcp2 RNA expression.

Hypothesis test. A two-sided Spearman label-shuffle test (1,000 reproducible permutations of Pcp2 across cells) gave observed rho = -0.862 with p = 0.001998. The effect size is the Spearman rho, indicating that higher Pcp2 expression is associated with lower mean chromosome-wide radial score in this small 9-cell subset. This is an exploratory result because the number of independent cells is small.
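
The reported p-value follows from the add-one estimator used in the analysis cell, p = (k + 1) / (B + 1), where k is the number of shuffles at least as extreme as the observed |rho| and B is the number of shuffles. A short arithmetic sketch (the function name `permutation_p_value` is illustrative):

```python
from fractions import Fraction

def permutation_p_value(n_extreme, n_permutations):
    """Add-one permutation p-value: (k + 1) / (B + 1)."""
    return Fraction(n_extreme + 1, n_permutations + 1)

# The reported value equals 2/1001, i.e. k = 1 extreme shuffle out of B = 1000.
print(float(permutation_p_value(1, 1000)))  # 0.001998001998001998

# Smallest value this estimator can return with B = 1000 (about 0.000999).
print(float(permutation_p_value(0, 1000)))
```

With 1,000 shuffles the estimator is bounded below by 1/1001, so the reported p-value sits close to the resolution limit of the test and should not be read as more precise than that.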

Visual QA. The saved statistical figure is non-blank and readable: it shows the cell-level Pcp2/radial-score comparison by cell type and the permutation null distribution with the observed statistic, p-value, effect size, sample size, and test method annotated. No schematic image was generated for this run.