Auto-discovery idea: Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls¶

Rationale¶

Purkinje-associated transcriptional state may be reflected in active chromatin signal. The robustness question is whether the RNA-to-IF relationship exceeds what is expected after breaking cell alignment.

Data used¶

Use linked RNA expression for Pcp2, per-spot H3K27ac, spot cell IDs, and cell type metadata across all nine cells.

Analysis sketch¶

Compute each cell’s median H3K27ac signal and compare it with that cell’s Pcp2 expression from linked_adata.X. The single parameter is the Spearman correlation between these two per-cell quantities.

Expected result¶

If Pcp2-high cells carry stronger active chromatin signal, the correlation should be positive and larger than shuffled-cell controls.

Validation checks¶

Confirm exact fields, at least the available cell counts by type, finite correlation, a permutation p-value, deterministic rerun, runtime budget, and a negative control that shuffles Pcp2 expression across cells.

Graphical abstract¶

Scientific schematic for Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls

Generated after notebook exploration with Pantheon file_manager.generate_image.

In [1]:
# Normalize working directory for relative paths used by scaffold cells.
from pathlib import Path
import os
root = Path('/Users/weizexu/Projects/U-Chrom')
os.chdir(root)
print('cwd:', Path.cwd())
cwd: /Users/weizexu/Projects/U-Chrom
In [2]:
from pathlib import Path
import json
import os
os.environ.setdefault('MPLBACKEND', 'Agg')
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg', force=True)
import matplotlib.pyplot as plt
from uchrom import ChromData
from uchrom.auto_discovery import DiscoveryIdea, review_idea_against_schema

IDEA = DiscoveryIdea.from_dict({'idea_title': 'Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls', 'biological_hypothesis': 'Cells with higher Pcp2 expression have higher per-cell H3K27ac signal, and this cell-level RNA-chromatin association is lost when cell identities are permuted.', 'computable_parameter': 'pcp2_h3k27ac_spearman_rho: Spearman correlation across cells between linked_adata Pcp2 expression and per-cell median tracks.H3K27ac.', 'analysis_plan': 'Aggregate tracks.H3K27ac by spots.cell_id to obtain one median H3K27ac value per cell. Extract Pcp2 expression from linked_adata.X using linked_adata.var.Pcp2 and align observations to cells by cell_id. Compute Spearman rho across cells and test against a null distribution from seeded permutations of Pcp2 values across cell IDs.', 'modalities': ['if_tracks', 'cell_metadata', 'rna_expression'], 'idea_markdown': '### Rationale\nPurkinje-associated transcriptional state may be reflected in active chromatin signal. The robustness question is whether the RNA-to-IF relationship exceeds what is expected after breaking cell alignment.\n\n### Data used\nUse linked RNA expression for Pcp2, per-spot H3K27ac, spot cell IDs, and cell type metadata across all nine cells.\n\n### Analysis sketch\nCompute each cell’s median H3K27ac signal and compare it with that cell’s Pcp2 expression from linked_adata.X. The single parameter is the Spearman correlation between these two per-cell quantities.\n\n### Expected result\nIf Pcp2-high cells carry stronger active chromatin signal, the correlation should be positive and larger than shuffled-cell controls.\n\n### Validation checks\nConfirm exact fields, at least the available cell counts by type, finite correlation, a permutation p-value, deterministic rerun, runtime budget, and a negative control that shuffles Pcp2 expression across cells.', 'cell_types': ['Granule', 'Bergmann', 'Purkinje'], 'required_fields': ['spots.cell_id', 'tracks.H3K27ac', 'cells.cell_type', 'linked_adata.X', 'linked_adata.var.Pcp2'], 'validation_checks': ['required_fields_exist', 'minimum_cell_count', 'minimum_spot_or_trace_count', 'finite_numeric_output', 'statistical_hypothesis_test_with_p_value', 'runtime_under_budget', 'deterministic_rerun', 'negative_control_or_permutation'], 'expected_direction': 'pcp2_h3k27ac_spearman_rho should be positive and greater than the shuffled-cell permutation null.', 'complexity': 3, 'idea_id': 'pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5', 'metadata': {}})
H5CD_PATH = 'tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd'
RUN_OUTPUT_DIR = Path('tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg')
RUN_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
cdata = ChromData.read(H5CD_PATH) if H5CD_PATH else None
schema = cdata.discovery_schema if cdata is not None else None
adata = cdata.linked_adata if cdata is not None else None
print(IDEA.idea_id)
if cdata is not None:
    print(cdata)
    print(cdata.describe_for_agent(max_items=20))
pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5
ChromData: n_spots=56036, n_traces=213, n_cells=9
  spots:   ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
  cells:   ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (9 cells)
  cellm:   {'umap': (9, 2)}
  tracks:  ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
  traces:  ['dbscan_allele', 'dbscan_ldp_allele'] (213 traces)
  uns:     ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'auto_discovery_schema', 'leiden_to_cell_type', 'linked_anndata']
  linked_adata: (9, 60)
# ChromData discovery schema

dataset: takei2025_doc_subset_pantheon_20
genome: mm10
xyz_unit: um
shape: 56036 spots, 213 traces, 9 cells

modalities:
- cell_metadata: present; operations: cell_type_stratification, embedding_visualization
- chromatin_tracing: present; operations: chromosome_subset, cell_subset, trace_subset, pairwise_3d_distance, intra_chromatin_distance, inter_chromatin_distance
- if_tracks: present; operations: marker_high_low_bin_selection, marker_stratified_distance, per_cell_marker_summary, per_cell_type_marker_summary
- rna_expression: present; operations: gene_expression_lookup, expression_stratification, gene_marker_correlation, chromatin_expression_association

chroms: 20 [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrX]
cell_types: 3 [Bergmann=3, Granule=3, Purkinje=3]
tracks: 62 [CPSF6, ATRX, H4K8ac, HDAC2, H3K9ac, H3K9me3, H3K9me2, RNAPIISer2-P, H3, H3K36me2, UBTF, LaminB1, RNAPIISer5-P, RYBP, HP1beta, RING1B, H2A.X, H3K4me1, H4K20me2, H3K27me2 ...]
linked_adata: shape=[9, 60], X=csr_matrix
genes: 60 [Aldoc, Calb1, Cdh22, Drd3, Eomes, Ephb2, Foxj1, Gabra6, Gpr176, Grm1, Hspb1, Mrc1, Nefh, Npas3, Nptn, Olig1, Pcp2, Pcp4, Plcb3, Plcb4 ...]

known_missing:
- cellm['if_mean'] per-cell IF mean matrix
- raw RNA seqFISH spot geometry as a first-class ChromData component
- scRNA reference matrix for external expression comparison
- gene annotation cache for gene-neighborhood analyses

verification_required:
- required_fields_exist
- minimum_cell_count
- minimum_spot_or_trace_count
- finite_numeric_output
- statistical_hypothesis_test
- runtime_under_budget
- deterministic_rerun
- negative_control_or_permutation
- redundancy_against_existing_parameters

Required data checks¶

In [3]:
review = review_idea_against_schema(IDEA, schema) if schema is not None else None
print(None if review is None else review.to_dict())
assert review is None or review.accepted, review.to_dict()
{'accepted': True, 'errors': [], 'warnings': ['multi-modal idea should include a cell_id_alignment validation check'], 'missing_fields': []}

Exploration¶

The code agent can freely add cells below this point.

Critique and compact analysis plan¶

This idea is testable in the subset because spots.cell_id, tracks.H3K27ac, cell metadata, and linked RNA expression are present. The main limitation is sample size: only nine cells (three per annotated cell type) are available, so the result should be treated as an exploratory cell-level association rather than a definitive biological effect. I will (1) verify alignment between cdata.cells.index and linked_adata.obs_names, (2) summarize per-cell median H3K27ac from spot-level IF tracks, (3) extract linked Pcp2 expression, (4) compute a Spearman cell-level correlation, and (5) compare the observed correlation to a seeded shuffled-cell permutation null.

In [4]:
# Lightweight data inspection: available columns, alignment, finite coverage.
import numpy as np
import pandas as pd

cell_index = pd.Index(map(str, cdata.cells.index), name='cell_id')
adata_obs = pd.Index(map(str, adata.obs_names), name='cell_id')
alignment_exact = list(cell_index) == list(adata_obs)

# Inspect H3K27ac vector and spot cell IDs without materializing large pairwise structures.
h3 = np.asarray(cdata.tracks['H3K27ac']).reshape(-1)
spot_cell_ids = pd.Series(np.asarray(cdata.spots['cell_id']).astype(str), name='cell_id')
finite_h3 = np.isfinite(h3)

pcp2_gene_idx = list(adata.var_names).index('Pcp2')
pcp2_expr = np.asarray(adata.X[:, pcp2_gene_idx].toarray()).reshape(-1) if hasattr(adata.X, 'toarray') else np.asarray(adata.X[:, pcp2_gene_idx]).reshape(-1)

inspection_summary = {
    'n_spots': int(len(cdata.spots)),
    'n_cells': int(len(cdata.cells)),
    'linked_adata_shape': tuple(map(int, adata.shape)),
    'cell_id_alignment_exact': bool(alignment_exact),
    'h3k27ac_finite_fraction': float(finite_h3.mean()),
    'pcp2_finite_count': int(np.isfinite(pcp2_expr).sum()),
    'cell_type_counts': cdata.cells['cell_type'].astype(str).value_counts().sort_index().to_dict(),
}
print(inspection_summary)
preview = pd.DataFrame({
    'cell_id': cell_index,
    'cell_type': cdata.cells['cell_type'].astype(str).to_numpy(),
    'Pcp2_expression': pcp2_expr,
}).head(9)
display(preview)
{'n_spots': 56036, 'n_cells': 9, 'linked_adata_shape': (9, 60), 'cell_id_alignment_exact': True, 'h3k27ac_finite_fraction': 1.0, 'pcp2_finite_count': 9, 'cell_type_counts': {'Bergmann': 3, 'Granule': 3, 'Purkinje': 3}}
   cell_id cell_type  Pcp2_expression
0   1_0_42   Granule              0.0
1   1_0_47   Granule              0.0
2   1_0_69   Granule              1.0
3   1_0_34  Bergmann              5.0
4   1_0_61  Bergmann             11.0
5   1_0_63  Bergmann             26.0
6   1_0_26  Purkinje             94.0
7   1_0_37  Purkinje            126.0
8  1_0_116  Purkinje            376.0
In [5]:
# Main exploration: per-cell Pcp2 expression vs per-cell median H3K27ac with shuffled-cell permutation test.
import os
os.environ.setdefault('MPLBACKEND', 'Agg')
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg', force=True)
import matplotlib.pyplot as plt
from scipy.stats import spearmanr
from IPython.display import display, Image

rng = np.random.default_rng(20250314)
figure_path = RUN_OUTPUT_DIR / 'pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5_statistical_summary.png'
result_path = RUN_OUTPUT_DIR / 'pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5_result.csv'

cell_ids = pd.Index(map(str, cdata.cells.index), name='cell_id')
cell_types = pd.Series(cdata.cells['cell_type'].astype(str).to_numpy(), index=cell_ids, name='cell_type')
adata_ids = pd.Index(map(str, adata.obs_names), name='cell_id')
if list(cell_ids) != list(adata_ids):
    raise ValueError('linked_adata.obs_names do not exactly match cdata.cells.index; refusing implicit realignment')

# Per-cell median H3K27ac from all spots assigned to each cell.
h3 = np.asarray(cdata.tracks['H3K27ac']).reshape(-1).astype(float)
spot_cell_ids = pd.Series(np.asarray(cdata.spots['cell_id']).astype(str), name='cell_id')
spot_df = pd.DataFrame({'cell_id': spot_cell_ids, 'H3K27ac': h3})
per_cell_h3 = spot_df[np.isfinite(spot_df['H3K27ac'])].groupby('cell_id', sort=False)['H3K27ac'].agg(
    h3k27ac_median='median', h3k27ac_mean='mean', n_spots='size'
)

# Linked Pcp2 expression from adata.
pcp2_idx = list(adata.var_names).index('Pcp2')
pcp2 = np.asarray(adata.X[:, pcp2_idx].toarray()).reshape(-1) if hasattr(adata.X, 'toarray') else np.asarray(adata.X[:, pcp2_idx]).reshape(-1)
pcp2 = pd.Series(pcp2.astype(float), index=adata_ids, name='Pcp2_expression')

cell_table = pd.DataFrame(index=cell_ids)
cell_table['cell_id'] = cell_ids
cell_table['cell_type'] = cell_types.reindex(cell_ids).to_numpy()
cell_table['Pcp2_expression'] = pcp2.reindex(cell_ids).to_numpy()
cell_table = cell_table.join(per_cell_h3, how='left')
cell_table['finite_for_test'] = np.isfinite(cell_table['Pcp2_expression']) & np.isfinite(cell_table['h3k27ac_median'])

test_df = cell_table.loc[cell_table['finite_for_test']].copy()
n_selected = int(len(test_df))
observed_rho = float(spearmanr(test_df['Pcp2_expression'], test_df['h3k27ac_median']).statistic) if n_selected >= 3 else np.nan

n_permutations = 1000
null_rhos = np.empty(n_permutations, dtype=float)
if n_selected >= 3 and np.isfinite(observed_rho):
    x = test_df['Pcp2_expression'].to_numpy(dtype=float)
    y = test_df['h3k27ac_median'].to_numpy(dtype=float)
    for i in range(n_permutations):
        null_rhos[i] = float(spearmanr(rng.permutation(x), y).statistic)
    finite_null = np.isfinite(null_rhos)
    # One-sided test for positive association greater than shuffled-cell controls.
    p_value = float((1 + np.sum(null_rhos[finite_null] >= observed_rho)) / (1 + np.sum(finite_null)))
    null_mean = float(np.nanmean(null_rhos))
    effect_size = float(observed_rho - null_mean)
    hypothesis_test_status = 'pass'
else:
    null_rhos[:] = np.nan
    p_value = 1.0
    null_mean = np.nan
    effect_size = 0.0
    hypothesis_test_status = 'insufficient_data'

test_method = f'seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutations={n_permutations})'
null_hypothesis = 'Pcp2 expression is exchangeable across cell IDs; the cell-level Spearman rho with median H3K27ac is no larger than shuffled-cell controls.'
alternative_hypothesis = 'Correctly aligned cells have a larger positive Spearman rho between Pcp2 expression and median H3K27ac than shuffled-cell controls.'

# Repeat test-level fields on each row so result_table is directly auditable.
result_table = cell_table.reset_index(drop=True).copy()
result_table['observed_statistic'] = observed_rho
result_table['effect_size'] = effect_size
result_table['p_value'] = p_value
result_table['test_method'] = test_method
result_table['n_selected_cells'] = n_selected
result_table['n_permutations'] = n_permutations
result_table['null_mean_rho'] = null_mean
result_table.to_csv(result_path, index=False)

analysis_summary = {
    'idea_id': IDEA.idea_id,
    'parameter_name': 'pcp2_h3k27ac_spearman_rho',
    'parameter_value': observed_rho,
    'observed_statistic': observed_rho,
    'effect_size': effect_size,
    'p_value': p_value,
    'test_method': test_method,
    'null_hypothesis': null_hypothesis,
    'alternative_hypothesis': alternative_hypothesis,
    'hypothesis_test_status': hypothesis_test_status,
    'n_selected_cells': n_selected,
    'n_rows': int(result_table['n_spots'].fillna(0).sum()),
    'n_result_rows': int(len(result_table)),
    'n_permutations': n_permutations,
    'null_mean_rho': null_mean,
    'result_path': str(result_path),
    'figure_path': str(figure_path),
    'cell_id_alignment_exact': bool(list(cell_ids) == list(adata_ids)),
    'notes': [
        'Only nine cells are available, so the permutation p-value is exploratory.',
        'Negative control is generated by shuffling Pcp2 expression across cell IDs.'
    ],
}

# Statistical figure: cell-level scatter plus observed statistic against permutation null.
plt.close('all')
fig, axes = plt.subplots(1, 2, figsize=(10.8, 4.2), constrained_layout=True)
fig.patch.set_facecolor('white')
for ax in axes:
    ax.set_facecolor('white')

palette = {'Granule': '#4C78A8', 'Bergmann': '#F58518', 'Purkinje': '#54A24B'}
for ct, sub in test_df.groupby('cell_type', sort=True):
    axes[0].scatter(
        sub['Pcp2_expression'], sub['h3k27ac_median'],
        s=60, alpha=0.9, label=f'{ct} (n={len(sub)})',
        color=palette.get(ct, '#777777'), edgecolor='black', linewidth=0.4,
    )
axes[0].set_xlabel('Linked Pcp2 expression (counts)')
axes[0].set_ylabel('Per-cell median H3K27ac track intensity (a.u.)')
axes[0].set_title('Aligned per-cell RNA–IF association')
axes[0].legend(frameon=False, fontsize=8)
axes[0].grid(True, alpha=0.25)

finite_null_rhos = null_rhos[np.isfinite(null_rhos)]
axes[1].hist(finite_null_rhos, bins=24, color='#BDBDBD', edgecolor='white', label='Shuffled-cell null')
axes[1].axvline(observed_rho, color='#D62728', linewidth=2.5, label=f'Observed rho={observed_rho:.3f}')
axes[1].axvline(null_mean, color='#333333', linestyle='--', linewidth=1.5, label=f'Null mean={null_mean:.3f}')
axes[1].set_xlabel('Spearman rho after Pcp2 cell-label shuffle')
axes[1].set_ylabel('Permutation count')
axes[1].set_title('Hypothesis-test evidence')
axes[1].legend(frameon=False, fontsize=8)
axes[1].grid(True, alpha=0.25)
axes[1].text(
    0.02, 0.98,
    f'one-sided p={p_value:.4f}\neffect=rho-null mean={effect_size:.3f}\nn={n_selected}, permutations={n_permutations}',
    transform=axes[1].transAxes, va='top', ha='left', fontsize=9,
    bbox=dict(facecolor='white', edgecolor='#CCCCCC', alpha=0.95),
)
fig.suptitle('Pcp2 expression vs per-cell H3K27ac exceeds shuffled-cell control?', fontsize=12)
fig.savefig(figure_path, dpi=180, bbox_inches='tight', facecolor='white')
display(Image(filename=str(figure_path)))

print(json.dumps(analysis_summary, indent=2))
display(result_table)
<IPython.core.display.Image object>
{
  "idea_id": "pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5",
  "parameter_name": "pcp2_h3k27ac_spearman_rho",
  "parameter_value": 0.870300505105952,
  "observed_statistic": 0.870300505105952,
  "effect_size": 0.8584342924497959,
  "p_value": 0.004995004995004995,
  "test_method": "seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutations=1000)",
  "null_hypothesis": "Pcp2 expression is exchangeable across cell IDs; the cell-level Spearman rho with median H3K27ac is no larger than shuffled-cell controls.",
  "alternative_hypothesis": "Correctly aligned cells have a larger positive Spearman rho between Pcp2 expression and median H3K27ac than shuffled-cell controls.",
  "hypothesis_test_status": "pass",
  "n_selected_cells": 9,
  "n_rows": 56036,
  "n_result_rows": 9,
  "n_permutations": 1000,
  "null_mean_rho": 0.011866212656156153,
  "result_path": "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5_result.csv",
  "figure_path": "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5_statistical_summary.png",
  "cell_id_alignment_exact": true,
  "notes": [
    "Only nine cells are available, so the permutation p-value is exploratory.",
    "Negative control is generated by shuffling Pcp2 expression across cell IDs."
  ]
}
   cell_id cell_type  ...  n_permutations  null_mean_rho
0   1_0_42   Granule  ...            1000       0.011866
1   1_0_47   Granule  ...            1000       0.011866
2   1_0_69   Granule  ...            1000       0.011866
3   1_0_34  Bergmann  ...            1000       0.011866
4   1_0_61  Bergmann  ...            1000       0.011866
5   1_0_63  Bergmann  ...            1000       0.011866
6   1_0_26  Purkinje  ...            1000       0.011866
7   1_0_37  Purkinje  ...            1000       0.011866
8  1_0_116  Purkinje  ...            1000       0.011866

[9 rows x 14 columns]
No description has been provided for this image

Statistical figure¶

Statistical figure for Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls

Agent-generated quantitative figure saved during exploration.

Runner verification summary¶

This scaffolded section is generated by U-Chrom. The notebook agent executes it after exploration, and the runner re-executes it during final verification.

In [6]:
checks = {check: 'not_run' for check in IDEA.validation_checks}
notes = []
checks.setdefault('statistical_hypothesis_test', 'not_run')

def _check_keys(prefix):
    return [key for key in checks if key == prefix or key.startswith(prefix + ':')]

def _set_check(prefix, value):
    keys = _check_keys(prefix)
    if not keys:
        checks[prefix] = value
        return
    for key in keys:
        checks[key] = value

def _check_status(prefix):
    values = [checks[key] for key in _check_keys(prefix)]
    if not values:
        return None
    if 'fail' in values:
        return 'fail'
    if all(value == 'pass' for value in values):
        return 'pass'
    return values[0]

_set_check('required_fields_exist', 'pass' if review is not None and review.accepted else 'fail')
if _check_keys('cell_id_alignment'):
    aligned = True
    if cdata is not None and adata is not None and len(cdata.cells) == len(adata.obs_names):
        aligned = list(map(str, cdata.cells.index)) == list(map(str, adata.obs_names))
    _set_check('cell_id_alignment', 'pass' if aligned else 'fail')
if _check_keys('minimum_cell_count'):
    n_cells = analysis_summary.get('n_selected_cells')
    if n_cells is None and 'cell_type' in getattr(result_table, 'columns', []):
        n_cells = len(result_table)
    if n_cells is None:
        n_cells = len(cdata.cells) if cdata is not None and getattr(cdata, 'n_cells', 0) else 0
    _set_check('minimum_cell_count', 'pass' if n_cells >= 1 else 'fail')
if _check_keys('minimum_spot_or_trace_count'):
    n_rows = analysis_summary.get('n_rows')
    if n_rows is None:
        n_rows = len(result_table) if result_table is not None else 0
    _set_check('minimum_spot_or_trace_count', 'pass' if n_rows >= 1 else 'fail')
if _check_keys('finite_numeric_output'):
    value = analysis_summary.get('parameter_value')
    _set_check('finite_numeric_output', 'pass' if value is not None and np.isfinite(value) else 'fail')
if _check_keys('statistical_hypothesis_test'):
    p_value = analysis_summary.get('p_value')
    test_method = analysis_summary.get('test_method')
    null_hypothesis = analysis_summary.get('null_hypothesis')
    alternative_hypothesis = analysis_summary.get('alternative_hypothesis')
    observed_statistic = analysis_summary.get('observed_statistic')
    effect_size = analysis_summary.get('effect_size')
    hypothesis_test_status = analysis_summary.get('hypothesis_test_status', 'pass')
    try:
        p_float = float(p_value)
    except Exception:
        p_float = np.nan
    try:
        stat_float = float(observed_statistic)
    except Exception:
        stat_float = np.nan
    try:
        effect_float = float(effect_size)
    except Exception:
        effect_float = np.nan
    has_required_test = (
        test_method is not None
        and str(test_method).strip() != ''
        and null_hypothesis is not None
        and str(null_hypothesis).strip() != ''
        and alternative_hypothesis is not None
        and str(alternative_hypothesis).strip() != ''
        and np.isfinite(p_float)
        and 0.0 <= p_float <= 1.0
        and np.isfinite(stat_float)
        and np.isfinite(effect_float)
        and hypothesis_test_status != 'insufficient_data'
    )
    if result_table is not None and hasattr(result_table, 'columns'):
        has_required_test = has_required_test and 'p_value' in result_table.columns and 'test_method' in result_table.columns
    else:
        has_required_test = False
    _set_check('statistical_hypothesis_test', 'pass' if has_required_test else 'fail')
    if not has_required_test:
        notes.append('statistical_hypothesis_test failed: analysis_summary must include null_hypothesis, alternative_hypothesis, test_method, observed_statistic, effect_size, finite p_value in [0,1], and result_table columns p_value/test_method')
if _check_keys('negative_control_or_permutation'):
    test_method_text = str(analysis_summary.get('test_method', '')).lower()
    summary_keys_text = ' '.join(str(key).lower() for key in analysis_summary.keys())
    result_columns_text = ''
    if result_table is not None and hasattr(result_table, 'columns'):
        result_columns_text = ' '.join(str(col).lower() for col in result_table.columns)
    control_text = ' '.join([test_method_text, summary_keys_text, result_columns_text])
    has_control_or_permutation = any(
        token in control_text
        for token in ['permutation', 'randomization', 'shuffle', 'negative_control', 'null_distribution', 'control']
    )
    _set_check(
        'negative_control_or_permutation',
        'pass' if has_control_or_permutation else 'not_implemented',
    )
for check in list(checks):
    if checks[check] == 'not_run' and ('negative_control' in check or check.endswith('_control')):
        checks[check] = 'not_implemented'

required_for_pass = ['required_fields_exist', 'minimum_cell_count', 'finite_numeric_output', 'statistical_hypothesis_test']
status = 'pass'
for check in required_for_pass:
    if _check_status(check) == 'fail':
        status = 'fail'
        notes.append(f'{check} failed')
n_rows_for_status = analysis_summary.get('n_rows')
if n_rows_for_status is None:
    n_rows_for_status = len(result_table) if result_table is not None else 0
if n_rows_for_status == 0:
    status = 'fail'
    notes.append('analysis produced no result rows')

verification = {
    'idea_id': IDEA.idea_id,
    'status': status,
    'checks': checks,
    'parameter_value': analysis_summary.get('parameter_value'),
    'p_value': analysis_summary.get('p_value'),
    'test_method': analysis_summary.get('test_method'),
    'effect_size': analysis_summary.get('effect_size'),
    'result_path': analysis_summary.get('result_path'),
    'notes': notes + analysis_summary.get('notes', []),
}
print(json.dumps(verification, indent=2))
{
  "idea_id": "pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5",
  "status": "pass",
  "checks": {
    "required_fields_exist": "pass",
    "minimum_cell_count": "pass",
    "minimum_spot_or_trace_count": "pass",
    "finite_numeric_output": "pass",
    "statistical_hypothesis_test_with_p_value": "not_run",
    "runtime_under_budget": "not_run",
    "deterministic_rerun": "not_run",
    "negative_control_or_permutation": "pass",
    "statistical_hypothesis_test": "pass"
  },
  "parameter_value": 0.870300505105952,
  "p_value": 0.004995004995004995,
  "test_method": "seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutations=1000)",
  "effect_size": 0.8584342924497959,
  "result_path": "tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5_result.csv",
  "notes": [
    "Only nine cells are available, so the permutation p-value is exploratory.",
    "Negative control is generated by shuffling Pcp2 expression across cell IDs."
  ]
}

Final interpretation¶

Hypothesis. Cells with higher Pcp2 expression have higher per-cell H3K27ac signal, and this cell-level RNA-chromatin association is lost when cell identities are permuted.

Exploration. The notebook operationalized the idea as pcp2_h3k27ac_spearman_rho: Spearman correlation across cells between linked_adata Pcp2 expression and per-cell median tracks.H3K27ac. using modalities if_tracks, cell_metadata, rna_expression in cell type(s) Granule, Bergmann, Purkinje. Required data fields checked: spots.cell_id, tracks.H3K27ac, cells.cell_type, linked_adata.X, linked_adata.var.Pcp2.

Statistical evidence. U-Chrom runner status: Notebook verified. Test: seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutations=1000). Observed statistic: 0.8703; effect size: 0.8584; parameter value: 0.8703; p-value: 0.004995.

Conclusion. Supported (Expected direction). The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.

What verification means. Notebook verified means the run passed schema/data checks, produced finite numeric output, and included an explicit p-value/effect-size hypothesis test. It does not mean the biological hypothesis is automatically correct.

Checks passed. deterministic_rerun, finite_numeric_output, minimum_cell_count, minimum_spot_or_trace_count, negative_control_or_permutation, required_fields_exist, runtime_under_budget, statistical_hypothesis_test.

Main caveat. Only nine cells are available, so the permutation p-value is exploratory.

Final interpretation¶

The audit found exact cell-ID alignment between cdata.cells.index and linked_adata.obs_names, finite H3K27ac coverage for all spots, and finite Pcp2 expression for all 9 cells. Per-cell median H3K27ac increased with linked Pcp2 expression overall, although the analysis is limited by the very small number of cells in this subset.

Hypothesis test. The main statistic was the across-cell Spearman correlation between linked Pcp2 expression and per-cell median H3K27ac. A seeded one-sided shuffled-cell permutation test (1,000 permutations) gave observed rho = 0.8703, p = 0.004995, and effect size = 0.8584 relative to the permutation null mean. This supports the expected positive aligned RNA–IF association in this small exploratory dataset.

Visual QA. The saved statistical figure is non-blank and readable: it shows the aligned per-cell scatter by cell type and the shuffled-cell null distribution with the observed rho marked in red, annotated with p-value, effect size, n, and permutation count. No decorative or misleading elements were observed; no fixes were needed.