Takei 2025 auto-discovery example

This notebook documents a real auto-discovery pass, driven by PantheonOS agents, on the linked Takei 2025 cerebellum .h5cd / .h5ad data. It exposes the dataset schema to Pantheon idea agents, then inspects a completed 20-idea notebook-agent batch run on a small Takei-derived execution subset. The batch covers live notebook construction, agent-authored exploratory analysis, explicit hypothesis testing with p-values and effect sizes, saved statistical matplotlib figures, cell execution, and U-Chrom re-execution and verification.

The rendered outputs below use OpenAI as the model provider for the PantheonOS agents in this environment. The U-Chrom runner is provider-agnostic at the Pantheon layer; change MODEL to use another Pantheon-supported model backend.

from pathlib import Path
from copy import deepcopy
import json
import os
os.environ.setdefault('MPLBACKEND', 'Agg')
import shutil
from collections import Counter

import numpy as np
import pandas as pd


try:
    from loguru import logger as _pantheon_logger
    _pantheon_logger.remove()
    _pantheon_logger.disable('pantheon')
except Exception:
    pass

from uchrom.io import load_takei2025_cerebellum
from uchrom.auto_discovery import (
    DiscoveryRunConfig,
    generate_pantheon_ideas,
    review_idea_against_schema,
    run_auto_discovery,
    schema_to_agent_context,
)

MODEL = os.environ.get('UCHROM_PANTHEON_MODEL', 'openai/gpt-5.5')
LLM_TIMEOUT = 900
IDEA_AGENT_COUNT = 2
NOTEBOOK_AGENT_CONCURRENCY = 2
GENERATE_SCHEMATIC_IMAGE = False
SCHEMATIC_IMAGE_MODEL = os.environ.get('UCHROM_SCHEMATIC_IMAGE_MODEL', 'openai')
SCHEMATIC_IMAGE_MODEL_ARGS = {'size': '1536x1024', 'quality': 'high', 'output_format': 'png'}


def repo_root() -> Path:
    for p in [Path.cwd(), *Path.cwd().parents]:
        if (p / 'pyproject.toml').exists() and (p / 'uchrom').exists():
            return p
    return Path.cwd()


def dotenv_has_openai_key(path: Path) -> bool:
    if not path.exists():
        return False
    for line in path.read_text().splitlines():
        line = line.strip()
        if line.startswith('OPENAI_API_KEY=') and line.split('=', 1)[1].strip().strip('"').strip("'"):
            return True
    return False

ROOT = repo_root()
TAKEI_DIR = ROOT / 'example-data' / 'takei2025_cerebellum'
OUT = ROOT / 'tmp' / 'takei_auto_discovery_doc'
OUT.mkdir(parents=True, exist_ok=True)
print(f'root: {ROOT}')
print(f'output: {OUT}')
print(f'PantheonOS model: {MODEL}')
print(f'idea agents: {IDEA_AGENT_COUNT}, notebook agent concurrency: {NOTEBOOK_AGENT_CONCURRENCY}')
print(f'schematic image generation in main exploration run: {GENERATE_SCHEMATIC_IMAGE} (post-hoc only), model={SCHEMATIC_IMAGE_MODEL}')
print(f'OpenAI key available for this rendered example: {bool(os.environ.get("OPENAI_API_KEY")) or dotenv_has_openai_key(Path.home() / ".env")}')

root: /Users/weizexu/Projects/U-Chrom
output: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc
PantheonOS model: openai/gpt-5.5
idea agents: 2, notebook agent concurrency: 2
schematic image generation in main exploration run: False (post-hoc only), model=openai
OpenAI key available for this rendered example: True

1. Load linked Takei data

load_takei2025_cerebellum() returns a ChromData object. The RNA expression matrix is available as cdata.linked_adata and is aligned on cell IDs.

cdata = load_takei2025_cerebellum(
    replicate=1,
    data_dir=TAKEI_DIR,
    download=True,
)
adata = cdata.linked_adata
print(cdata)
print(f'linked_adata shape: {None if adata is None else adata.shape}')

ChromData: n_spots=10912638, n_traces=59112, n_cells=1799
  spots:   ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
  cells:   ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (1799 cells)
  cellm:   {'umap': (1799, 2)}
  tracks:  ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
  traces:  ['dbscan_allele', 'dbscan_ldp_allele'] (59112 traces)
  uns:     ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'leiden_to_cell_type', 'linked_anndata']
  linked_adata: (1799, 60)
linked_adata shape: (1799, 60)
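
As a minimal sketch of what "aligned on cell IDs" could be checked to mean, the snippet below compares two ordered ID sequences. Plain Python lists stand in for cdata.cells.index and cdata.linked_adata.obs_names, and the helper name check_cell_alignment is hypothetical; it is not part of uchrom.

```python
def check_cell_alignment(cell_ids, obs_names):
    """Compare two ordered ID sequences and report how they differ.

    Hypothetical helper: `cell_ids` stands in for cdata.cells.index and
    `obs_names` for cdata.linked_adata.obs_names.
    """
    cell_ids = [str(x) for x in cell_ids]
    obs_names = [str(x) for x in obs_names]
    return {
        'same_order': cell_ids == obs_names,
        'same_set': set(cell_ids) == set(obs_names),
        'missing_in_adata': sorted(set(cell_ids) - set(obs_names)),
        'missing_in_cells': sorted(set(obs_names) - set(cell_ids)),
    }


# Toy IDs: same cells, different order.
report = check_cell_alignment(['c1', 'c2', 'c3'], ['c1', 'c3', 'c2'])
print(report)
```

Separating "same set" from "same order" matters because row-wise joins between per-cell chromatin summaries and the expression matrix silently mix cells if only the set matches.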

2. Build the h5cd-backed discovery schema

The schema is agent-readable: it summarizes axes, modalities, available fields, tracks, cell types, genes, and known missing data. Here we build it in memory for display; production runs can store it in cdata.uns['auto_discovery_schema'] and write it back to .h5cd.

schema = cdata.build_discovery_schema(
    store=False,
    dataset_name='takei2025_cerebellum_rep1',
    max_catalog_items=80,
)
print(schema_to_agent_context(schema, max_items=16))

# ChromData discovery schema

dataset: takei2025_cerebellum_rep1
genome: mm10
xyz_unit: um
shape: 10912638 spots, 59112 traces, 1799 cells

modalities:
- chromatin_tracing: present; operations: chromosome_subset, cell_subset, trace_subset, pairwise_3d_distance, intra_chromatin_distance, inter_chromatin_distance
- if_tracks: present; operations: marker_high_low_bin_selection, marker_stratified_distance, per_cell_marker_summary, per_cell_type_marker_summary
- cell_metadata: present; operations: cell_type_stratification, embedding_visualization
- rna_expression: present; operations: gene_expression_lookup, expression_stratification, gene_marker_correlation, chromatin_expression_association

chroms: 20 [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr3, chr4, chr5, chr6 ...]
cell_types: 6 [Granule=1109, Other=323, Bergmann=192, MLI1=90, Purkinje=58, MLI2+PLI=27]
tracks: 62 [CPSF6, ATRX, H4K8ac, HDAC2, H3K9ac, H3K9me3, H3K9me2, RNAPIISer2-P, H3, H3K36me2, UBTF, LaminB1, RNAPIISer5-P, RYBP, HP1beta, RING1B ...]
linked_adata: shape=[1799, 60], X=csr_matrix
genes: 60 [Aldoc, Calb1, Cdh22, Drd3, Eomes, Ephb2, Foxj1, Gabra6, Gpr176, Grm1, Hspb1, Mrc1, Nefh, Npas3, Nptn, Olig1 ...]

known_missing:
- cellm['if_mean'] per-cell IF mean matrix
- raw RNA seqFISH spot geometry as a first-class ChromData component
- scRNA reference matrix for external expression comparison
- gene annotation cache for gene-neighborhood analyses

verification_required:
- required_fields_exist
- minimum_cell_count
- minimum_spot_or_trace_count
- finite_numeric_output
- statistical_hypothesis_test
- runtime_under_budget
- deterministic_rerun
- negative_control_or_permutation
- redundancy_against_existing_parameters
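
The truncated catalog lines in the context above (a count followed by the first 16 names and a trailing "...") can be mimicked with a few lines of Python. This is only an illustrative sketch: format_catalog_line is a hypothetical helper, not the code behind schema_to_agent_context, and the 20-chromosome list (19 autosomes plus chrX) is an assumption, since the output shows only the first 16 names. String sorting happens to reproduce the chroms line; the tracks line is evidently not sorted, so sorting is optional here.

```python
def format_catalog_line(name, items, max_items=16, sort=True):
    """Render 'name: count [first items ...]' for an agent-readable summary."""
    items = [str(x) for x in items]
    if sort:
        items = sorted(items)  # lexicographic, so chr10 sorts before chr2
    shown = ', '.join(items[:max_items])
    suffix = ' ...' if len(items) > max_items else ''
    return f'{name}: {len(items)} [{shown}{suffix}]'


# Assumed chromosome set: 19 autosomes plus chrX (not confirmed by the source).
chroms = [f'chr{i}' for i in range(1, 20)] + ['chrX']
line = format_catalog_line('chroms', chroms)
print(line)
```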

3. Generate and review Pantheon idea-agent proposals on the full dataset schema

This cell writes the full Takei discovery schema to disk, launches multiple PantheonOS idea agents in parallel, and reviews their returned ideas against the h5cd-backed schema. These idea agents are restricted to file access; they read schema.json and schema_context.md instead of touching notebooks.

full_schema_idea_dir = OUT / 'pantheon_full_schema_ideas'
if full_schema_idea_dir.exists():
    shutil.rmtree(full_schema_idea_dir)

ideas, idea_agent_records = await generate_pantheon_ideas(
    schema,
    output_dir=full_schema_idea_dir,
    max_ideas=2,
    model=MODEL,
    timeout=LLM_TIMEOUT,
    idea_agent_count=IDEA_AGENT_COUNT,
)
rows = []
for idea in ideas:
    review = review_idea_against_schema(idea, schema, max_complexity=5)
    rows.append({
        'idea_id': idea.idea_id,
        'title': idea.idea_title,
        'cell_types': ', '.join(idea.cell_types) or 'all',
        'modalities': ' + '.join(idea.modalities),
        'accepted': review.accepted,
        'warnings': '; '.join(review.warnings),
    })
ideas_df = pd.DataFrame(rows)
print(f'PantheonOS model: {MODEL}')
print(f'idea agent records: {len(idea_agent_records)}')
print(ideas_df[['title', 'cell_types', 'modalities', 'accepted']].to_string(index=False, max_colwidth=64))
print('\nmodality combinations:')
print(Counter(ideas_df['modalities']).most_common())
print(f"RNA-linked ideas: {ideas_df['modalities'].str.contains('rna_expression').sum()}")

PantheonOS model: openai/gpt-5.5
idea agent records: 2
                                                           title cell_types                                    modalities  accepted
   Granule-cell H3K27ac radial centrality from chromatin tracing    Granule chromatin_tracing + if_tracks + cell_metadata      True
Purkinje Pcp2 expression predicts chromatin-associated elonga...   Purkinje    if_tracks + cell_metadata + rna_expression     False

modality combinations:
[('chromatin_tracing + if_tracks + cell_metadata', 1), ('if_tracks + cell_metadata + rna_expression', 1)]
RNA-linked ideas: 1
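
To make the idea of reviewing a proposal against a schema concrete, here is a toy sketch that reports which names an idea requires but a catalog does not list. The dict layout and the helper missing_fields are hypothetical, and this is not the logic of review_idea_against_schema, which is not shown in this notebook; the track, gene, and cell-type names are taken from the schema output above, and 'NotInPanel' is a made-up gene.

```python
def missing_fields(idea, catalog):
    """Return the names an idea requires that the catalog does not list.

    Both arguments are plain dicts with hypothetical keys.
    """
    missing = {}
    for key in ('tracks', 'genes', 'cell_types'):
        available = set(catalog.get(key, []))
        absent = [name for name in idea.get(key, []) if name not in available]
        if absent:
            missing[key] = absent
    return missing


catalog = {
    'tracks': ['H3K27ac', 'LaminB1', 'HP1alpha'],
    'genes': ['Aldoc', 'Calb1', 'Gabra6'],
    'cell_types': ['Granule', 'Bergmann', 'Purkinje'],
}
# 'NotInPanel' is a made-up gene name used to trigger a rejection.
idea = {'tracks': ['H3K27ac'], 'genes': ['NotInPanel'], 'cell_types': ['Granule']}
problems = missing_fields(idea, catalog)
print(problems)
```

An empty result would correspond to an idea whose required fields all exist; anything else gives a reviewer a concrete reason to reject or warn.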

4. Create a small Takei-derived h5cd for execution

The runner executes one notebook per accepted idea. For documentation we use a small subset of the real Takei data: the first three cells of each of three cell types (Granule, Bergmann, Purkinje), all their spots and traces, and the matching rows of linked_adata.

cell_types = ['Granule', 'Bergmann', 'Purkinje']
cells_per_type = 3
selected_cells = []
for ct in cell_types:
    ids = list(cdata.cells.index[cdata.cells['cell_type'].astype(str) == ct][:cells_per_type])
    selected_cells.extend(map(str, ids))

spot_mask = cdata.spots['cell_id'].astype(str).isin(selected_cells).to_numpy()
takei_small = cdata[spot_mask]
takei_small.uns = deepcopy(takei_small.uns)

cell_order = [str(x) for x in takei_small.cells.index]
adata_small = adata[cell_order].copy()
small_h5ad = OUT / 'takei_doc_auto_subset.h5ad'
small_h5cd = OUT / 'takei_doc_auto_subset.h5cd'
adata_small.write_h5ad(small_h5ad)
takei_small.linked_adata = adata_small
takei_small.uns['linked_anndata'] = {
    'path': str(small_h5ad),
    'n_obs': int(adata_small.n_obs),
    'n_vars': int(adata_small.n_vars),
    'cell_id_axis': 'obs_names',
}
takei_small.build_discovery_schema(store=True, dataset_name='takei2025_doc_subset')
takei_small.write(small_h5cd)

print(takei_small)
print(f'subset h5cd: {small_h5cd}')
print(f'subset linked_adata: {adata_small.shape}')
print(takei_small.cells['cell_type'].value_counts().to_string())

ChromData: n_spots=56036, n_traces=213, n_cells=9
  spots:   ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
  cells:   ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (9 cells)
  cellm:   {'umap': (9, 2)}
  tracks:  ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
  traces:  ['dbscan_allele', 'dbscan_ldp_allele'] (213 traces)
  uns:     ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'leiden_to_cell_type', 'linked_anndata', 'auto_discovery_schema']
  linked_adata: (9, 60)
subset h5cd: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd
subset linked_adata: (9, 60)
cell_type
Granule     3
Bergmann    3
Purkinje    3

5. Run notebook-first auto-discovery

The rendered documentation uses a completed 20-idea Pantheon notebook-agent batch under tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg. The expensive agent run is not repeated during normal docs builds; this cell reconstructs the run summary from ideas.jsonl, reviews.jsonl, results.jsonl, and the exported notebooks.

To reproduce the batch from the saved idea set, run the CLI with a non-interactive matplotlib backend:

MPLBACKEND=Agg python -m uchrom.auto_discovery run \
  tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd \
  tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg \
  --ideas-path tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_retry/ideas.jsonl \
  --max-ideas 20 \
  --max-complexity 5 \
  --code-source pantheon \
  --model openai/gpt-5.5 \
  --llm-timeout 900 \
  --notebook-agent-concurrency 4 \
  --dataset-name takei2025_doc_subset_pantheon_20 \
  --store-schema

Notebook agents receive file access plus live notebook tools. They read the scaffold, edit cells, insert Markdown notes, execute code cells, inspect outputs, and leave auditable notebooks behind. U-Chrom then re-executes each notebook from top to bottom and verifies that it produced a finite result, explicit hypothesis-test metadata, and a saved statistical figure.
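
The three verification conditions named above (finite result, hypothesis-test metadata, saved figure) can be sketched as a small checker. The record layout, the key names 'value' and 'figure', and the helper check_result are assumptions made for illustration; U-Chrom's actual verification cell is not reproduced in this notebook.

```python
import math
import tempfile
from pathlib import Path


def _is_finite_number(x):
    return isinstance(x, (int, float)) and math.isfinite(x)


def check_result(record, figure_dir):
    """Return the list of failed checks for one (hypothetical) result record."""
    failures = []
    for key in ('value', 'p_value', 'effect_size'):
        if not _is_finite_number(record.get(key)):
            failures.append(f'{key} is not a finite number')
    p_value = record.get('p_value')
    if _is_finite_number(p_value) and not 0.0 <= p_value <= 1.0:
        failures.append('p_value outside [0, 1]')
    if not str(record.get('test_method') or '').strip():
        failures.append('test_method missing')
    figure = record.get('figure')
    if not figure or not (Path(figure_dir) / figure).is_file():
        failures.append('figure not saved')
    return failures


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'fig.png').write_bytes(b'placeholder')  # stand-in for a saved figure
    good = {'value': 0.9, 'p_value': 0.002, 'effect_size': 0.09,
            'test_method': 'permutation test', 'figure': 'fig.png'}
    bad = {'value': float('nan'), 'p_value': 0.5, 'effect_size': 0.1,
           'test_method': '', 'figure': 'missing.png'}
    good_failures = check_result(good, tmp)
    bad_failures = check_result(bad, tmp)

print(good_failures)
print(bad_failures)
```

Returning a list of named failures, rather than a single boolean, keeps the audit trail readable when a notebook fails more than one check.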

from types import SimpleNamespace

run_dir = OUT / 'run_pantheon_20_ideas_verified_agg'
if not run_dir.exists():
    raise FileNotFoundError(
        f'Missing completed run directory: {run_dir}. Reproduce it with the CLI command shown above.'
    )


def count_jsonl(path):
    with open(path) as fh:
        return sum(1 for line in fh if line.strip())

results_for_summary = []
with open(run_dir / 'results.jsonl') as fh:
    for line in fh:
        if line.strip():
            results_for_summary.append(json.loads(line))

notebooks = sorted(str(p) for p in (run_dir / 'notebooks').glob('*.ipynb'))
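# Count accepted reviews inside a context manager so the file handle is closed.
with open(run_dir / 'reviews.jsonl') as fh:
    n_accepted = sum(1 for line in fh if line.strip() and json.loads(line).get('accepted'))
run_result = SimpleNamespace(
    output_dir=str(run_dir),
    n_generated=count_jsonl(run_dir / 'ideas.jsonl'),
    n_accepted=n_accepted,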
    n_executed=len(results_for_summary),
    n_verified=sum((item.get('verification') or {}).get('status') == 'pass' for item in results_for_summary),
    report_path=str(run_dir / 'report.md'),
    ideas_path=str(run_dir / 'ideas.jsonl'),
    reviews_path=str(run_dir / 'reviews.jsonl'),
    results_path=str(run_dir / 'results.jsonl'),
    notebooks=notebooks,
    agent_records_path=str(run_dir / 'agent_records.jsonl'),
)
print(json.dumps({
    'output_dir': run_result.output_dir,
    'n_generated': run_result.n_generated,
    'n_accepted': run_result.n_accepted,
    'n_executed': run_result.n_executed,
    'n_verified': run_result.n_verified,
    'notebook_count': len(run_result.notebooks),
    'report_path': run_result.report_path,
}, indent=2))

{
  "output_dir": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg",
  "n_generated": 20,
  "n_accepted": 20,
  "n_executed": 20,
  "n_verified": 20,
  "notebook_count": 20,
  "report_path": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/report.md"
}

6. Inspect verified ideas

The runner writes machine-readable artifacts. This table is built from the completed batch's results.jsonl; status='pass' means the notebook executed and the verification cell found the required fields, enough data, a finite parameter value, hypothesis-test metadata, and a statistical figure.

ideas_by_id = {}
with open(run_result.ideas_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank lines, as the run-summary cell does
        item = json.loads(line)
        ideas_by_id[item['idea_id']] = item

result_rows = []
with open(run_result.results_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        verification = item.get('verification') or {}
        idea = ideas_by_id.get(item['idea_id'], {})
        result_rows.append({
            'title': idea.get('idea_title', item['idea_id']),
            'status': verification.get('status'),
            'p_value': verification.get('p_value'),
            'effect_size': verification.get('effect_size'),
            'test_method': verification.get('test_method'),
            'notebook': Path(item['notebook']).name,
        })
results_df = pd.DataFrame(result_rows)
print(results_df.to_string(index=False, max_colwidth=58))
print('\nstatus counts:')
print(results_df['status'].value_counts().to_string())

                                                     title status  p_value  effect_size                                                test_method                                                   notebook
Purkinje-specific H3K27ac decompaction along traced chr...   pass 1.000000    -0.076350 one-sided label permutation test on trace/chromosome de... purkinje-specific-h3k27ac-decompaction-along-tra-8f00bd...
  Bergmann-specific LaminB1 peripheral anchoring signature   pass 0.988095     0.498362                one-sided exact cell-label permutation test bergmann-specific-laminb1-peripheral-anchoring-s-509bbf...
Granule-cell HP1alpha heterochromatin clustering in 3D ...   pass 0.001996     0.094115 one-sided matched randomization test (500 permutations;... granule-cell-hp1alpha-heterochromatin-clustering-3401a4...
Pcp2-linked active chromatin hub proximity across cell ...   pass 0.999001     0.878669  one-sided Spearman permutation test (1000 label shuffles) pcp2-linked-active-chromatin-hub-proximity-acros-f62e7e...
Pcp2 expression predicts H3K27ac-marked chromatin spati...   pass 0.998863     0.870301 Spearman correlation, one-sided negative; 1000 determin... pcp2-expression-predicts-h3k27ac-marked-chromati-667c6e...
Aldoc expression tracks lamina-associated chromatin signal   pass 0.070929     0.535570 Spearman correlation with fixed-seed one-sided permutat... aldoc-expression-tracks-lamina-associated-chroma-4c06d0...
Gabra6 expression links to elongating RNA polymerase ch...   pass 0.782968    -0.299244 one-sided Spearman rank correlation with fixed-seed lab... gabra6-expression-links-to-elongating-rna-polyme-eef7dd...
Reln expression predicts spatial coupling of H3K4me1 an...   pass 0.415584    -0.101274 Spearman correlation with 1000 fixed-seed Reln-label pe... reln-expression-predicts-spatial-coupling-of-h3k-ec2890...
 Lamina-proximal local compaction across chromosome traces   pass 0.001996     0.023680 Spearman rank correlation; one-sided permutation test w... lamina-proximal-local-compaction-across-chromoso-ee4567...
             Radial enrichment of active H3K27ac chromatin   pass 0.001996    -0.524224 Within-cell H3K27ac-rank permutation test (500 permutat... radial-enrichment-of-active-h3k27ac-chromatin-65ea38bd9...
Purkinje marker expression predicts chromosome-wide rad...   pass 0.001998    -0.861932 Spearman rank correlation with 1000 reproducible cell-l... purkinje-marker-expression-predicts-chromosome-w-ed7932...
 Lamina association of repetitive satellite-rich chromatin   pass 1.000000     0.712108 within-cell satellite-score permutation test (500 permu... lamina-association-of-repetitive-satellite-rich--6f2c68...
              Xist-marked chrX inter-chromosomal isolation   pass 0.011976     0.028334 within-cell Xist-label randomization test, one-sided gr... xist-marked-chrx-inter-chromosomal-isolation-3bd44d703c...
        Active chromatin assortativity between chromosomes   pass 0.001996     0.165338 one-sided chromosome-label permutation test (500 permut... active-chromatin-assortativity-between-chromosom-6fcdb5...
              rDNA-marked inter-chromosomal hub compaction   pass 0.003322    -0.206427 one-sided within-cell rDNA-label permutation test (300 ... rdna-marked-inter-chromosomal-hub-compaction-41a5f6a09d...
     Chromosome-specific peripheral positioning by LaminB1   pass 0.005988    -0.267949 within-cell chromosome-label permutation test (500 perm... chromosome-specific-peripheral-positioning-by-la-8abdde...
H3K27ac-high loci resist spurious compaction calls afte...   pass 0.874251    -0.014128 within-trace H3K27ac marker permutation test (500 permu... h3k27ac-high-loci-resist-spurious-compaction-cal-b54d7c...
RNAPIISer2-P neighborhoods around polyA_RNA spots shoul...   pass 0.001996     0.391011   one-sided cell-label permutation test (500 permutations) rnapiiser2-p-neighborhoods-around-polya-rna-spot-6bd867...
H3K9me3 radial enrichment should be stronger than shuff...   pass 1.000000    -0.660783 within-cell n_rad_score permutation test (500 permutati... h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d...
Pcp2 expression should align with per-cell H3K27ac only...   pass 0.004995     0.858434 seeded one-sided shuffled-cell permutation test of Spea... pcp2-expression-should-align-with-per-cell-h3k27-b0193e...

status counts:
status
pass    20
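
Several p-values in the table are consistent with the common add-one permutation estimate (k + 1) / (n + 1), where k is the number of null statistics at least as extreme as the observed one: 1/501 ≈ 0.001996 for 500 permutations and 1/301 ≈ 0.003322 for 300. The notebooks' own test code is not shown here, so this is an inference rather than a statement about their implementation. The sketch below is a generic one-sided label-permutation test on a difference in means, using toy data and only the standard library.

```python
import random
from statistics import mean


def one_sided_permutation_p(group_a, group_b, n_permutations=500, seed=0):
    """Estimate P(null difference in means >= observed) by label shuffling.

    Uses the add-one estimate (k + 1) / (n_permutations + 1), so the
    smallest attainable p-value is 1 / (n_permutations + 1).
    """
    rng = random.Random(seed)  # fixed seed for a deterministic rerun
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        null_stat = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if null_stat >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


# Toy data: two fully separated groups, so no shuffle should reach the observed gap.
high = [10.0 + 0.1 * i for i in range(20)]
low = [0.0 + 0.1 * i for i in range(20)]
effect, p_value = one_sided_permutation_p(high, low, n_permutations=500)
print(f'effect={effect:.3f}, p={p_value:.6f}')
```

With fully separated groups the estimate sits at its floor of 1/501, which is why a reported p of 0.001996 says "no permutation matched the observed statistic" rather than giving a precise tail probability.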

7. Evidence-ranked exploration notebook exports

This completed Pantheon notebook-agent batch generated 20 ideas, accepted all 20 after h5cd schema review, and executed 20 notebooks. U-Chrom then re-executed and verified 20/20 notebooks with explicit hypothesis tests.

Important distinction: "Notebook verified" means the notebook ran against the linked Takei .h5cd, passed schema/data checks, produced finite numeric output, and exposed a statistical test with a p-value and effect size. It does not mean the biological hypothesis was supported. The "Hypothesis evidence" column below gives the biological/statistical interpretation for this subset.

Current evidence summary: 6 Supported, 1 Borderline, 4 Contradicted, 9 Not supported. Rows include negative and contradicted ideas intentionally; they are part of the audit trail, not failures of execution. Graphical abstracts were generated by a separate Pantheon file-tool post-processing pass and are embedded in 20/20 notebooks.

Hypothesis evidence	Direction	Idea	Key result	Test	Interpretation	Notebook
Supported	Expected direction	Active chromatin assortativity between chromosomes	p = 0.001996; effect = 0.1653	one-sided chromosome-label permutation test (500 permutations)	The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.	open notebook
Supported	Expected direction	Granule-cell HP1alpha heterochromatin clustering in 3D traces	p = 0.001996; effect = 0.09411	one-sided matched randomization test (500 permutations; grouped by Granule trace_id and chr…	The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.	open notebook
Supported	Expected direction	Lamina-proximal local compaction across chromosome traces	p = 0.001996; effect = 0.02368	Spearman rank correlation; one-sided permutation test with 500 shuffles of n_per_dist withi…	The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.	open notebook
Supported	Expected direction	rDNA-marked inter-chromosomal hub compaction	p = 0.003322; effect = -0.2064	one-sided within-cell rDNA-label permutation test (300 permutations) on mean log(high/low m…	The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.	open notebook
Supported	Expected direction	Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls	p = 0.004995; effect = 0.8584	seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutations=1000)	The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.	open notebook
Supported	Expected direction	Xist-marked chrX inter-chromosomal isolation	p = 0.01198; effect = 0.02833	within-cell Xist-label randomization test, one-sided greater, 500 permutations	The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.	open notebook
Borderline	Expected direction	Aldoc expression tracks lamina-associated chromatin signal	p = 0.07093; effect = 0.5356	Spearman correlation with fixed-seed one-sided permutation test (1000 Aldoc-label shuffles)	The observed effect is compatible with the expected direction, but does not pass the nominal p <= 0.05 threshold.	open notebook
Contradicted	Opposite direction	RNAPIISer2-P neighborhoods around polyA_RNA spots should survive cell-label negative controls	p = 0.001996; effect = 0.391	one-sided cell-label permutation test (500 permutations)	The hypothesis test is significant, but the observed effect is in the opposite direction from the idea.	open notebook
Contradicted	Opposite direction	Radial enrichment of active H3K27ac chromatin	p = 0.001996; effect = -0.5242	Within-cell H3K27ac-rank permutation test (500 permutations, two-sided)	The hypothesis test is significant, but the observed effect is in the opposite direction from the idea.	open notebook
Contradicted	Opposite direction	Purkinje marker expression predicts chromosome-wide radial positioning	p = 0.001998; effect = -0.8619	Spearman rank correlation with 1000 reproducible cell-label permutations	The hypothesis test is significant, but the observed effect is in the opposite direction from the idea.	open notebook
Contradicted	Opposite direction	Chromosome-specific peripheral positioning by LaminB1	p = 0.005988; effect = -0.2679	within-cell chromosome-label permutation test (500 permutations, two-sided mean slope)	The hypothesis test is significant, but the observed effect is in the opposite direction from the idea.	open notebook
Not supported	Expected direction	Reln expression predicts spatial coupling of H3K4me1 and CBP chromatin spots	p = 0.4156; effect = -0.1013	Spearman correlation with 1000 fixed-seed Reln-label permutations (one-sided negative)	The statistical test does not support the idea in this linked Takei subset under the notebook’s operational definition.	open notebook
Not supported	Opposite direction	Gabra6 expression links to elongating RNA polymerase chromatin signal	p = 0.783; effect = -0.2992	one-sided Spearman rank correlation with fixed-seed label permutation control	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
Not supported	Opposite direction	H3K27ac-high loci resist spurious compaction calls after marker permutation	p = 0.8743; effect = -0.01413	within-trace H3K27ac marker permutation test (500 permutations); supplementary paired sign-…	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
Not supported	Opposite direction	Bergmann-specific LaminB1 peripheral anchoring signature	p = 0.9881; effect = 0.4984	one-sided exact cell-label permutation test	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
Not supported	Opposite direction	Pcp2 expression predicts H3K27ac-marked chromatin spatial clustering	p = 0.9989; effect = 0.8703	Spearman correlation, one-sided negative; 1000 deterministic label-shuffle permutations as…	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
Not supported	Opposite direction	Pcp2-linked active chromatin hub proximity across cell types	p = 0.999; effect = 0.8787	one-sided Spearman permutation test (1000 label shuffles)	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
Not supported	Opposite direction	H3K9me3 radial enrichment should be stronger than shuffled radial assignments	p = 1; effect = -0.6608	within-cell n_rad_score permutation test (500 permutations, one-sided)	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
Not supported	Opposite direction	Lamina association of repetitive satellite-rich chromatin	p = 1; effect = 0.7121	within-cell satellite-score permutation test (500 permutations, one-sided negative)	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
Not supported	Opposite direction	Purkinje-specific H3K27ac decompaction along traced chromosomes	p = 1; effect = -0.07635	one-sided label permutation test on trace/chromosome deltas, 500 permutations, statistic=me…	The observed effect points opposite to the expected direction and does not provide statistical support in this subset.	open notebook
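
The four evidence labels follow a simple pattern that can be written down as a rule. The sketch below is a reconstruction from the table, not the code that produced it: classify_evidence is a hypothetical helper, the 0.05 threshold comes from the interpretation text, and the 0.10 borderline cutoff is an assumption (the table only shows that p = 0.07093 was labeled Borderline and p = 0.4156 was not). Idea titles are abbreviated.

```python
def classify_evidence(p_value, matches_expected_direction,
                      alpha=0.05, borderline=0.10):
    """Map a test outcome to one of the four evidence labels.

    `borderline` is an assumed cutoff, not a value stated in the source.
    """
    if p_value <= alpha:
        return 'Supported' if matches_expected_direction else 'Contradicted'
    if matches_expected_direction and p_value <= borderline:
        return 'Borderline'
    return 'Not supported'


# (abbreviated title, p-value, effect in expected direction?) taken from the table.
rows = [
    ('Active chromatin assortativity', 0.001996, True),
    ('Aldoc expression tracks lamina signal', 0.07093, True),
    ('Radial enrichment of active H3K27ac', 0.001996, False),
    ('Reln expression predicts coupling', 0.4156, True),
    ('Gabra6 expression links to RNAPII signal', 0.783, False),
]
labels = [classify_evidence(p, ok) for _, p, ok in rows]
for (title, p, _), label in zip(rows, labels):
    print(f'{label:<14} p={p:<9} {title}')
```

Keeping the direction check separate from the significance check is what lets a small p-value land in "Contradicted" instead of being read as support.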

8. Look at one generated result table

Each accepted idea writes a small CSV table. The example below selects the verified idea with the smallest p-value in the completed batch; when several ideas tie, min() keeps the first one in results.jsonl order. These tables are the starting point for later promotion into stable uchrom.fea / uchrom.strc functions.

passing = []
with open(run_result.results_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        verification = item.get('verification') or {}
        if verification.get('status') == 'pass':
            passing.append(item)

if not passing:
    print('No passing ideas.')
else:
    def p_value_for(item):
        value = (item.get('verification') or {}).get('p_value')
        try:
            return float(value)
        except (TypeError, ValueError):
            return float('inf')

    first_pass = min(passing, key=p_value_for)
    result_path = Path(first_pass['verification']['result_path'])
    print(f'example result: {result_path.name}')
    example_df = pd.read_csv(result_path)
    print(example_df.head(8).to_string(index=False, max_colwidth=40))

example result: granule-cell-hp1alpha-heterochromatin-clustering-3401a4f59b_result.csv
                                 idea_id  n_selected_cells  n_granule_spots_finite  n_eligible_trace_chrom_groups  observed_statistic  observed_median_hp1high_distance_um  null_median_matched_distance_um  effect_size  p_value                              test_method hypothesis_test_status
granule-cell-hp1alpha-heterochromatin...                 3                   12085                             69            0.905885                             1.143732                         1.262558     0.094115 0.001996 one-sided matched randomization test ...                   pass

9. Agent backend notes

The batch above used idea_source='pantheon' and code_source='pantheon'. Idea generation was performed by parallel PantheonOS agents with only file access to the serialized discovery schema. Notebook exploration was performed by PantheonOS agents with file access plus live notebook tooling. These agents edited cells, inserted Markdown interpretation, generated statistical matplotlib figures, executed code in notebook kernels, and ran explicit hypothesis tests that record null and alternative hypotheses, p-values, effect sizes, and test methods. U-Chrom then re-executed and verified the resulting notebooks.

Matplotlib is forced to the non-interactive Agg backend in the scaffold and runner executor so documentation/batch runs render figures without opening local GUI windows.

The underlying model for this rendered example is printed in the first cell. The same runner can use any Pantheon-supported provider/model string via MODEL or the UCHROM_PANTHEON_MODEL environment variable. Graphical abstract generation is intentionally decoupled from the 20-idea verification run; it can be performed as a separate low-concurrency post-processing pass when needed.