Takei 2025 auto-discovery example

This notebook documents an auto-discovery pass, run by live PantheonOS agents, on the linked Takei 2025 cerebellum .h5cd / .h5ad data. It exposes the dataset schema to Pantheon idea agents, then inspects a completed 20-idea notebook-agent batch that was executed on a small Takei-derived subset. The batch includes live notebook construction, agent-authored exploratory analysis, explicit hypothesis testing with p-values and effect sizes, saved statistical matplotlib figures, cell execution, and U-Chrom re-execution and verification.

The rendered outputs below use OpenAI as the model provider for the PantheonOS agents in this environment. The U-Chrom runner is provider-agnostic at the Pantheon layer; change MODEL to use another Pantheon-supported model backend.

from pathlib import Path
from copy import deepcopy
import json
import os
os.environ.setdefault('MPLBACKEND', 'Agg')
import shutil
from collections import Counter

import numpy as np
import pandas as pd


try:
    from loguru import logger as _pantheon_logger
    _pantheon_logger.remove()
    _pantheon_logger.disable('pantheon')
except Exception:
    pass

from uchrom.io import load_takei2025_cerebellum
from uchrom.auto_discovery import (
    DiscoveryRunConfig,
    generate_pantheon_ideas,
    review_idea_against_schema,
    run_auto_discovery,
    schema_to_agent_context,
)
from uchrom.auto_discovery.llm import DEFAULT_OPENAI_MODEL

MODEL = os.environ.get('UCHROM_PANTHEON_MODEL', f'openai/{DEFAULT_OPENAI_MODEL}')
LLM_TIMEOUT = 900
IDEA_AGENT_COUNT = 2
NOTEBOOK_AGENT_CONCURRENCY = 2
GENERATE_SCHEMATIC_IMAGE = False
SCHEMATIC_IMAGE_MODEL = os.environ.get('UCHROM_SCHEMATIC_IMAGE_MODEL', 'openai')
SCHEMATIC_IMAGE_MODEL_ARGS = {'size': '1536x1024', 'quality': 'high', 'output_format': 'png'}


def repo_root() -> Path:
    for p in [Path.cwd(), *Path.cwd().parents]:
        if (p / 'pyproject.toml').exists() and (p / 'uchrom').exists():
            return p
    return Path.cwd()


def dotenv_has_openai_key(path: Path) -> bool:
    if not path.exists():
        return False
    for line in path.read_text().splitlines():
        line = line.strip()
        if line.startswith('OPENAI_API_KEY=') and line.split('=', 1)[1].strip().strip('"').strip("'"):
            return True
    return False

ROOT = repo_root()
TAKEI_DIR = ROOT / 'example-data' / 'takei2025_cerebellum'
OUT = ROOT / 'tmp' / 'takei_auto_discovery_doc'
OUT.mkdir(parents=True, exist_ok=True)
print(f'root: {ROOT}')
print(f'output: {OUT}')
print(f'PantheonOS model: {MODEL}')
print(f'idea agents: {IDEA_AGENT_COUNT}, notebook agent concurrency: {NOTEBOOK_AGENT_CONCURRENCY}')
print(f'schematic image generation in main exploration run: {GENERATE_SCHEMATIC_IMAGE} (post-hoc only), model={SCHEMATIC_IMAGE_MODEL}')
print(f'OpenAI key available for this rendered example: {bool(os.environ.get("OPENAI_API_KEY")) or dotenv_has_openai_key(Path.home() / ".env")}')
root: /Users/weizexu/Projects/U-Chrom
output: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc
PantheonOS model: openai/gpt-5.5
idea agents: 2, notebook agent concurrency: 2
schematic image generation in main exploration run: False (post-hoc only), model=openai
OpenAI key available for this rendered example: True

1. Load linked Takei data

load_takei2025_cerebellum() returns a ChromData object. The RNA expression matrix is available as cdata.linked_adata and is aligned on cell IDs.

cdata = load_takei2025_cerebellum(
    replicate=1,
    data_dir=TAKEI_DIR,
    download=True,
)
adata = cdata.linked_adata
print(cdata)
print(f'linked_adata shape: {None if adata is None else adata.shape}')
ChromData: n_spots=10912638, n_traces=59112, n_cells=1799
  spots:   ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
  cells:   ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (1799 cells)
  cellm:   {'umap': (1799, 2)}
  tracks:  ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
  traces:  ['dbscan_allele', 'dbscan_ldp_allele'] (59112 traces)
  uns:     ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'leiden_to_cell_type', 'linked_anndata']
  linked_adata: (1799, 60)
linked_adata shape: (1799, 60)
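
To make "aligned on cell IDs" concrete, the following minimal sketch uses toy pandas objects instead of the real ChromData API. The names `cells`, `expr`, and `align_to_cells` are assumptions introduced for illustration only; the point is that the expression rows and the per-cell table share one cell-ID index in the same order.

```python
import pandas as pd

# Toy stand-ins (hypothetical): per-cell metadata and a linked expression table.
cells = pd.DataFrame(
    {'cell_type': ['Granule', 'Purkinje', 'Bergmann']},
    index=['c1', 'c2', 'c3'],
)
expr = pd.DataFrame(
    {'Calb1': [0, 7, 1], 'Gabra6': [5, 0, 0]},
    index=['c3', 'c1', 'c2'],  # same cells, different row order
)


def align_to_cells(expr: pd.DataFrame, cells: pd.DataFrame) -> pd.DataFrame:
    """Reorder expression rows to match the cell table; fail on missing IDs."""
    missing = cells.index.difference(expr.index)
    if len(missing) > 0:
        raise KeyError(f'cells without expression rows: {list(missing)}')
    return expr.loc[cells.index]


aligned = align_to_cells(expr, cells)
print(aligned.index.equals(cells.index))
```

Raising on missing IDs, instead of silently dropping cells, keeps later per-cell joins between chromatin features and expression honest.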

2. Build the h5cd-backed discovery schema

The schema is agent-readable: it summarizes axes, modalities, available fields, tracks, cell types, genes, and known missing data. Here we build it in memory for display; production runs can store it in cdata.uns['auto_discovery_schema'] and write it back to .h5cd.

schema = cdata.build_discovery_schema(
    store=False,
    dataset_name='takei2025_cerebellum_rep1',
    max_catalog_items=80,
)
print(schema_to_agent_context(schema, max_items=16))
# ChromData discovery schema

dataset: takei2025_cerebellum_rep1
genome: mm10
xyz_unit: um
shape: 10912638 spots, 59112 traces, 1799 cells

modalities:
- chromatin_tracing: present; operations: chromosome_subset, cell_subset, trace_subset, pairwise_3d_distance, intra_chromatin_distance, inter_chromatin_distance
- if_tracks: present; operations: marker_high_low_bin_selection, marker_stratified_distance, per_cell_marker_summary, per_cell_type_marker_summary
- cell_metadata: present; operations: cell_type_stratification, embedding_visualization
- rna_expression: present; operations: gene_expression_lookup, expression_stratification, gene_marker_correlation, chromatin_expression_association

chroms: 20 [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr3, chr4, chr5, chr6 ...]
cell_types: 6 [Granule=1109, Other=323, Bergmann=192, MLI1=90, Purkinje=58, MLI2+PLI=27]
tracks: 62 [CPSF6, ATRX, H4K8ac, HDAC2, H3K9ac, H3K9me3, H3K9me2, RNAPIISer2-P, H3, H3K36me2, UBTF, LaminB1, RNAPIISer5-P, RYBP, HP1beta, RING1B ...]
linked_adata: shape=[1799, 60], X=csr_matrix
genes: 60 [Aldoc, Calb1, Cdh22, Drd3, Eomes, Ephb2, Foxj1, Gabra6, Gpr176, Grm1, Hspb1, Mrc1, Nefh, Npas3, Nptn, Olig1 ...]

known_missing:
- cellm['if_mean'] per-cell IF mean matrix
- raw RNA seqFISH spot geometry as a first-class ChromData component
- scRNA reference matrix for external expression comparison
- gene annotation cache for gene-neighborhood analyses

verification_required:
- required_fields_exist
- minimum_cell_count
- minimum_spot_or_trace_count
- finite_numeric_output
- statistical_hypothesis_test
- runtime_under_budget
- deterministic_rerun
- negative_control_or_permutation
- redundancy_against_existing_parameters
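
The truncated catalog lines in the context above (for example `chroms: 20 [chr1, chr10, ... ...]`) can be imitated with a few lines of Python. This is a sketch of the formatting idea only, under the assumption that long lists are cut at `max_items`; `format_catalog` is a hypothetical helper, not the implementation behind `schema_to_agent_context`.

```python
def format_catalog(name, items, max_items=16):
    """Render 'name: N [a, b, c ...]', truncating catalogs longer than max_items."""
    shown = [str(x) for x in items[:max_items]]
    suffix = ' ...' if len(items) > max_items else ''
    return f"{name}: {len(items)} [{', '.join(shown)}{suffix}]"


# Toy catalog: 19 numbered chromosomes plus chrX.
chroms = [f'chr{i}' for i in range(1, 20)] + ['chrX']
print(format_catalog('chroms', chroms, max_items=4))
print(format_catalog('cell_types', ['Granule', 'Purkinje'], max_items=4))
```

Reporting the full count next to the truncated list lets an agent know that more items exist than it can see in the prompt.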

3. Generate and review Pantheon idea-agent proposals on the full dataset schema

This cell writes the full Takei discovery schema to disk, launches multiple PantheonOS idea agents in parallel, and reviews their returned ideas against the h5cd-backed schema. These idea agents are restricted to file access; they read schema.json and schema_context.md instead of touching notebooks.
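
As a rough picture of what "multiple agents in parallel" can look like, here is a minimal asyncio sketch with a bounded number of agents in flight. Everything in it is an assumption for illustration: `fake_idea_agent` stands in for a real agent call, and `run_idea_agents` is not part of U-Chrom or PantheonOS.

```python
import asyncio


async def fake_idea_agent(agent_id: int, schema_path: str) -> dict:
    """Hypothetical stand-in for one idea agent that only reads the schema file."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {'agent_id': agent_id, 'schema_path': schema_path, 'ideas': [f'idea-{agent_id}']}


async def run_idea_agents(n_agents: int, schema_path: str, max_concurrency: int = 2) -> list:
    """Run n_agents coroutines with at most max_concurrency active at once."""
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(agent_id: int) -> dict:
        async with gate:
            return await fake_idea_agent(agent_id, schema_path)

    # gather returns results in submission order, regardless of finish order.
    return await asyncio.gather(*(guarded(i) for i in range(n_agents)))


records = asyncio.run(run_idea_agents(2, 'schema.json'))
print([r['ideas'] for r in records])
```

Inside a notebook, where an event loop is already running, the last call would be written as `records = await run_idea_agents(2, 'schema.json')`, matching the top-level `await` used in the cell that follows.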

full_schema_idea_dir = OUT / 'pantheon_full_schema_ideas'
if full_schema_idea_dir.exists():
    shutil.rmtree(full_schema_idea_dir)

ideas, idea_agent_records = await generate_pantheon_ideas(
    schema,
    output_dir=full_schema_idea_dir,
    max_ideas=2,
    model=MODEL,
    timeout=LLM_TIMEOUT,
    idea_agent_count=IDEA_AGENT_COUNT,
)
rows = []
for idea in ideas:
    review = review_idea_against_schema(idea, schema, max_complexity=5)
    rows.append({
        'idea_id': idea.idea_id,
        'title': idea.idea_title,
        'cell_types': ', '.join(idea.cell_types) or 'all',
        'modalities': ' + '.join(idea.modalities),
        'accepted': review.accepted,
        'warnings': '; '.join(review.warnings),
    })
ideas_df = pd.DataFrame(rows)
print(f'PantheonOS model: {MODEL}')
print(f'idea agent records: {len(idea_agent_records)}')
print(ideas_df[['title', 'cell_types', 'modalities', 'accepted']].to_string(index=False, max_colwidth=64))
print('\nmodality combinations:')
print(Counter(ideas_df['modalities']).most_common())
print(f"RNA-linked ideas: {ideas_df['modalities'].str.contains('rna_expression').sum()}")
PantheonOS model: openai/gpt-5.5
idea agent records: 2
                                                           title cell_types                                    modalities  accepted
   Granule-cell H3K27ac radial centrality from chromatin tracing    Granule chromatin_tracing + if_tracks + cell_metadata      True
Purkinje Pcp2 expression predicts chromatin-associated elonga...   Purkinje    if_tracks + cell_metadata + rna_expression     False

modality combinations:
[('chromatin_tracing + if_tracks + cell_metadata', 1), ('if_tracks + cell_metadata + rna_expression', 1)]
RNA-linked ideas: 1
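
The review step can be pictured as a membership check of everything an idea names against the schema catalogs. The sketch below is a simplified assumption, not the logic of `review_idea_against_schema`; the dictionary layout, the `review_idea` helper, and the gene name `GeneX` are all hypothetical. It says nothing about why the second idea above was rejected, since that reason is not shown in the output.

```python
def review_idea(idea: dict, schema: dict) -> dict:
    """Accept an idea only if every track, gene and cell type it names is in the schema."""
    warnings = []
    for key in ('tracks', 'genes', 'cell_types'):
        available = set(schema.get(key, []))
        for name in idea.get(key, []):
            if name not in available:
                # key[:-1] turns 'tracks' into 'track', 'genes' into 'gene', and so on.
                warnings.append(f'{key[:-1]} not in schema: {name}')
    return {'accepted': not warnings, 'warnings': warnings}


toy_schema = {
    'tracks': ['H3K27ac', 'LaminB1'],
    'genes': ['Calb1', 'Gabra6'],
    'cell_types': ['Granule', 'Purkinje'],
}
ok = review_idea({'tracks': ['H3K27ac'], 'cell_types': ['Granule']}, toy_schema)
bad = review_idea({'tracks': ['H3K27ac'], 'genes': ['GeneX']}, toy_schema)
print(ok)
print(bad)
```

Collecting all warnings before deciding, instead of stopping at the first miss, gives the idea agent a complete list of what to fix in a retry.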

4. Create a small Takei-derived h5cd for execution

The runner executes one notebook per accepted idea. For documentation we use a small subset taken from the real Takei data: three cell types, the first three cells of each type in index order, all of their spots and traces, and the matching rows of linked_adata.

cell_types = ['Granule', 'Bergmann', 'Purkinje']
cells_per_type = 3
selected_cells = []
for ct in cell_types:
    ids = list(cdata.cells.index[cdata.cells['cell_type'].astype(str) == ct][:cells_per_type])
    selected_cells.extend(map(str, ids))

spot_mask = cdata.spots['cell_id'].astype(str).isin(selected_cells).to_numpy()
takei_small = cdata[spot_mask]
takei_small.uns = deepcopy(takei_small.uns)

cell_order = [str(x) for x in takei_small.cells.index]
adata_small = adata[cell_order].copy()
small_h5ad = OUT / 'takei_doc_auto_subset.h5ad'
small_h5cd = OUT / 'takei_doc_auto_subset.h5cd'
adata_small.write_h5ad(small_h5ad)
takei_small.linked_adata = adata_small
takei_small.uns['linked_anndata'] = {
    'path': str(small_h5ad),
    'n_obs': int(adata_small.n_obs),
    'n_vars': int(adata_small.n_vars),
    'cell_id_axis': 'obs_names',
}
takei_small.build_discovery_schema(store=True, dataset_name='takei2025_doc_subset')
takei_small.write(small_h5cd)

print(takei_small)
print(f'subset h5cd: {small_h5cd}')
print(f'subset linked_adata: {adata_small.shape}')
print(takei_small.cells['cell_type'].value_counts().to_string())
ChromData: n_spots=56036, n_traces=213, n_cells=9
  spots:   ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
  cells:   ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (9 cells)
  cellm:   {'umap': (9, 2)}
  tracks:  ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
  traces:  ['dbscan_allele', 'dbscan_ldp_allele'] (213 traces)
  uns:     ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'leiden_to_cell_type', 'linked_anndata', 'auto_discovery_schema']
  linked_adata: (9, 60)
subset h5cd: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd
subset linked_adata: (9, 60)
cell_type
Granule     3
Bergmann    3
Purkinje    3

5. Run notebook-first auto-discovery

The rendered documentation uses a completed 20-idea Pantheon notebook-agent batch under tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg. The expensive agent run is not repeated during normal docs builds; this cell reconstructs the run summary from ideas.jsonl, reviews.jsonl, results.jsonl, and the exported notebooks.

To reproduce the batch from the saved idea set, run the CLI with a non-interactive matplotlib backend:

MPLBACKEND=Agg python -m uchrom.auto_discovery run \
  tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd \
  tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg \
  --ideas-path tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_retry/ideas.jsonl \
  --max-ideas 20 \
  --max-complexity 5 \
  --code-source pantheon \
  --model openai/gpt-5.5 \
  --llm-timeout 900 \
  --notebook-agent-concurrency 4 \
  --dataset-name takei2025_doc_subset_pantheon_20 \
  --store-schema

Notebook agents receive file access plus live notebook tools. They read the scaffold, edit cells, insert Markdown notes, execute code cells, inspect outputs, and leave auditable notebooks behind. U-Chrom then re-executes each notebook from top to bottom and verifies that it produced a finite result, explicit hypothesis-test metadata, and a saved statistical figure.
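
The three post-execution checks named above (finite result, hypothesis-test metadata, saved figure) can be sketched as a small function. This is an illustration under assumptions, not the U-Chrom verifier: the record keys `parameter_value`, `p_value`, `test_method`, and `figure_path`, and the failure label `saved_figure`, are hypothetical.

```python
import math
import os
import tempfile


def verify_result(record: dict) -> dict:
    """Hypothetical checks: finite value, usable test metadata, figure file on disk."""
    failures = []
    value = record.get('parameter_value')
    if not isinstance(value, (int, float)) or not math.isfinite(value):
        failures.append('finite_numeric_output')
    p = record.get('p_value')
    has_p = isinstance(p, (int, float)) and 0.0 <= p <= 1.0  # NaN fails this comparison
    if not has_p or not record.get('test_method'):
        failures.append('statistical_hypothesis_test')
    fig = record.get('figure_path')
    if not fig or not os.path.isfile(fig):
        failures.append('saved_figure')
    return {'status': 'pass' if not failures else 'fail', 'failures': failures}


with tempfile.TemporaryDirectory() as tmp:
    fig_path = os.path.join(tmp, 'figure.png')
    with open(fig_path, 'wb') as fh:
        fh.write(b'placeholder')  # stands in for a saved matplotlib figure
    good = verify_result({'parameter_value': 0.094, 'p_value': 0.002,
                          'test_method': 'permutation test', 'figure_path': fig_path})
    bad = verify_result({'parameter_value': float('nan'), 'p_value': 0.5,
                         'test_method': 'permutation test', 'figure_path': fig_path})
print(good['status'], bad['failures'])
```

Note that a check of this kind only confirms that a test was run and reported; it does not look at whether the p-value is small.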

from types import SimpleNamespace

run_dir = OUT / 'run_pantheon_20_ideas_verified_agg'
if not run_dir.exists():
    raise FileNotFoundError(
        f'Missing completed run directory: {run_dir}. Reproduce it with the CLI command shown above.'
    )


def count_jsonl(path):
    with open(path) as fh:
        return sum(1 for line in fh if line.strip())

results_for_summary = []
with open(run_dir / 'results.jsonl') as fh:
    for line in fh:
        if line.strip():
            results_for_summary.append(json.loads(line))

notebooks = sorted(str(p) for p in (run_dir / 'notebooks').glob('*.ipynb'))
run_result = SimpleNamespace(
    output_dir=str(run_dir),
    n_generated=count_jsonl(run_dir / 'ideas.jsonl'),
    # read_text() closes the file, unlike a bare open() inside the generator.
    n_accepted=sum(1 for line in (run_dir / 'reviews.jsonl').read_text().splitlines() if line.strip() and json.loads(line).get('accepted')),
    n_executed=len(results_for_summary),
    n_verified=sum((item.get('verification') or {}).get('status') == 'pass' for item in results_for_summary),
    report_path=str(run_dir / 'report.md'),
    ideas_path=str(run_dir / 'ideas.jsonl'),
    reviews_path=str(run_dir / 'reviews.jsonl'),
    results_path=str(run_dir / 'results.jsonl'),
    notebooks=notebooks,
    agent_records_path=str(run_dir / 'agent_records.jsonl'),
)
print(json.dumps({
    'output_dir': run_result.output_dir,
    'n_generated': run_result.n_generated,
    'n_accepted': run_result.n_accepted,
    'n_executed': run_result.n_executed,
    'n_verified': run_result.n_verified,
    'notebook_count': len(run_result.notebooks),
    'report_path': run_result.report_path,
}, indent=2))
{
  "output_dir": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg",
  "n_generated": 20,
  "n_accepted": 20,
  "n_executed": 20,
  "n_verified": 20,
  "notebook_count": 20,
  "report_path": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/report.md"
}
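
Several cells in this notebook repeat the same "read a JSONL file, skip blank lines" pattern. A small generator can factor that out; the sketch below uses only the standard library, and `read_jsonl` and `summarize` are hypothetical helper names, not part of U-Chrom.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-blank line."""
    with open(path) as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def summarize(results_path) -> dict:
    """Count executed and verified results the same way the cell above does."""
    results = list(read_jsonl(results_path))
    n_verified = sum((r.get('verification') or {}).get('status') == 'pass' for r in results)
    return {'n_executed': len(results), 'n_verified': n_verified}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'results.jsonl'
    path.write_text(
        '{"idea_id": "a", "verification": {"status": "pass"}}\n'
        '\n'
        '{"idea_id": "b", "verification": null}\n'
    )
    summary = summarize(path)
print(summary)
```

The `or {}` guard matters because a record can carry `"verification": null`, in which case `.get('verification')` returns `None` and not a dictionary.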

6. Inspect verified ideas

The runner writes machine-readable artifacts. This table is built from the completed batch's results.jsonl. A status of 'pass' means the notebook executed and the verification cell found the required fields, enough data, a finite parameter value, hypothesis-test metadata, and a statistical figure.

ideas_by_id = {}
with open(run_result.ideas_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # tolerate blank lines, as the summary cell does
        item = json.loads(line)
        ideas_by_id[item['idea_id']] = item

result_rows = []
with open(run_result.results_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # tolerate blank lines, as the summary cell does
        item = json.loads(line)
        verification = item.get('verification') or {}
        idea = ideas_by_id.get(item['idea_id'], {})
        result_rows.append({
            'title': idea.get('idea_title', item['idea_id']),
            'status': verification.get('status'),
            'p_value': verification.get('p_value'),
            'effect_size': verification.get('effect_size'),
            'test_method': verification.get('test_method'),
            'notebook': Path(item['notebook']).name,
        })
results_df = pd.DataFrame(result_rows)
print(results_df.to_string(index=False, max_colwidth=58))
print('\nstatus counts:')
print(results_df['status'].value_counts().to_string())
                                                     title status  p_value  effect_size                                                test_method                                                   notebook
Purkinje-specific H3K27ac decompaction along traced chr...   pass 1.000000    -0.076350 one-sided label permutation test on trace/chromosome de... purkinje-specific-h3k27ac-decompaction-along-tra-8f00bd...
  Bergmann-specific LaminB1 peripheral anchoring signature   pass 0.988095     0.498362                one-sided exact cell-label permutation test bergmann-specific-laminb1-peripheral-anchoring-s-509bbf...
Granule-cell HP1alpha heterochromatin clustering in 3D ...   pass 0.001996     0.094115 one-sided matched randomization test (500 permutations;... granule-cell-hp1alpha-heterochromatin-clustering-3401a4...
Pcp2-linked active chromatin hub proximity across cell ...   pass 0.999001     0.878669  one-sided Spearman permutation test (1000 label shuffles) pcp2-linked-active-chromatin-hub-proximity-acros-f62e7e...
Pcp2 expression predicts H3K27ac-marked chromatin spati...   pass 0.998863     0.870301 Spearman correlation, one-sided negative; 1000 determin... pcp2-expression-predicts-h3k27ac-marked-chromati-667c6e...
Aldoc expression tracks lamina-associated chromatin signal   pass 0.070929     0.535570 Spearman correlation with fixed-seed one-sided permutat... aldoc-expression-tracks-lamina-associated-chroma-4c06d0...
Gabra6 expression links to elongating RNA polymerase ch...   pass 0.782968    -0.299244 one-sided Spearman rank correlation with fixed-seed lab... gabra6-expression-links-to-elongating-rna-polyme-eef7dd...
Reln expression predicts spatial coupling of H3K4me1 an...   pass 0.415584    -0.101274 Spearman correlation with 1000 fixed-seed Reln-label pe... reln-expression-predicts-spatial-coupling-of-h3k-ec2890...
 Lamina-proximal local compaction across chromosome traces   pass 0.001996     0.023680 Spearman rank correlation; one-sided permutation test w... lamina-proximal-local-compaction-across-chromoso-ee4567...
             Radial enrichment of active H3K27ac chromatin   pass 0.001996    -0.524224 Within-cell H3K27ac-rank permutation test (500 permutat... radial-enrichment-of-active-h3k27ac-chromatin-65ea38bd9...
Purkinje marker expression predicts chromosome-wide rad...   pass 0.001998    -0.861932 Spearman rank correlation with 1000 reproducible cell-l... purkinje-marker-expression-predicts-chromosome-w-ed7932...
 Lamina association of repetitive satellite-rich chromatin   pass 1.000000     0.712108 within-cell satellite-score permutation test (500 permu... lamina-association-of-repetitive-satellite-rich--6f2c68...
              Xist-marked chrX inter-chromosomal isolation   pass 0.011976     0.028334 within-cell Xist-label randomization test, one-sided gr... xist-marked-chrx-inter-chromosomal-isolation-3bd44d703c...
        Active chromatin assortativity between chromosomes   pass 0.001996     0.165338 one-sided chromosome-label permutation test (500 permut... active-chromatin-assortativity-between-chromosom-6fcdb5...
              rDNA-marked inter-chromosomal hub compaction   pass 0.003322    -0.206427 one-sided within-cell rDNA-label permutation test (300 ... rdna-marked-inter-chromosomal-hub-compaction-41a5f6a09d...
     Chromosome-specific peripheral positioning by LaminB1   pass 0.005988    -0.267949 within-cell chromosome-label permutation test (500 perm... chromosome-specific-peripheral-positioning-by-la-8abdde...
H3K27ac-high loci resist spurious compaction calls afte...   pass 0.874251    -0.014128 within-trace H3K27ac marker permutation test (500 permu... h3k27ac-high-loci-resist-spurious-compaction-cal-b54d7c...
RNAPIISer2-P neighborhoods around polyA_RNA spots shoul...   pass 0.001996     0.391011   one-sided cell-label permutation test (500 permutations) rnapiiser2-p-neighborhoods-around-polya-rna-spot-6bd867...
H3K9me3 radial enrichment should be stronger than shuff...   pass 1.000000    -0.660783 within-cell n_rad_score permutation test (500 permutati... h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d...
Pcp2 expression should align with per-cell H3K27ac only...   pass 0.004995     0.858434 seeded one-sided shuffled-cell permutation test of Spea... pcp2-expression-should-align-with-per-cell-h3k27-b0193e...

status counts:
status
pass    20
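
Most test methods in this table are label-permutation tests. The sketch below shows one common form, a one-sided test on a difference in means with the add-one p-value convention (b + 1) / (m + 1). It is written under assumptions: the notebooks' own code is not shown here, and `permutation_p_value` is a hypothetical helper. The convention is at least consistent with the reported values, since 1/501 ≈ 0.001996 for 500 permutations and 1/301 ≈ 0.003322 for 300.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=500, seed=0):
    """One-sided (A > B) label-permutation test on the difference in means."""
    rng = random.Random(seed)  # fixed seed makes reruns deterministic
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = mean_diff(pooled)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_diff(pooled) >= observed:
            n_extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the smallest attainable p-value is 1 / (n_permutations + 1).
    return observed, (n_extreme + 1) / (n_permutations + 1)


# Two fully separated toy groups of 30 integers each.
observed, p = permutation_p_value(list(range(100, 130)), list(range(0, 30)))
print(observed, round(p, 6))
```

With this convention a p-value can never be exactly zero, which is why strongly significant rows bottom out at a floor set by the number of permutations.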

7. Evidence-ranked exploration notebook exports

This completed Pantheon notebook-agent batch generated 20 ideas, accepted 20 after h5cd schema review, executed 20 notebooks, and U-Chrom re-executed/verified 20/20 notebooks with explicit hypothesis tests.

Important distinction: "Notebook verified" means the notebook ran against the linked Takei .h5cd, passed schema/data checks, produced finite numeric output, and exposed a statistical test with a p-value and an effect size. It does not mean the biological hypothesis was supported. The "Hypothesis evidence" column below is the biological/statistical interpretation for this subset.

Current evidence summary: 6 Supported, 1 Borderline, 4 Contradicted, 9 Not supported. Rows include negative and contradicted ideas intentionally; they are part of the audit trail, not failures of execution. Graphical abstracts were generated by a separate Pantheon file-tool post-processing pass and are embedded in 20/20 notebooks.
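
The four evidence labels follow a simple pattern in the rows of this section: direction of the effect first, then the nominal p <= 0.05 threshold. The sketch below encodes that pattern as a hypothetical `classify_evidence` function. The 0.10 cutoff for "Borderline" is an assumption (the source states only the 0.05 threshold), and the function is an illustration, not the code that produced the labels.

```python
from collections import Counter


def classify_evidence(p_value, direction_matches, alpha=0.05, borderline=0.10):
    """Map (p-value, direction) to an evidence label; borderline cutoff is assumed."""
    if direction_matches:
        if p_value <= alpha:
            return 'Supported'
        if p_value <= borderline:
            return 'Borderline'
        return 'Not supported'
    return 'Contradicted' if p_value <= alpha else 'Not supported'


# (p-value, effect in expected direction?) for the 20 rows of this section.
rows = [
    (0.001996, True), (0.001996, True), (0.001996, True),
    (0.003322, True), (0.004995, True), (0.01198, True),
    (0.07093, True),
    (0.001996, False), (0.001996, False), (0.001998, False), (0.005988, False),
    (0.4156, True),
    (0.783, False), (0.8743, False), (0.9881, False), (0.9989, False),
    (0.999, False), (1.0, False), (1.0, False), (1.0, False),
]
counts = Counter(classify_evidence(p, ok) for p, ok in rows)
print(dict(counts))
```

Under these assumptions the tally reproduces the summary above: 6 Supported, 1 Borderline, 4 Contradicted, and 9 Not supported.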

Interpretation by evidence class:
- Supported (expected direction): The observed effect is consistent with the expected direction and passes the nominal p <= 0.05 threshold.
- Borderline (expected direction): The observed effect is compatible with the expected direction, but does not pass the nominal p <= 0.05 threshold.
- Contradicted (opposite direction): The hypothesis test is significant, but the observed effect is in the opposite direction from the idea.
- Not supported (expected direction): The statistical test does not support the idea in this linked Takei subset under the notebook’s operational definition.
- Not supported (opposite direction): The observed effect points opposite to the expected direction and does not provide statistical support in this subset.

| Hypothesis evidence | Direction | Idea | Key result | Test method | Notebook |
| --- | --- | --- | --- | --- | --- |
| Supported | Expected direction | Active chromatin assortativity between chromosomes | p = 0.001996; effect = 0.1653 | one-sided chromosome-label permutation test (500 permutations) | open notebook |
| Supported | Expected direction | Granule-cell HP1alpha heterochromatin clustering in 3D traces | p = 0.001996; effect = 0.09411 | one-sided matched randomization test (500 permutations; grouped by Granule trace_id and chr… | open notebook |
| Supported | Expected direction | Lamina-proximal local compaction across chromosome traces | p = 0.001996; effect = 0.02368 | Spearman rank correlation; one-sided permutation test with 500 shuffles of n_per_dist withi… | open notebook |
| Supported | Expected direction | rDNA-marked inter-chromosomal hub compaction | p = 0.003322; effect = -0.2064 | one-sided within-cell rDNA-label permutation test (300 permutations) on mean log(high/low m… | open notebook |
| Supported | Expected direction | Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls | p = 0.004995; effect = 0.8584 | seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutations=1000) | open notebook |
| Supported | Expected direction | Xist-marked chrX inter-chromosomal isolation | p = 0.01198; effect = 0.02833 | within-cell Xist-label randomization test, one-sided greater, 500 permutations | open notebook |
| Borderline | Expected direction | Aldoc expression tracks lamina-associated chromatin signal | p = 0.07093; effect = 0.5356 | Spearman correlation with fixed-seed one-sided permutation test (1000 Aldoc-label shuffles) | open notebook |
| Contradicted | Opposite direction | RNAPIISer2-P neighborhoods around polyA_RNA spots should survive cell-label negative controls | p = 0.001996; effect = 0.391 | one-sided cell-label permutation test (500 permutations) | open notebook |
| Contradicted | Opposite direction | Radial enrichment of active H3K27ac chromatin | p = 0.001996; effect = -0.5242 | Within-cell H3K27ac-rank permutation test (500 permutations, two-sided) | open notebook |
| Contradicted | Opposite direction | Purkinje marker expression predicts chromosome-wide radial positioning | p = 0.001998; effect = -0.8619 | Spearman rank correlation with 1000 reproducible cell-label permutations | open notebook |
| Contradicted | Opposite direction | Chromosome-specific peripheral positioning by LaminB1 | p = 0.005988; effect = -0.2679 | within-cell chromosome-label permutation test (500 permutations, two-sided mean slope) | open notebook |
| Not supported | Expected direction | Reln expression predicts spatial coupling of H3K4me1 and CBP chromatin spots | p = 0.4156; effect = -0.1013 | Spearman correlation with 1000 fixed-seed Reln-label permutations (one-sided negative) | open notebook |
| Not supported | Opposite direction | Gabra6 expression links to elongating RNA polymerase chromatin signal | p = 0.783; effect = -0.2992 | one-sided Spearman rank correlation with fixed-seed label permutation control | open notebook |
| Not supported | Opposite direction | H3K27ac-high loci resist spurious compaction calls after marker permutation | p = 0.8743; effect = -0.01413 | within-trace H3K27ac marker permutation test (500 permutations); supplementary paired sign-… | open notebook |
| Not supported | Opposite direction | Bergmann-specific LaminB1 peripheral anchoring signature | p = 0.9881; effect = 0.4984 | one-sided exact cell-label permutation test | open notebook |
| Not supported | Opposite direction | Pcp2 expression predicts H3K27ac-marked chromatin spatial clustering | p = 0.9989; effect = 0.8703 | Spearman correlation, one-sided negative; 1000 deterministic label-shuffle permutations as… | open notebook |
| Not supported | Opposite direction | Pcp2-linked active chromatin hub proximity across cell types | p = 0.999; effect = 0.8787 | one-sided Spearman permutation test (1000 label shuffles) | open notebook |
| Not supported | Opposite direction | H3K9me3 radial enrichment should be stronger than shuffled radial assignments | p = 1; effect = -0.6608 | within-cell n_rad_score permutation test (500 permutations, one-sided) | open notebook |
| Not supported | Opposite direction | Lamina association of repetitive satellite-rich chromatin | p = 1; effect = 0.7121 | within-cell satellite-score permutation test (500 permutations, one-sided negative) | open notebook |
| Not supported | Opposite direction | Purkinje-specific H3K27ac decompaction along traced chromosomes | p = 1; effect = -0.07635 | one-sided label permutation test on trace/chromosome deltas, 500 permutations, statistic=me… | open notebook |

8. Look at one generated result table

Each accepted idea writes a small CSV table. The example below selects the verified idea with the smallest p-value in the completed batch. These tables are the starting point for later promotion into stable uchrom.fea / uchrom.strc functions.

passing = []
with open(run_result.results_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # tolerate blank lines, as the summary cell does
        item = json.loads(line)
        verification = item.get('verification') or {}
        if verification.get('status') == 'pass':
            passing.append(item)

if not passing:
    print('No passing ideas.')
else:
    def p_value_for(item):
        value = (item.get('verification') or {}).get('p_value')
        try:
            return float(value)
        except (TypeError, ValueError):
            return float('inf')

    first_pass = min(passing, key=p_value_for)
    result_path = Path(first_pass['verification']['result_path'])
    print(f'example result: {result_path.name}')
    example_df = pd.read_csv(result_path)
    print(example_df.head(8).to_string(index=False, max_colwidth=40))
example result: granule-cell-hp1alpha-heterochromatin-clustering-3401a4f59b_result.csv
                                 idea_id  n_selected_cells  n_granule_spots_finite  n_eligible_trace_chrom_groups  observed_statistic  observed_median_hp1high_distance_um  null_median_matched_distance_um  effect_size  p_value                              test_method hypothesis_test_status
granule-cell-hp1alpha-heterochromatin...                 3                   12085                             69            0.905885                             1.143732                         1.262558     0.094115 0.001996 one-sided matched randomization test ...                   pass

9. Agent backend notes

The batch above used idea_source='pantheon' and code_source='pantheon'. Idea generation was performed by parallel PantheonOS agents with only file access to the serialized discovery schema. Notebook exploration was performed by PantheonOS agents with file access plus live notebook tooling. These agents edited cells, inserted Markdown interpretation, generated statistical matplotlib figures, executed code in notebook kernels, and ran explicit hypothesis tests that report null/alternative hypotheses, p-values, effect sizes, and test methods. U-Chrom then re-executed and verified the resulting notebooks.

Matplotlib is forced to the non-interactive Agg backend in the scaffold and runner executor so documentation/batch runs render figures without opening local GUI windows.

The underlying model for this rendered example is printed in the first cell. The same runner can use any Pantheon-supported provider/model string via MODEL or the UCHROM_PANTHEON_MODEL environment variable. Graphical abstract generation is intentionally decoupled from the 20-idea verification run; it can be performed as a separate low-concurrency post-processing pass when needed.