Takei 2025 auto-discovery example
This notebook documents a real PantheonOS-agent-backed auto-discovery pass on the linked Takei 2025 cerebellum .h5cd / .h5ad data. It exposes the dataset schema to Pantheon idea agents, then inspects a completed 20-idea notebook-agent batch on a small Takei-derived execution subset. The batch includes live notebook construction, agent-authored exploratory analysis, explicit hypothesis testing with p-values/effect sizes, saved statistical matplotlib figures, cell execution, and U-Chrom re-execution/verification.
The rendered outputs below use OpenAI as the model provider for the PantheonOS agents in this environment. The U-Chrom runner is provider-agnostic at the Pantheon layer; change MODEL to use another Pantheon-supported model backend.
from pathlib import Path
from copy import deepcopy
import json
import os
os.environ.setdefault('MPLBACKEND', 'Agg')
import shutil
from collections import Counter
import numpy as np
import pandas as pd
# Silence Pantheon's loguru output when loguru is installed.
try:
    from loguru import logger as _pantheon_logger
    _pantheon_logger.remove()
    _pantheon_logger.disable('pantheon')
except Exception:
    pass
from uchrom.io import load_takei2025_cerebellum
from uchrom.auto_discovery import (
    DiscoveryRunConfig,
    generate_pantheon_ideas,
    review_idea_against_schema,
    run_auto_discovery,
    schema_to_agent_context,
)
from uchrom.auto_discovery.llm import DEFAULT_OPENAI_MODEL
MODEL = os.environ.get('UCHROM_PANTHEON_MODEL', f'openai/{DEFAULT_OPENAI_MODEL}')
LLM_TIMEOUT = 900
IDEA_AGENT_COUNT = 2
NOTEBOOK_AGENT_CONCURRENCY = 2
GENERATE_SCHEMATIC_IMAGE = False
SCHEMATIC_IMAGE_MODEL = os.environ.get('UCHROM_SCHEMATIC_IMAGE_MODEL', 'openai')
SCHEMATIC_IMAGE_MODEL_ARGS = {'size': '1536x1024', 'quality': 'high', 'output_format': 'png'}
def repo_root() -> Path:
    # Walk upward from the working directory until the repository root is found.
    for p in [Path.cwd(), *Path.cwd().parents]:
        if (p / 'pyproject.toml').exists() and (p / 'uchrom').exists():
            return p
    return Path.cwd()

def dotenv_has_openai_key(path: Path) -> bool:
    # True if the dotenv file defines a non-empty OPENAI_API_KEY.
    if not path.exists():
        return False
    for line in path.read_text().splitlines():
        line = line.strip()
        if line.startswith('OPENAI_API_KEY=') and line.split('=', 1)[1].strip().strip('"').strip("'"):
            return True
    return False
ROOT = repo_root()
TAKEI_DIR = ROOT / 'example-data' / 'takei2025_cerebellum'
OUT = ROOT / 'tmp' / 'takei_auto_discovery_doc'
OUT.mkdir(parents=True, exist_ok=True)
print(f'root: {ROOT}')
print(f'output: {OUT}')
print(f'PantheonOS model: {MODEL}')
print(f'idea agents: {IDEA_AGENT_COUNT}, notebook agent concurrency: {NOTEBOOK_AGENT_CONCURRENCY}')
print(f'schematic image generation in main exploration run: {GENERATE_SCHEMATIC_IMAGE} (post-hoc only), model={SCHEMATIC_IMAGE_MODEL}')
print(f'OpenAI key available for this rendered example: {bool(os.environ.get("OPENAI_API_KEY")) or dotenv_has_openai_key(Path.home() / ".env")}')
root: /Users/weizexu/Projects/U-Chrom
output: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc
PantheonOS model: openai/gpt-5.5
idea agents: 2, notebook agent concurrency: 2
schematic image generation in main exploration run: False (post-hoc only), model=openai
OpenAI key available for this rendered example: True
1. Load linked Takei data
load_takei2025_cerebellum() returns a ChromData object. The RNA expression matrix is available as cdata.linked_adata and is aligned on cell IDs.
cdata = load_takei2025_cerebellum(
    replicate=1,
    data_dir=TAKEI_DIR,
    download=True,
)
adata = cdata.linked_adata
print(cdata)
print(f'linked_adata shape: {None if adata is None else adata.shape}')
ChromData: n_spots=10912638, n_traces=59112, n_cells=1799
spots: ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
cells: ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (1799 cells)
cellm: {'umap': (1799, 2)}
tracks: ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
traces: ['dbscan_allele', 'dbscan_ldp_allele'] (59112 traces)
uns: ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'leiden_to_cell_type', 'linked_anndata']
linked_adata: (1799, 60)
linked_adata shape: (1799, 60)
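The alignment between the cell table and the linked expression matrix can be checked explicitly before any per-cell analysis. The sketch below is a minimal, library-free illustration; `check_cell_alignment` is a hypothetical helper, not part of U-Chrom, and it distinguishes "same set of IDs" from "same IDs in the same row order".

```python
def check_cell_alignment(cell_ids, obs_names):
    """Compare two cell-ID sequences.

    Returns (same_order, missing_in_adata, extra_in_adata), where the two
    lists hold IDs present on only one side.
    """
    cell_ids = [str(c) for c in cell_ids]
    obs_names = [str(o) for o in obs_names]
    missing = sorted(set(cell_ids) - set(obs_names))
    extra = sorted(set(obs_names) - set(cell_ids))
    same_order = cell_ids == obs_names
    return same_order, missing, extra


# Toy IDs for illustration only (hypothetical values).
print(check_cell_alignment(['c1', 'c2', 'c3'], ['c1', 'c3', 'c2']))
```

With the objects loaded above, the call would presumably look like `check_cell_alignment(cdata.cells.index, adata.obs_names)`. If the ID sets match but the order differs, reindexing the expression matrix by cell ID (as the subset cell in section 4 does with `adata[cell_order]`) restores row-wise correspondence.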
2. Build the h5cd-backed discovery schema
The schema is agent-readable: it summarizes axes, modalities, available fields, tracks, cell types, genes, and known missing data. Here we build it in memory for display; production runs can store it in cdata.uns['auto_discovery_schema'] and write it back to .h5cd.
schema = cdata.build_discovery_schema(
    store=False,
    dataset_name='takei2025_cerebellum_rep1',
    max_catalog_items=80,
)
print(schema_to_agent_context(schema, max_items=16))
# ChromData discovery schema
dataset: takei2025_cerebellum_rep1
genome: mm10
xyz_unit: um
shape: 10912638 spots, 59112 traces, 1799 cells
modalities:
- chromatin_tracing: present; operations: chromosome_subset, cell_subset, trace_subset, pairwise_3d_distance, intra_chromatin_distance, inter_chromatin_distance
- if_tracks: present; operations: marker_high_low_bin_selection, marker_stratified_distance, per_cell_marker_summary, per_cell_type_marker_summary
- cell_metadata: present; operations: cell_type_stratification, embedding_visualization
- rna_expression: present; operations: gene_expression_lookup, expression_stratification, gene_marker_correlation, chromatin_expression_association
chroms: 20 [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr3, chr4, chr5, chr6 ...]
cell_types: 6 [Granule=1109, Other=323, Bergmann=192, MLI1=90, Purkinje=58, MLI2+PLI=27]
tracks: 62 [CPSF6, ATRX, H4K8ac, HDAC2, H3K9ac, H3K9me3, H3K9me2, RNAPIISer2-P, H3, H3K36me2, UBTF, LaminB1, RNAPIISer5-P, RYBP, HP1beta, RING1B ...]
linked_adata: shape=[1799, 60], X=csr_matrix
genes: 60 [Aldoc, Calb1, Cdh22, Drd3, Eomes, Ephb2, Foxj1, Gabra6, Gpr176, Grm1, Hspb1, Mrc1, Nefh, Npas3, Nptn, Olig1 ...]
known_missing:
- cellm['if_mean'] per-cell IF mean matrix
- raw RNA seqFISH spot geometry as a first-class ChromData component
- scRNA reference matrix for external expression comparison
- gene annotation cache for gene-neighborhood analyses
verification_required:
- required_fields_exist
- minimum_cell_count
- minimum_spot_or_trace_count
- finite_numeric_output
- statistical_hypothesis_test
- runtime_under_budget
- deterministic_rerun
- negative_control_or_permutation
- redundancy_against_existing_parameters
3. Generate and review Pantheon idea-agent proposals on the full dataset schema
This cell writes the full Takei discovery schema to disk, launches multiple PantheonOS idea agents in parallel, and reviews their returned ideas against the h5cd-backed schema. These idea agents are restricted to file access; they read schema.json and schema_context.md instead of touching notebooks.
full_schema_idea_dir = OUT / 'pantheon_full_schema_ideas'
if full_schema_idea_dir.exists():
    shutil.rmtree(full_schema_idea_dir)
ideas, idea_agent_records = await generate_pantheon_ideas(
    schema,
    output_dir=full_schema_idea_dir,
    max_ideas=2,
    model=MODEL,
    timeout=LLM_TIMEOUT,
    idea_agent_count=IDEA_AGENT_COUNT,
)
rows = []
for idea in ideas:
    review = review_idea_against_schema(idea, schema, max_complexity=5)
    rows.append({
        'idea_id': idea.idea_id,
        'title': idea.idea_title,
        'cell_types': ', '.join(idea.cell_types) or 'all',
        'modalities': ' + '.join(idea.modalities),
        'accepted': review.accepted,
        'warnings': '; '.join(review.warnings),
    })
ideas_df = pd.DataFrame(rows)
print(f'PantheonOS model: {MODEL}')
print(f'idea agent records: {len(idea_agent_records)}')
print(ideas_df[['title', 'cell_types', 'modalities', 'accepted']].to_string(index=False, max_colwidth=64))
print('\nmodality combinations:')
print(Counter(ideas_df['modalities']).most_common())
print(f"RNA-linked ideas: {ideas_df['modalities'].str.contains('rna_expression').sum()}")
PantheonOS model: openai/gpt-5.5
idea agent records: 2
title cell_types modalities accepted
Granule-cell H3K27ac radial centrality from chromatin tracing Granule chromatin_tracing + if_tracks + cell_metadata True
Purkinje Pcp2 expression predicts chromatin-associated elonga... Purkinje if_tracks + cell_metadata + rna_expression False
modality combinations:
[('chromatin_tracing + if_tracks + cell_metadata', 1), ('if_tracks + cell_metadata + rna_expression', 1)]
RNA-linked ideas: 1
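To make the review step above concrete, here is a minimal sketch of what checking an idea against a schema could look like. It is not the logic of `review_idea_against_schema`; the dictionary layout, the `complexity` field, and the function name `review_against_schema_sketch` are all assumptions chosen for illustration. The idea is simply that every modality, track, gene, and cell type an idea names must appear in the schema catalogs.

```python
def review_against_schema_sketch(idea, schema, max_complexity=5):
    """Toy schema review: collect warnings, accept only if there are none."""
    warnings = []
    # Each of these idea fields must be a subset of the same-named schema catalog.
    for key in ('modalities', 'tracks', 'genes', 'cell_types'):
        known = set(schema.get(key, []))
        unknown = [name for name in idea.get(key, []) if name not in known]
        if unknown:
            warnings.append(f"{key} not in schema: {', '.join(unknown)}")
    # Hypothetical complexity budget, mirroring the max_complexity argument.
    complexity = idea.get('complexity', 0)
    if complexity > max_complexity:
        warnings.append(f'complexity {complexity} exceeds {max_complexity}')
    return {'accepted': not warnings, 'warnings': warnings}


# Catalog entries taken from the schema printout above; 'GeneX' is made up.
toy_schema = {
    'modalities': ['chromatin_tracing', 'if_tracks', 'cell_metadata', 'rna_expression'],
    'tracks': ['H3K27ac', 'LaminB1'],
    'genes': ['Aldoc', 'Calb1'],
    'cell_types': ['Granule', 'Purkinje'],
}
toy_idea = {'modalities': ['rna_expression'], 'genes': ['GeneX'], 'complexity': 3}
print(review_against_schema_sketch(toy_idea, toy_schema))
```

Returning the warnings alongside the accept/reject flag keeps rejected ideas auditable, which matches how the table above reports both `accepted` and `warnings`.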
4. Create a small Takei-derived h5cd for execution
The runner executes one notebook per accepted idea. For documentation we use a small subset of the real Takei data: the first three cells (in cell-table order) of each of three cell types, all of their spots and traces, and the matching rows of linked_adata.
cell_types = ['Granule', 'Bergmann', 'Purkinje']
cells_per_type = 3
selected_cells = []
for ct in cell_types:
    # Take the first `cells_per_type` cells of this type, in index order.
    ids = list(cdata.cells.index[cdata.cells['cell_type'].astype(str) == ct][:cells_per_type])
    selected_cells.extend(map(str, ids))
spot_mask = cdata.spots['cell_id'].astype(str).isin(selected_cells).to_numpy()
takei_small = cdata[spot_mask]
takei_small.uns = deepcopy(takei_small.uns)
cell_order = [str(x) for x in takei_small.cells.index]
adata_small = adata[cell_order].copy()
small_h5ad = OUT / 'takei_doc_auto_subset.h5ad'
small_h5cd = OUT / 'takei_doc_auto_subset.h5cd'
adata_small.write_h5ad(small_h5ad)
takei_small.linked_adata = adata_small
takei_small.uns['linked_anndata'] = {
    'path': str(small_h5ad),
    'n_obs': int(adata_small.n_obs),
    'n_vars': int(adata_small.n_vars),
    'cell_id_axis': 'obs_names',
}
takei_small.build_discovery_schema(store=True, dataset_name='takei2025_doc_subset')
takei_small.write(small_h5cd)
print(takei_small)
print(f'subset h5cd: {small_h5cd}')
print(f'subset linked_adata: {adata_small.shape}')
print(takei_small.cells['cell_type'].value_counts().to_string())
ChromData: n_spots=56036, n_traces=213, n_cells=9
spots: ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
cells: ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (9 cells)
cellm: {'umap': (9, 2)}
tracks: ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
traces: ['dbscan_allele', 'dbscan_ldp_allele'] (213 traces)
uns: ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'leiden_to_cell_type', 'linked_anndata', 'auto_discovery_schema']
linked_adata: (9, 60)
subset h5cd: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd
subset linked_adata: (9, 60)
cell_type
Granule 3
Bergmann 3
Purkinje 3
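The subset cell above selects the first cells of each type in index order, which is deterministic but not random. If a random yet reproducible subset were preferred, a seeded sampler is one option. The sketch below uses only the standard library; `sample_cells_per_type` is a hypothetical helper and the toy cell IDs are invented for illustration.

```python
import random


def sample_cells_per_type(cell_type_by_id, wanted_types, n_per_type, seed=None):
    """Pick up to n_per_type cell IDs for each wanted type.

    seed=None reproduces the 'first n in order' behaviour; an integer seed
    draws a reproducible random sample instead.
    """
    rng = random.Random(seed) if seed is not None else None
    selected = []
    for cell_type in wanted_types:
        ids = [cid for cid, ct in cell_type_by_id.items() if ct == cell_type]
        if rng is None:
            ids = ids[:n_per_type]
        else:
            ids = rng.sample(ids, min(n_per_type, len(ids)))
        selected.extend(ids)
    return selected


# Toy mapping of cell ID -> cell type (hypothetical IDs).
toy_types = ['Granule'] * 5 + ['Bergmann'] * 4 + ['Purkinje'] * 2
toy_cells = {f'c{i}': ct for i, ct in enumerate(toy_types)}
print(sample_cells_per_type(toy_cells, ['Granule', 'Bergmann', 'Purkinje'], 3))
print(sample_cells_per_type(toy_cells, ['Granule', 'Bergmann', 'Purkinje'], 3, seed=0))
```

Either way, a type with fewer than `n_per_type` cells contributes all of its cells rather than raising an error.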
5. Run notebook-first auto-discovery
The rendered documentation uses a completed 20-idea Pantheon notebook-agent batch under tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg. The expensive agent run is not repeated during normal docs builds; this cell reconstructs the run summary from ideas.jsonl, reviews.jsonl, results.jsonl, and the exported notebooks.
To reproduce the batch from the saved idea set, run the CLI with a non-interactive matplotlib backend:
MPLBACKEND=Agg python -m uchrom.auto_discovery run \
  tmp/takei_auto_discovery_doc/takei_doc_auto_subset.h5cd \
  tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg \
  --ideas-path tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_retry/ideas.jsonl \
  --max-ideas 20 \
  --max-complexity 5 \
  --code-source pantheon \
  --model openai/gpt-5.5 \
  --llm-timeout 900 \
  --notebook-agent-concurrency 4 \
  --dataset-name takei2025_doc_subset_pantheon_20 \
  --store-schema
Notebook agents receive file access plus live notebook tools. They read the scaffold, edit cells, insert Markdown notes, execute code cells, inspect outputs, and leave auditable notebooks behind. U-Chrom then re-executes each notebook from top to bottom and verifies that it produced a finite result, explicit hypothesis-test metadata, and a saved statistical figure.
from types import SimpleNamespace
run_dir = OUT / 'run_pantheon_20_ideas_verified_agg'
if not run_dir.exists():
    raise FileNotFoundError(
        f'Missing completed run directory: {run_dir}. Reproduce it with the CLI command shown above.'
    )

def count_jsonl(path):
    # Count the non-blank lines of a JSONL file.
    with open(path) as fh:
        return sum(1 for line in fh if line.strip())

results_for_summary = []
with open(run_dir / 'results.jsonl') as fh:
    for line in fh:
        if line.strip():
            results_for_summary.append(json.loads(line))

# Count accepted reviews inside a context manager so the file handle is closed.
with open(run_dir / 'reviews.jsonl') as fh:
    n_accepted = sum(1 for line in fh if line.strip() and json.loads(line).get('accepted'))

notebooks = sorted(str(p) for p in (run_dir / 'notebooks').glob('*.ipynb'))
run_result = SimpleNamespace(
    output_dir=str(run_dir),
    n_generated=count_jsonl(run_dir / 'ideas.jsonl'),
    n_accepted=n_accepted,
    n_executed=len(results_for_summary),
    n_verified=sum((item.get('verification') or {}).get('status') == 'pass' for item in results_for_summary),
    report_path=str(run_dir / 'report.md'),
    ideas_path=str(run_dir / 'ideas.jsonl'),
    reviews_path=str(run_dir / 'reviews.jsonl'),
    results_path=str(run_dir / 'results.jsonl'),
    notebooks=notebooks,
    agent_records_path=str(run_dir / 'agent_records.jsonl'),
)
print(json.dumps({
    'output_dir': run_result.output_dir,
    'n_generated': run_result.n_generated,
    'n_accepted': run_result.n_accepted,
    'n_executed': run_result.n_executed,
    'n_verified': run_result.n_verified,
    'notebook_count': len(run_result.notebooks),
    'report_path': run_result.report_path,
}, indent=2))
{
"output_dir": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg",
"n_generated": 20,
"n_accepted": 20,
"n_executed": 20,
"n_verified": 20,
"notebook_count": 20,
"report_path": "/Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg/report.md"
}
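The verification described in this section (a finite result, explicit hypothesis-test metadata, and a saved figure) can be pictured as a small set of record-level checks. The sketch below is an illustration under stated assumptions, not U-Chrom's verifier: the `p_value`, `effect_size`, and `test_method` keys mirror fields read from results.jsonl elsewhere in this notebook, while `figure_path` and the function name `check_verification` are hypothetical.

```python
import math

REQUIRED_TEST_FIELDS = ('p_value', 'effect_size', 'test_method')


def check_verification(record):
    """Return a list of failure messages; an empty list means all checks pass."""
    failures = []
    # Hypothesis-test metadata must be present.
    for field in REQUIRED_TEST_FIELDS:
        if record.get(field) in (None, ''):
            failures.append(f'missing {field}')
    # Numeric outputs must be finite.
    for field in ('p_value', 'effect_size'):
        value = record.get(field)
        if isinstance(value, (int, float)) and not math.isfinite(value):
            failures.append(f'non-finite {field}')
    # A p-value must lie in [0, 1].
    p_value = record.get('p_value')
    if isinstance(p_value, (int, float)) and math.isfinite(p_value) and not 0.0 <= p_value <= 1.0:
        failures.append('p_value outside [0, 1]')
    # Hypothetical key: path of the saved statistical figure.
    if not record.get('figure_path'):
        failures.append('missing figure_path')
    return failures


example = {
    'p_value': 0.001996,
    'effect_size': 0.094115,
    'test_method': 'one-sided matched randomization test',
    'figure_path': 'figure.png',  # hypothetical file name
}
print(check_verification(example))
```

Note that none of these checks looks at whether the p-value is small: a record with p = 1.0 passes, which is why a verified notebook is not the same as a supported hypothesis.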
6. Inspect verified ideas
The runner writes machine-readable artifacts. This table is built from the completed batch results.jsonl; status='pass' means the notebook executed and the verification cell found required fields, enough data, a finite parameter value, hypothesis-test metadata, and a statistical figure.
ideas_by_id = {}
with open(run_result.ideas_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank lines, as the run-summary cell does
        item = json.loads(line)
        ideas_by_id[item['idea_id']] = item
result_rows = []
with open(run_result.results_path) as fh:
    for line in fh:
        if not line.strip():
            continue
        item = json.loads(line)
        verification = item.get('verification') or {}
        idea = ideas_by_id.get(item['idea_id'], {})
        result_rows.append({
            'title': idea.get('idea_title', item['idea_id']),
            'status': verification.get('status'),
            'p_value': verification.get('p_value'),
            'effect_size': verification.get('effect_size'),
            'test_method': verification.get('test_method'),
            'notebook': Path(item['notebook']).name,
        })
results_df = pd.DataFrame(result_rows)
print(results_df.to_string(index=False, max_colwidth=58))
print('\nstatus counts:')
print(results_df['status'].value_counts().to_string())
title status p_value effect_size test_method notebook
Purkinje-specific H3K27ac decompaction along traced chr... pass 1.000000 -0.076350 one-sided label permutation test on trace/chromosome de... purkinje-specific-h3k27ac-decompaction-along-tra-8f00bd...
Bergmann-specific LaminB1 peripheral anchoring signature pass 0.988095 0.498362 one-sided exact cell-label permutation test bergmann-specific-laminb1-peripheral-anchoring-s-509bbf...
Granule-cell HP1alpha heterochromatin clustering in 3D ... pass 0.001996 0.094115 one-sided matched randomization test (500 permutations;... granule-cell-hp1alpha-heterochromatin-clustering-3401a4...
Pcp2-linked active chromatin hub proximity across cell ... pass 0.999001 0.878669 one-sided Spearman permutation test (1000 label shuffles) pcp2-linked-active-chromatin-hub-proximity-acros-f62e7e...
Pcp2 expression predicts H3K27ac-marked chromatin spati... pass 0.998863 0.870301 Spearman correlation, one-sided negative; 1000 determin... pcp2-expression-predicts-h3k27ac-marked-chromati-667c6e...
Aldoc expression tracks lamina-associated chromatin signal pass 0.070929 0.535570 Spearman correlation with fixed-seed one-sided permutat... aldoc-expression-tracks-lamina-associated-chroma-4c06d0...
Gabra6 expression links to elongating RNA polymerase ch... pass 0.782968 -0.299244 one-sided Spearman rank correlation with fixed-seed lab... gabra6-expression-links-to-elongating-rna-polyme-eef7dd...
Reln expression predicts spatial coupling of H3K4me1 an... pass 0.415584 -0.101274 Spearman correlation with 1000 fixed-seed Reln-label pe... reln-expression-predicts-spatial-coupling-of-h3k-ec2890...
Lamina-proximal local compaction across chromosome traces pass 0.001996 0.023680 Spearman rank correlation; one-sided permutation test w... lamina-proximal-local-compaction-across-chromoso-ee4567...
Radial enrichment of active H3K27ac chromatin pass 0.001996 -0.524224 Within-cell H3K27ac-rank permutation test (500 permutat... radial-enrichment-of-active-h3k27ac-chromatin-65ea38bd9...
Purkinje marker expression predicts chromosome-wide rad... pass 0.001998 -0.861932 Spearman rank correlation with 1000 reproducible cell-l... purkinje-marker-expression-predicts-chromosome-w-ed7932...
Lamina association of repetitive satellite-rich chromatin pass 1.000000 0.712108 within-cell satellite-score permutation test (500 permu... lamina-association-of-repetitive-satellite-rich--6f2c68...
Xist-marked chrX inter-chromosomal isolation pass 0.011976 0.028334 within-cell Xist-label randomization test, one-sided gr... xist-marked-chrx-inter-chromosomal-isolation-3bd44d703c...
Active chromatin assortativity between chromosomes pass 0.001996 0.165338 one-sided chromosome-label permutation test (500 permut... active-chromatin-assortativity-between-chromosom-6fcdb5...
rDNA-marked inter-chromosomal hub compaction pass 0.003322 -0.206427 one-sided within-cell rDNA-label permutation test (300 ... rdna-marked-inter-chromosomal-hub-compaction-41a5f6a09d...
Chromosome-specific peripheral positioning by LaminB1 pass 0.005988 -0.267949 within-cell chromosome-label permutation test (500 perm... chromosome-specific-peripheral-positioning-by-la-8abdde...
H3K27ac-high loci resist spurious compaction calls afte... pass 0.874251 -0.014128 within-trace H3K27ac marker permutation test (500 permu... h3k27ac-high-loci-resist-spurious-compaction-cal-b54d7c...
RNAPIISer2-P neighborhoods around polyA_RNA spots shoul... pass 0.001996 0.391011 one-sided cell-label permutation test (500 permutations) rnapiiser2-p-neighborhoods-around-polya-rna-spot-6bd867...
H3K9me3 radial enrichment should be stronger than shuff... pass 1.000000 -0.660783 within-cell n_rad_score permutation test (500 permutati... h3k9me3-radial-enrichment-should-be-stronger-tha-b9769d...
Pcp2 expression should align with per-cell H3K27ac only... pass 0.004995 0.858434 seeded one-sided shuffled-cell permutation test of Spea... pcp2-expression-should-align-with-per-cell-h3k27-b0193e...
status counts:
status
pass 20
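Most tests in the table above are permutation or randomization tests, and the smallest reported p-values are consistent with the common add-one estimator p = (k + 1) / (n + 1), where k is the number of null statistics at least as extreme as the observed one and n is the number of permutations: 1/501 ≈ 0.001996, 1/301 ≈ 0.003322, and 2/1001 ≈ 0.001998. The sketch below illustrates that estimator with a generic two-group label permutation test. It is an assumption about the general technique, not the code any agent actually wrote, and the function names are hypothetical.

```python
import math
import random


def permutation_p_value(observed, null_stats):
    """One-sided ('greater') p-value with the add-one correction."""
    n_extreme = sum(1 for stat in null_stats if stat >= observed)
    return (n_extreme + 1) / (len(null_stats) + 1)


def label_permutation_test(values, labels, n_permutations=500, seed=0):
    """Mean(values | label) - mean(values | not label), tested by shuffling labels."""
    rng = random.Random(seed)  # fixed seed keeps reruns deterministic

    def statistic(current_labels):
        inside = [v for v, flag in zip(values, current_labels) if flag]
        outside = [v for v, flag in zip(values, current_labels) if not flag]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = statistic(labels)
    shuffled = list(labels)
    null_stats = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_stats.append(statistic(shuffled))
    return observed, permutation_p_value(observed, null_stats)


# Smallest attainable p-value with 500 permutations.
print(round(permutation_p_value(1.0, [0.0] * 500), 6))
# Number of distinct ways to label 3 of 9 cells.
print(math.comb(9, 3))
```

The second printed number matters for cell-level tests on this subset: with 9 cells and 3 in the group of interest there are only 84 distinct labelings, so an exact cell-label permutation test cannot report a p-value below 1/84 ≈ 0.0119. The Bergmann row's p = 0.988095 equals 83/84, which is consistent with such a test.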
7. Evidence-ranked exploration notebook exports
This completed Pantheon notebook-agent batch generated 20 ideas, accepted all 20 after h5cd schema review, and executed 20 notebooks; U-Chrom then re-executed and verified 20/20 notebooks with explicit hypothesis tests.
Important distinction: "Notebook verified" means the notebook ran against the linked Takei .h5cd, passed schema/data checks, produced finite numeric output, and exposed a statistical test with a p-value and effect size. It does not mean the biological hypothesis was supported. The "Hypothesis evidence" column below gives the biological/statistical interpretation for this subset.
Current evidence summary: 6 Supported, 1 Borderline, 4 Contradicted, 9 Not supported. Rows include negative and contradicted ideas intentionally; they are part of the audit trail, not failures of execution. Graphical abstracts were generated by a separate Pantheon file-tool post-processing pass and are embedded in 20/20 notebooks.
| Hypothesis evidence | Idea | Key result | Notebook |
|---|---|---|---|
| Supported | Active chromatin assortativity between chromosomes | p = 0.001996; effect = 0.1653 | |
| Supported | Granule-cell HP1alpha heterochromatin clustering in 3D traces | p = 0.001996; effect = 0.09411 | |
| Supported | Lamina-proximal local compaction across chromosome traces | p = 0.001996; effect = 0.02368 | |
| Supported | rDNA-marked inter-chromosomal hub compaction | p = 0.003322; effect = -0.2064 | |
| Supported | Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls | p = 0.004995; effect = 0.8584 | |
| Supported | Xist-marked chrX inter-chromosomal isolation | p = 0.01198; effect = 0.02833 | |
| Borderline | Aldoc expression tracks lamina-associated chromatin signal | p = 0.07093; effect = 0.5356 | |
| Contradicted | RNAPIISer2-P neighborhoods around polyA_RNA spots should survive cell-label negative controls | p = 0.001996; effect = 0.391 | |
| Contradicted | Radial enrichment of active H3K27ac chromatin | p = 0.001996; effect = -0.5242 | |
| Contradicted | Purkinje marker expression predicts chromosome-wide radial positioning | p = 0.001998; effect = -0.8619 | |
| Contradicted | Chromosome-specific peripheral positioning by LaminB1 | p = 0.005988; effect = -0.2679 | |
| Not supported | Reln expression predicts spatial coupling of H3K4me1 and CBP chromatin spots | p = 0.4156; effect = -0.1013 | |
| Not supported | Gabra6 expression links to elongating RNA polymerase chromatin signal | p = 0.783; effect = -0.2992 | |
| Not supported | H3K27ac-high loci resist spurious compaction calls after marker permutation | p = 0.8743; effect = -0.01413 | |
| Not supported | Bergmann-specific LaminB1 peripheral anchoring signature | p = 0.9881; effect = 0.4984 | |
| Not supported | Pcp2 expression predicts H3K27ac-marked chromatin spatial clustering | p = 0.9989; effect = 0.8703 | |
| Not supported | Pcp2-linked active chromatin hub proximity across cell types | p = 0.999; effect = 0.8787 | |
| Not supported | H3K9me3 radial enrichment should be stronger than shuffled radial assignments | p = 1; effect = -0.6608 | |
| Not supported | Lamina association of repetitive satellite-rich chromatin | p = 1; effect = 0.7121 | |
| Not supported | Purkinje-specific H3K27ac decompaction along traced chromosomes | p = 1; effect = -0.07635 | |
8. Look at one generated result table
Each accepted idea writes a small CSV table. The example below selects the verified idea with the smallest p-value in the completed batch; several ideas tie at p = 0.001996, and min() returns the first of them in results.jsonl order. These tables are the starting point for later promotion into stable uchrom.fea / uchrom.strc functions.
passing = []
with open(run_result.results_path) as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        verification = item.get('verification') or {}
        if verification.get('status') == 'pass':
            passing.append(item)
if not passing:
    print('No passing ideas.')
else:
    def p_value_for(item):
        # Missing or non-numeric p-values sort last.
        value = (item.get('verification') or {}).get('p_value')
        try:
            return float(value)
        except (TypeError, ValueError):
            return float('inf')
    first_pass = min(passing, key=p_value_for)
    result_path = Path(first_pass['verification']['result_path'])
    print(f'example result: {result_path.name}')
    example_df = pd.read_csv(result_path)
    print(example_df.head(8).to_string(index=False, max_colwidth=40))
example result: granule-cell-hp1alpha-heterochromatin-clustering-3401a4f59b_result.csv
idea_id n_selected_cells n_granule_spots_finite n_eligible_trace_chrom_groups observed_statistic observed_median_hp1high_distance_um null_median_matched_distance_um effect_size p_value test_method hypothesis_test_status
granule-cell-hp1alpha-heterochromatin... 3 12085 69 0.905885 1.143732 1.262558 0.094115 0.001996 one-sided matched randomization test ... pass
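The numbers in this row can be cross-checked by hand. The printed values are consistent with `observed_statistic` being the ratio of the observed median distance to the null median distance, and with `effect_size` being one minus that ratio, that is, the fractional reduction in median distance relative to the matched null. This reading is inferred from the printed values only, not from the notebook's code, and `relative_reduction` is a hypothetical helper name.

```python
def relative_reduction(observed, null):
    """Fractional reduction of `observed` relative to `null` (1 - observed / null)."""
    return 1.0 - observed / null


# Values copied from the result row above (rounded to six decimals there).
observed_median_um = 1.143732
null_median_um = 1.262558

ratio = observed_median_um / null_median_um
effect = relative_reduction(observed_median_um, null_median_um)
print(f'ratio={ratio:.6f} effect={effect:.6f}')
```

Under this reading, HP1alpha-high loci in the three Granule cells sit roughly 9% closer together (in median distance) than the matched random controls, which agrees with the reported effect size to within rounding of the inputs.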
9. Agent backend notes
The batch above used idea_source='pantheon' and code_source='pantheon'. Idea generation was performed by parallel PantheonOS agents with only file access to the serialized discovery schema. Notebook exploration was performed by PantheonOS agents with file access plus live notebook tooling. These agents edited cells, inserted Markdown interpretation, generated statistical matplotlib figures, executed code in notebook kernels, and ran explicit hypothesis tests that report null/alternative hypotheses, p-values, effect sizes, and test methods. U-Chrom then re-executed and verified the resulting notebooks.
Matplotlib is forced to the non-interactive Agg backend in the scaffold and runner executor so documentation/batch runs render figures without opening local GUI windows.
The underlying model for this rendered example is printed in the first cell. The same runner can use any Pantheon-supported provider/model string via MODEL or the UCHROM_PANTHEON_MODEL environment variable. Graphical abstract generation is intentionally decoupled from the 20-idea verification run; it can be performed as a separate low-concurrency post-processing pass when needed.
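As a small illustration of the model override described above, the sketch below resolves the model string from the UCHROM_PANTHEON_MODEL environment variable and falls back to a default. The default shown is a placeholder copied from the rendered output of the first cell; `resolve_model` and `split_provider` are hypothetical helpers, and the "provider/model" split is an assumption based on the `openai/...` strings seen in this notebook rather than a documented Pantheon format.

```python
import os

# Placeholder default taken from the rendered output of the first cell.
DEFAULT_MODEL = 'openai/gpt-5.5'


def resolve_model(env=None, default=DEFAULT_MODEL):
    """Return the model string from UCHROM_PANTHEON_MODEL, or the default."""
    env = os.environ if env is None else env
    value = env.get('UCHROM_PANTHEON_MODEL', '').strip()
    # Design choice of this sketch: an empty or whitespace-only value counts as unset.
    return value or default


def split_provider(model):
    """Split 'provider/name' into (provider, name); provider is '' if absent."""
    provider, sep, name = model.partition('/')
    return (provider, name) if sep else ('', provider)


print(resolve_model({}))
print(split_provider(resolve_model({})))
```

Unlike the `os.environ.get(name, default)` call in the first cell, this version also falls back when the variable is set but empty, which can help when a shell exports the variable with no value.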