Multi-omics Auto-discovery

U-Chrom can expose an agent-readable schema directly from a .h5cd file. The schema is stored in cdata.uns["auto_discovery_schema"], so discovery agents can reason from the same object that downstream analyses load.

Schema

from uchrom import ChromData

cdata = ChromData.read("takei2025_cerebellum_rep1.h5cd")
cdata.add_reference(
    role="primary_dataset_paper",
    title="Primary paper or dataset manuscript",
    doi="10.xxxx/example",
    url="https://example.org/paper",
)
cdata.add_user_annotation(
    scope="analysis_constraint",
    target="linked_adata",
    text="RNA and chromatin tracing are linked at cell_id level; do not assume RNA spot geometry.",
    tags=["constraint", "multiomics_alignment"],
)
schema = cdata.build_discovery_schema(store=True)
print(cdata.describe_for_agent())
cdata.write("takei2025_cerebellum_rep1.h5cd")

The schema records:

axes: spot, trace, cell, marker, gene
modalities: chromatin tracing, IF tracks, cell metadata, linked RNA
field catalogs: spots, tracks, cells, cellm, linked_adata
dataset references from cdata.uns["dataset_references"]
user annotations from cdata.uns["user_annotations"]
a compact knowledge_seed_context for browser/idea agents
known missing data, such as raw RNA spot geometry or scRNA reference
recommended verification checks for generated parameters

References and annotations are treated as priors and constraints for discovery agents. They do not replace notebook-based validation.
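As a rough illustration of how a schema review step could use the field catalogs and known-missing-data entries as constraints, the sketch below checks whether the fields an idea depends on are present in a toy schema. The dictionary layout, the key names `field_catalogs` and `known_missing_data`, the field names, and the helper `review_idea_fields` are all assumptions made for illustration; the real object returned by `build_discovery_schema` may be organized differently.

```python
# Toy stand-in for a discovery schema. Key names and field names are
# hypothetical and may not match what build_discovery_schema() stores.
toy_schema = {
    "field_catalogs": {
        "cells": ["cell_id", "cell_type"],
        "spots": ["x", "y", "z", "chrom"],
        "linked_adata": ["Pcp2", "Aldoc"],
    },
    "known_missing_data": ["raw RNA spot geometry", "scRNA reference"],
}


def review_idea_fields(schema, required_fields):
    """Return the (catalog, field) pairs an idea needs that the schema lacks."""
    catalogs = schema.get("field_catalogs", {})
    missing = []
    for catalog, field in required_fields:
        if field not in catalogs.get(catalog, []):
            missing.append((catalog, field))
    return missing


# An idea that wrongly assumes RNA spot coordinates exist in the spots catalog.
idea_fields = [("cells", "cell_type"), ("linked_adata", "Pcp2"), ("spots", "rna_x")]
print(review_idea_fields(toy_schema, idea_fields))  # [('spots', 'rna_x')]
```

An idea with a non-empty result would be rejected or revised before any notebook is created, which is the spirit of the user annotation shown earlier that warns against assuming RNA spot geometry.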

Runner

The local runner performs:

h5cd schema -> idea generation -> schema review -> notebook creation
-> notebook execution -> verification -> report

Run the default PantheonOS-agent backend:

MPLBACKEND=Agg python -m uchrom.auto_discovery run data.h5cd runs/pantheon \
  --model openai/gpt-5.5 \
  --idea-agent-count 3 \
  --notebook-agent-concurrency 3 \
  --max-ideas 4

The PantheonOS backend creates two agent stages. Idea agents receive only file access to the serialized h5cd discovery schema. Accepted ideas are converted into scaffold notebooks, then notebook agents receive file access plus live notebook tools. They can read the notebook, insert Markdown notes, insert or update code cells, execute cells, inspect outputs, and refine the analysis inside the notebook. The runner then re-executes the final notebook from top to bottom and verifies the outputs for reproducibility.

Notebook agents are instructed to produce a quantitative matplotlib figure for each verified idea. The scaffold and lightweight runner executor force matplotlib to the non-interactive Agg backend so batch/doc runs do not open local GUI windows.

For large batches, keep graphical abstract generation decoupled from notebook verification. Generated schematics should be added as a separate post-processing pass so image-generation latency cannot block statistical exploration. Use retries plus strict coverage checking before publishing exported notebooks:

python -m uchrom.auto_discovery schematics runs/pantheon \
  --model openai \
  --concurrency 3 \
  --retries 2 \
  --strict

The command writes pantheon/schematic_images/schematics.jsonl and pantheon/schematic_images/schematic_coverage.json. In strict mode, the command fails if any target notebook still lacks the schematic_image cell, so missing graphical abstracts cannot silently enter the docs snapshot. --visual-qa is available for runtimes that provide a Pantheon sub-agent callback context; the default CLI path keeps QA separate from the coverage gate.
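The strict coverage gate can be approximated independently of the CLI, for example as an extra check in a docs build. The sketch below scans `.ipynb` files and reports those without a schematic cell. It assumes the cell is identifiable by a `schematic_image` entry in the cell's `metadata.tags`, which is a guess: the actual command may mark the cell differently, and the layout of `schematic_coverage.json` is not described here. The function names are hypothetical.

```python
import json
from pathlib import Path


def has_tagged_cell(notebook_path, tag="schematic_image"):
    """True if any cell in the .ipynb file lists `tag` in metadata.tags.

    Assumption: the schematic cell is marked with a cell tag.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if tag in cell.get("metadata", {}).get("tags", []):
            return True
    return False


def notebooks_missing_schematic(notebook_dir, tag="schematic_image"):
    """Return sorted file names of notebooks in notebook_dir lacking the tag."""
    paths = sorted(Path(notebook_dir).glob("*.ipynb"))
    return [p.name for p in paths if not has_tagged_cell(p, tag)]


def strict_gate(notebook_dir):
    """Return 1 if any notebook lacks a schematic cell, else 0."""
    missing = notebooks_missing_schematic(notebook_dir)
    for name in missing:
        print(f"missing schematic: {name}")
    return 1 if missing else 0
```

Passing the return value of `strict_gate("runs/pantheon/notebooks")` to `sys.exit` would make a build step fail in the same way strict mode is described to, without involving image generation or visual QA.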

For direct backend debugging, the runner can use any registered structured backend. The shared prompt contracts live in uchrom.auto_discovery.llm; backend adapters call those prompt builders but do not own provider-specific prompt rules or the notebook lifecycle.

The Claude and Codex CLI backends return structured idea/code payloads; the U-Chrom runner still writes, executes, and verifies the final notebooks:

python -m uchrom.auto_discovery run data.h5cd runs/claude \
  --idea-source claude \
  --code-source claude \
  --reasoning-effort low \
  --max-ideas 4

Use --idea-source codex --code-source codex to exercise the Codex CLI backend with the same runner-owned notebook lifecycle.

Each run writes:

ideas.jsonl
reviews.jsonl
results.jsonl
agent_records.jsonl
report.md
graph/idea_graph.json
graph/graph_summary.md
one notebook per accepted idea under notebooks/

When --store-schema is used, the runner also stores a lightweight graph pointer and run summary in cdata.uns["auto_discovery_graph"] and cdata.uns["auto_discovery_runs"]. Full notebooks and graph artifacts stay in the run directory.
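The JSONL artifacts in a run directory can be inspected with a few lines of standard-library Python. The sketch below counts the records in each file; it makes no assumption about the fields inside each record, since those are not documented here. The helper names `read_jsonl` and `summarize_run` are illustrative and not part of U-Chrom.

```python
import json
from pathlib import Path

JSONL_ARTIFACTS = ("ideas.jsonl", "reviews.jsonl", "results.jsonl", "agent_records.jsonl")


def read_jsonl(path):
    """Parse one JSON value per non-empty line and return them as a list."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records


def summarize_run(run_dir, names=JSONL_ARTIFACTS):
    """Map each JSONL artifact present in run_dir to its record count."""
    run_dir = Path(run_dir)
    return {
        name: len(read_jsonl(run_dir / name))
        for name in names
        if (run_dir / name).exists()
    }
```

For example, `summarize_run("runs/pantheon")` returns a dictionary keyed by file name, which is a quick way to confirm that the number of results matches the number of accepted ideas.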

Existing runs can be converted into graph artifacts without re-running the agents:

python -m uchrom.auto_discovery graph runs/pantheon

Verified Notebook Exports

The Takei 2025 tutorial build includes standalone HTML exports from a completed 20-idea Pantheon notebook-agent batch. The batch generated 20 ideas, accepted 20 after h5cd schema review, executed 20 notebooks, and U-Chrom re-executed/verified all 20 with explicit hypothesis tests and saved statistical figures.

The exported notebooks intentionally separate quantitative verification from schematic generation. Main exploration notebooks include statistical matplotlib figures; generated graphical abstracts are added by a separate post-processing pass with strict coverage checking before the docs snapshot is published.

Purkinje-specific H3K27ac decompaction along traced chromosomes: one-sided label permutation test on trace/chromosome deltas, 500 permutations…, p = 1.
Bergmann-specific LaminB1 peripheral anchoring signature: one-sided exact cell-label permutation test, p = 0.9881.
Granule-cell HP1alpha heterochromatin clustering in 3D traces: one-sided matched randomization test (500 permutations; grouped by Granule tr…, p = 0.001996.
Pcp2-linked active chromatin hub proximity across cell types: one-sided Spearman permutation test (1000 label shuffles), p = 0.999.
Pcp2 expression predicts H3K27ac-marked chromatin spatial clustering: Spearman correlation, one-sided negative; 1000 deterministic label-shuffle pe…, p = 0.9989.
Aldoc expression tracks lamina-associated chromatin signal: Spearman correlation with fixed-seed one-sided permutation test (1000 Aldoc-l…, p = 0.07093.
Gabra6 expression links to elongating RNA polymerase chromatin signal: one-sided Spearman rank correlation with fixed-seed label permutation control, p = 0.783.
Reln expression predicts spatial coupling of H3K4me1 and CBP chromatin spots: Spearman correlation with 1000 fixed-seed Reln-label permutations (one-sided…, p = 0.4156.
Lamina-proximal local compaction across chromosome traces: Spearman rank correlation; one-sided permutation test with 500 shuffles of n_…, p = 0.001996.
Radial enrichment of active H3K27ac chromatin: Within-cell H3K27ac-rank permutation test (500 permutations, two-sided), p = 0.001996.
Purkinje marker expression predicts chromosome-wide radial positioning: Spearman rank correlation with 1000 reproducible cell-label permutations, p = 0.001998.
Lamina association of repetitive satellite-rich chromatin: within-cell satellite-score permutation test (500 permutations, one-sided neg…, p = 1.
Xist-marked chrX inter-chromosomal isolation: within-cell Xist-label randomization test, one-sided greater, 500 permutations, p = 0.01198.
Active chromatin assortativity between chromosomes: one-sided chromosome-label permutation test (500 permutations), p = 0.001996.
rDNA-marked inter-chromosomal hub compaction: one-sided within-cell rDNA-label permutation test (300 permutations) on mean…, p = 0.003322.
Chromosome-specific peripheral positioning by LaminB1: within-cell chromosome-label permutation test (500 permutations, two-sided me…, p = 0.005988.
H3K27ac-high loci resist spurious compaction calls after marker permutation: within-trace H3K27ac marker permutation test (500 permutations); supplementar…, p = 0.8743.
RNAPIISer2-P neighborhoods around polyA_RNA spots should survive cell-label negative controls: one-sided cell-label permutation test (500 permutations), p = 0.001996.
H3K9me3 radial enrichment should be stronger than shuffled radial assignments: within-cell n_rad_score permutation test (500 permutations, one-sided), p = 1.
Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls: seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutatio…, p = 0.004995.
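Most of the tests listed above are label permutation tests. The sketch below shows one common form, a one-sided test on a difference of group means using the add-one estimator p = (k + 1) / (n + 1), where k is the number of shuffles at least as extreme as the observed statistic and n is the number of permutations. The smallest reported values are numerically consistent with that estimator at k = 0 (1/501 ≈ 0.001996 for 500 permutations, 1/301 ≈ 0.003322 for 300), but this is an inference from the numbers: the statistics and estimators actually used in the exported notebooks are not stated here and may differ. The function name is hypothetical.

```python
import random
from statistics import mean


def one_sided_permutation_pvalue(group_a, group_b, n_permutations=500, seed=0):
    """One-sided label permutation test for mean(group_a) > mean(group_b).

    Returns (observed_delta, p_value), where
    p_value = (k + 1) / (n_permutations + 1) and k counts shuffles whose
    delta is at least as large as the observed delta.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    k = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        delta = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if delta >= observed:
            k += 1
    return observed, (k + 1) / (n_permutations + 1)


high = list(range(30, 60))  # synthetic values, not data from the tutorial
low = list(range(0, 30))
print(one_sided_permutation_pvalue(high, low))
print(one_sided_permutation_pvalue(low, high))  # wrong direction gives p = 1.0
```

With fully separated synthetic groups the first call sits at the floor of the estimator, and testing the wrong direction yields p = 1.0, which is how a one-sided test can report p = 1 when the effect runs opposite to the stated alternative.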

Notebook-first Code Agent

The code agent is not limited to a fixed tool registry. It can generate free-form Python, but the exploration is auditable because every accepted idea gets a notebook with:

a readable Markdown idea brief generated by the idea agent
data/schema checks
agent-authored Markdown notes and executable analysis cells
an explicit hypothesis test with null/alternative hypotheses, test method, p-value, effect size, and sample-size/context notes
a statistical matplotlib figure explaining the observed parameter
an optional post-processed scientific schematic / graphical abstract
result table output
verification summary

Successful notebook code can later be promoted into stable modules such as uchrom.fea or uchrom.strc.

PantheonOS

uchrom.auto_discovery.pantheon implements the default agent orchestration used by run_auto_discovery. PantheonOS is kept as an optional runtime dependency of U-Chrom itself, but the Pantheon backend requires pantheon-agents to be installed in the environment that runs the agents.

The backend is model-provider agnostic at the Pantheon layer. The example above uses openai/gpt-5.5, but any Pantheon-supported model string can be passed with --model.