Multi-omics Auto-discovery

U-Chrom can expose an agent-readable schema directly from a .h5cd file. The schema is stored in cdata.uns["auto_discovery_schema"], so discovery agents can reason from the same object that downstream analyses load.

Schema

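from uchrom import ChromData

# Load the dataset and build the discovery schema; with store=True the schema
# is kept on the object (in cdata.uns["auto_discovery_schema"], per the text above).
cdata = ChromData.read("takei2025_cerebellum_rep1.h5cd")
schema = cdata.build_discovery_schema(store=True)
# Print the agent-facing description of the dataset.
print(cdata.describe_for_agent())
# Write the file back so the stored schema persists alongside the data.
cdata.write("takei2025_cerebellum_rep1.h5cd")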

The schema records:

  • axes: spot, trace, cell, marker, gene

  • modalities: chromatin tracing, IF tracks, cell metadata, linked RNA

  • field catalogs: spots, tracks, cells, cellm, linked_adata

  • known missing data, such as raw RNA spot geometry or scRNA reference

  • recommended verification checks for generated parameters
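
As a rough illustration of how a schema review step might use these entries, the sketch below checks a proposed idea against a schema-like dictionary. The key names (axes, field_catalogs, known_missing), the idea fields, and the review_idea helper are all hypothetical and chosen for this example; the actual layout of cdata.uns["auto_discovery_schema"] may differ.

```python
# Hypothetical schema layout; the real keys in
# cdata.uns["auto_discovery_schema"] may be named differently.
EXAMPLE_SCHEMA = {
    "axes": ["spot", "trace", "cell", "marker", "gene"],
    "field_catalogs": ["spots", "tracks", "cells", "cellm", "linked_adata"],
    "known_missing": ["raw RNA spot geometry", "scRNA reference"],
}


def review_idea(idea, schema):
    """Return a list of problems; an empty list means the idea passes."""
    problems = []
    for axis in idea.get("axes", []):
        if axis not in schema["axes"]:
            problems.append(f"unknown axis: {axis}")
    for catalog in idea.get("catalogs", []):
        if catalog not in schema["field_catalogs"]:
            problems.append(f"unknown field catalog: {catalog}")
    for item in idea.get("requires", []):
        if item in schema["known_missing"]:
            problems.append(f"depends on missing data: {item}")
    return problems


# Illustrative ideas only; the field names are assumptions for this sketch.
ok_idea = {"axes": ["cell", "marker"], "catalogs": ["cells"]}
bad_idea = {"axes": ["voxel"], "requires": ["scRNA reference"]}

print(review_idea(ok_idea, EXAMPLE_SCHEMA))
print(review_idea(bad_idea, EXAMPLE_SCHEMA))
```

Recording known missing data in the schema lets a reviewer reject an idea before any notebook is created, rather than discovering the gap at execution time.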

Runner

The local runner performs:

h5cd schema -> idea generation -> schema review -> notebook creation
-> notebook execution -> verification -> report

Run the default PantheonOS-agent backend:

MPLBACKEND=Agg python -m uchrom.auto_discovery run data.h5cd runs/pantheon \
  --model openai/gpt-5.5 \
  --idea-agent-count 3 \
  --notebook-agent-concurrency 3 \
  --max-ideas 4

The PantheonOS backend creates two agent stages. Idea agents receive only file access to the serialized h5cd discovery schema. Accepted ideas are converted into scaffold notebooks, then notebook agents receive file access plus live notebook tools. They can read the notebook, insert Markdown notes, insert or update code cells, execute cells, inspect outputs, and refine the analysis inside the notebook. The runner then re-executes the final notebook from top to bottom and verifies the outputs for reproducibility.

Notebook agents are instructed to produce a quantitative matplotlib figure for each verified idea. Both the scaffold and the lightweight runner executor force matplotlib onto the non-interactive Agg backend, so batch and documentation runs do not open local GUI windows. For large batches, keep graphical abstract generation decoupled from notebook verification: add generated schematics in a separate low-concurrency post-processing pass so that image-generation latency cannot block statistical exploration.
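
When launching the runner from another Python process, the same headless behavior can be requested through the MPLBACKEND environment variable, as in the command shown earlier. The following is a minimal sketch using only the standard library; the headless_env helper is hypothetical and is not part of U-Chrom.

```python
import os


def headless_env(base=None):
    """Copy an environment mapping and force matplotlib's Agg backend."""
    env = dict(os.environ if base is None else base)
    # matplotlib reads MPLBACKEND when it is first imported.
    env["MPLBACKEND"] = "Agg"
    return env


env = headless_env({"PATH": "/usr/bin"})
print(env["MPLBACKEND"])
```

The returned mapping can be passed as env= to subprocess.run so that the child process inherits the non-interactive backend without changing the parent's environment.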

For direct backend debugging, the lower-level OpenAI path remains available:

python -m uchrom.auto_discovery run data.h5cd runs/openai \
  --idea-source openai \
  --code-source openai \
  --model gpt-5.5 \
  --reasoning-effort medium \
  --max-ideas 4

Each run writes:

  • ideas.jsonl

  • reviews.jsonl

  • results.jsonl

  • agent_records.jsonl

  • report.md

  • one notebook per accepted idea under notebooks/
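
The JSONL outputs can be inspected with the standard library alone. The sketch below counts the records in each file and the notebooks under notebooks/. It assumes one JSON object per line and an .ipynb extension for notebooks; the record fields are not documented here, so none are accessed. The read_jsonl and summarize_run helpers are hypothetical names for this example.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize_run(run_dir):
    """Count records in each JSONL output and notebooks under notebooks/."""
    run_dir = Path(run_dir)
    summary = {}
    for name in ["ideas", "reviews", "results", "agent_records"]:
        path = run_dir / f"{name}.jsonl"
        # None marks a file that the run did not produce.
        summary[name] = len(read_jsonl(path)) if path.exists() else None
    nb_dir = run_dir / "notebooks"
    # Assumes notebooks are saved with the .ipynb extension.
    summary["notebooks"] = len(list(nb_dir.glob("*.ipynb"))) if nb_dir.is_dir() else 0
    return summary


print(summarize_run("runs/pantheon"))
```

Comparing the ideas count with the notebooks count gives a quick check on how many ideas survived review.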

Verified Notebook Exports

The Takei 2025 tutorial build includes standalone HTML exports from a completed 20-idea Pantheon notebook-agent batch. In that batch, 20 ideas were generated, all 20 were accepted after h5cd schema review, 20 notebooks were executed, and U-Chrom re-executed and verified all 20, each with an explicit hypothesis test and saved statistical figures.

The exported notebooks intentionally separate quantitative verification from optional schematic generation. Main exploration notebooks include statistical matplotlib figures; generated graphical abstracts should be added by a separate low-concurrency post-processing pass when needed.

Notebook-first Code Agent

The code agent is not limited to a fixed tool registry. It can generate free-form Python, but the exploration is auditable because every accepted idea gets a notebook with:

  • a readable Markdown idea brief generated by the idea agent

  • data/schema checks

  • agent-authored Markdown notes and executable analysis cells

  • an explicit hypothesis test with null/alternative hypotheses, test method, p-value, effect size, and sample-size/context notes

  • a statistical matplotlib figure explaining the observed parameter

  • an optional post-processed scientific schematic / graphical abstract

  • result table output

  • verification summary
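
To make the hypothesis-test item concrete, the sketch below assembles one such record using only the standard library. The choice of a two-sided permutation test and Cohen's d is an assumption for illustration; the source does not state which tests notebook agents use, and the record keys and input numbers are made up.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_var ** 0.5


# Made-up numbers for illustration only; not results from any dataset.
a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
b = [4.2, 4.0, 4.5, 4.8, 4.1, 4.4]
diff, p = permutation_test(a, b)
record = {
    "null": "group means are equal",
    "alternative": "group means differ",
    "method": "two-sided permutation test on mean difference",
    "p_value": p,
    "effect_size_cohens_d": cohens_d(a, b),
    "n": {"a": len(a), "b": len(b)},
}
print(record)
```

Keeping the hypotheses, method, p-value, effect size, and sample sizes in one record makes each notebook's claim easy to audit alongside its figure.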

Successful notebook code can later be promoted into stable modules such as uchrom.fea or uchrom.strc.

PantheonOS

uchrom.auto_discovery.pantheon implements the default agent orchestration used by run_auto_discovery. PantheonOS is an optional runtime dependency of U-Chrom itself; the Pantheon backend, however, requires pantheon-agents to be installed in the environment that runs the agents.

The backend is model-provider agnostic at the Pantheon layer. The example above uses openai/gpt-5.5, but any Pantheon-supported model string can be passed with --model.
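
Because the Pantheon backend needs pantheon-agents in the agent environment, a quick pre-flight check can be sketched with importlib.metadata. This assumes pantheon-agents is the installed distribution name, as the text above gives it; the distribution_installed helper is hypothetical.

```python
from importlib import metadata


def distribution_installed(name):
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return True


# Assumes "pantheon-agents" is the distribution name used at install time.
if not distribution_installed("pantheon-agents"):
    print("pantheon-agents not found; the Pantheon backend will not be available here")
```

Checking the distribution rather than importing a module avoids guessing the package's import name.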