Multi-omics Auto-discovery
U-Chrom can expose an agent-readable schema directly from a .h5cd file.
The schema is stored in cdata.uns["auto_discovery_schema"], so discovery
agents can reason from the same object that downstream analyses load.
Schema
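from uchrom import ChromData

cdata = ChromData.read("takei2025_cerebellum_rep1.h5cd")
# store=True keeps the schema in cdata.uns["auto_discovery_schema"]
schema = cdata.build_discovery_schema(store=True)
print(cdata.describe_for_agent())
# Write the file back so the stored schema persists on disk
cdata.write("takei2025_cerebellum_rep1.h5cd")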
The schema records:
axes: spot, trace, cell, marker, gene
modalities: chromatin tracing, IF tracks, cell metadata, linked RNA
field catalogs: spots, tracks, cells, cellm, linked_adata
known missing data, such as raw RNA spot geometry or scRNA reference
recommended verification checks for generated parameters
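As a minimal sketch of how an agent-side helper might summarize such a schema, the example below walks a small dictionary. The key names (`axes`, `modalities`, `field_catalogs`, `known_missing`, `verification_checks`), the sample field names, and the `summarize_schema` helper are assumptions for illustration; the object actually stored in cdata.uns["auto_discovery_schema"] may be laid out differently.

```python
# Hypothetical schema layout, for illustration only; inspect the real object
# in cdata.uns["auto_discovery_schema"] before relying on any key name.
example_schema = {
    "axes": ["spot", "trace", "cell", "marker", "gene"],
    "modalities": ["chromatin tracing", "IF tracks", "cell metadata", "linked RNA"],
    "field_catalogs": {
        "spots": ["x", "y", "z"],  # assumed field names
        "cells": ["cell_type"],    # assumed field name
    },
    "known_missing": ["raw RNA spot geometry", "scRNA reference"],
    "verification_checks": ["permutation control"],  # assumed check name
}


def summarize_schema(schema: dict) -> list[str]:
    """Return one human-readable line per top-level schema entry."""
    lines = []
    for key, value in schema.items():
        if isinstance(value, dict):
            # Catalog-style entries: report how many fields each table lists.
            detail = ", ".join(
                f"{name} ({len(fields)} fields)" for name, fields in value.items()
            )
        else:
            detail = ", ".join(str(item) for item in value)
        lines.append(f"{key}: {detail}")
    return lines


for line in summarize_schema(example_schema):
    print(line)
```

A flat, line-per-entry summary like this is easy to place in an agent prompt, though the real describe_for_agent() output may look quite different.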
Runner
The local runner performs:
h5cd schema -> idea generation -> schema review -> notebook creation
-> notebook execution -> verification -> report
Run the default PantheonOS-agent backend:
MPLBACKEND=Agg python -m uchrom.auto_discovery run data.h5cd runs/pantheon \
--model openai/gpt-5.5 \
--idea-agent-count 3 \
--notebook-agent-concurrency 3 \
--max-ideas 4
The PantheonOS backend creates two agent stages. Idea agents receive only file access to the serialized h5cd discovery schema. Accepted ideas are converted into scaffold notebooks, then notebook agents receive file access plus live notebook tools. They can read the notebook, insert Markdown notes, insert or update code cells, execute cells, inspect outputs, and refine the analysis inside the notebook. The runner then re-executes the final notebook from top to bottom and verifies the outputs for reproducibility.
Notebook agents are instructed to produce a quantitative matplotlib figure
for each verified idea. The scaffold and lightweight runner executor force
matplotlib to the non-interactive Agg backend so batch/doc runs do not open
local GUI windows. For large batches, keep graphical abstract generation
decoupled from notebook verification; generated schematics should be added as
a separate low-concurrency post-processing pass so image-generation latency
cannot block statistical exploration.
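The decoupling described above can be sketched with two bounded-concurrency passes. This is a generic asyncio illustration, not the runner's actual implementation: `run_bounded`, `verify_notebook`, and `make_schematic` are hypothetical names, and the workers are placeholders.

```python
import asyncio


async def run_bounded(items, worker, limit):
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def verify_notebook(idea):
    # Placeholder for notebook execution plus verification.
    await asyncio.sleep(0)
    return f"{idea}: verified"


async def make_schematic(result):
    # Placeholder for slow image generation.
    await asyncio.sleep(0)
    return f"{result} + schematic"


async def main(ideas):
    # Pass 1: statistical verification, moderately parallel.
    verified = await run_bounded(ideas, verify_notebook, limit=3)
    # Pass 2: schematics, started only after verification has finished.
    return await run_bounded(verified, make_schematic, limit=1)


results = asyncio.run(main(["idea-1", "idea-2", "idea-3", "idea-4"]))
print(results)
```

Because the second pass starts only after the first has returned, a slow schematic step in this sketch can delay the final report but never the statistical results.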
For direct backend debugging, the lower-level OpenAI path remains available:
python -m uchrom.auto_discovery run data.h5cd runs/openai \
--idea-source openai \
--code-source openai \
--model gpt-5.5 \
--reasoning-effort medium \
--max-ideas 4
Each run writes:
ideas.jsonl
reviews.jsonl
results.jsonl
agent_records.jsonl
report.md
one notebook per accepted idea under notebooks/
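A quick way to sanity-check a finished run is to count the records in each output file. The sketch below assumes only that the .jsonl files hold one JSON object per line and that notebooks use the .ipynb extension. The helper names are hypothetical and no record fields are assumed.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def summarize_run(run_dir: str) -> dict[str, int]:
    """Count records per JSONL file and notebooks under notebooks/."""
    root = Path(run_dir)
    counts = {}
    for name in ("ideas", "reviews", "results", "agent_records"):
        path = root / f"{name}.jsonl"
        # A missing file is reported as zero records rather than an error.
        counts[name] = len(load_jsonl(path)) if path.exists() else 0
    notebook_dir = root / "notebooks"
    # Assumes notebooks are saved with the .ipynb extension.
    counts["notebooks"] = (
        len(list(notebook_dir.glob("*.ipynb"))) if notebook_dir.is_dir() else 0
    )
    return counts
```

For example, `summarize_run("runs/pantheon")` would return a dictionary of counts that can be compared against the numbers in report.md.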
Verified Notebook Exports
The Takei 2025 tutorial build includes standalone HTML exports from a completed 20-idea Pantheon notebook-agent batch. The batch generated 20 ideas, all 20 were accepted after h5cd schema review and executed as notebooks, and U-Chrom then re-executed and verified all 20, each with an explicit hypothesis test and saved statistical figures.
The exported notebooks intentionally separate quantitative verification from optional schematic generation. Main exploration notebooks include statistical matplotlib figures; generated graphical abstracts should be added by a separate low-concurrency post-processing pass when needed.
Purkinje-specific H3K27ac decompaction along traced chromosomes: one-sided label permutation test on trace/chromosome deltas, 500 permutations…, p = 1.
Bergmann-specific LaminB1 peripheral anchoring signature: one-sided exact cell-label permutation test, p = 0.9881.
Granule-cell HP1alpha heterochromatin clustering in 3D traces: one-sided matched randomization test (500 permutations; grouped by Granule tr…, p = 0.001996.
Pcp2-linked active chromatin hub proximity across cell types: one-sided Spearman permutation test (1000 label shuffles), p = 0.999.
Pcp2 expression predicts H3K27ac-marked chromatin spatial clustering: Spearman correlation, one-sided negative; 1000 deterministic label-shuffle pe…, p = 0.9989.
Aldoc expression tracks lamina-associated chromatin signal: Spearman correlation with fixed-seed one-sided permutation test (1000 Aldoc-l…, p = 0.07093.
Gabra6 expression links to elongating RNA polymerase chromatin signal: one-sided Spearman rank correlation with fixed-seed label permutation control, p = 0.783.
Reln expression predicts spatial coupling of H3K4me1 and CBP chromatin spots: Spearman correlation with 1000 fixed-seed Reln-label permutations (one-sided…, p = 0.4156.
Lamina-proximal local compaction across chromosome traces: Spearman rank correlation; one-sided permutation test with 500 shuffles of n_…, p = 0.001996.
Radial enrichment of active H3K27ac chromatin: Within-cell H3K27ac-rank permutation test (500 permutations, two-sided), p = 0.001996.
Purkinje marker expression predicts chromosome-wide radial positioning: Spearman rank correlation with 1000 reproducible cell-label permutations, p = 0.001998.
Lamina association of repetitive satellite-rich chromatin: within-cell satellite-score permutation test (500 permutations, one-sided neg…, p = 1.
Xist-marked chrX inter-chromosomal isolation: within-cell Xist-label randomization test, one-sided greater, 500 permutations, p = 0.01198.
Active chromatin assortativity between chromosomes: one-sided chromosome-label permutation test (500 permutations), p = 0.001996.
rDNA-marked inter-chromosomal hub compaction: one-sided within-cell rDNA-label permutation test (300 permutations) on mean…, p = 0.003322.
Chromosome-specific peripheral positioning by LaminB1: within-cell chromosome-label permutation test (500 permutations, two-sided me…, p = 0.005988.
H3K27ac-high loci resist spurious compaction calls after marker permutation: within-trace H3K27ac marker permutation test (500 permutations); supplementar…, p = 0.8743.
RNAPIISer2-P neighborhoods around polyA_RNA spots should survive cell-label negative controls: one-sided cell-label permutation test (500 permutations), p = 0.001996.
H3K9me3 radial enrichment should be stronger than shuffled radial assignments: within-cell n_rad_score permutation test (500 permutations, one-sided), p = 1.
Pcp2 expression should align with per-cell H3K27ac only beyond shuffled-cell controls: seeded one-sided shuffled-cell permutation test of Spearman rho (n_permutatio…, p = 0.004995.
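Most entries above report permutation-based p-values. As a generic illustration (not the code from the exported notebooks), the sketch below implements a one-sided label permutation test on a difference of means. It uses a fixed seed and the common add-one convention p = (k + 1) / (n + 1), where k counts shuffled statistics at least as large as the observed one. The function name and the toy data are hypothetical.

```python
import random


def one_sided_permutation_p(group_a, group_b, n_permutations=500, seed=0):
    """One-sided test that mean(group_a) > mean(group_b) via label shuffling."""
    rng = random.Random(seed)  # fixed seed so reruns give the same p-value
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        stat = sum(perm_a) / n_a - sum(perm_b) / len(perm_b)
        if stat >= observed:
            at_least_as_extreme += 1

    # Add-one correction: the observed labeling counts as one permutation,
    # so the smallest attainable p-value is 1 / (n_permutations + 1).
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy data with clearly separated groups (not values from the dataset).
high = [10.0 + 0.1 * i for i in range(20)]
low = [0.0 + 0.1 * i for i in range(20)]
p_value = one_sided_permutation_p(high, low, n_permutations=500, seed=0)
print(f"p = {p_value:.4g}")
```

Under this convention, 500 permutations with no shuffled statistic reaching the observed one give 1/501 ≈ 0.001996, and a test whose alternative points the wrong way saturates at p = 1. Both values are consistent with several entries in the list, although the notebooks may compute their p-values differently.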
Notebook-first Code Agent
The code agent is not limited to a fixed tool registry. It can generate free-form Python, but the exploration is auditable because every accepted idea gets a notebook with:
a readable Markdown idea brief generated by the idea agent
data/schema checks
agent-authored Markdown notes and executable analysis cells
an explicit hypothesis test with null/alternative hypotheses, test method, p-value, effect size, and sample-size/context notes
a statistical matplotlib figure explaining the observed parameter
an optional post-processed scientific schematic / graphical abstract
result table output
verification summary
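The hypothesis-test fields listed above could be captured in a small structured record. The dataclass below is a hypothetical sketch: the class name, field names, and validation rules are assumptions rather than the format U-Chrom writes, and the numbers are placeholders, not results.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HypothesisTestRecord:
    """Hypothetical container for the fields a verified notebook reports."""

    idea_title: str
    null_hypothesis: str
    alternative_hypothesis: str
    test_method: str
    p_value: float
    effect_size: float
    n_samples: int
    notes: str = ""

    def __post_init__(self):
        # Reject values that cannot be valid probabilities or sample sizes.
        if not 0.0 <= self.p_value <= 1.0:
            raise ValueError(f"p_value out of range: {self.p_value}")
        if self.n_samples <= 0:
            raise ValueError(f"n_samples must be positive: {self.n_samples}")

    def to_json(self) -> str:
        """Serialize to a single JSON line, suitable for a .jsonl file."""
        return json.dumps(asdict(self), sort_keys=True)


record = HypothesisTestRecord(
    idea_title="example idea",
    null_hypothesis="group labels are exchangeable",
    alternative_hypothesis="group A has the larger mean",
    test_method="one-sided label permutation test (500 permutations)",
    p_value=1 / 501,
    effect_size=0.8,  # placeholder value, not a real result
    n_samples=40,     # placeholder value, not a real result
)
print(record.to_json())
```

Validating at construction time means a malformed p-value fails loudly in the notebook instead of surfacing later in the report.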
Successful notebook code can later be promoted into stable modules such as
uchrom.fea or uchrom.strc.
PantheonOS
uchrom.auto_discovery.pantheon implements the default agent orchestration
used by run_auto_discovery. PantheonOS is kept as an optional runtime
dependency of U-Chrom itself, but the Pantheon backend requires
pantheon-agents to be installed in the environment that runs the agents.
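Because the dependency is optional, it can help to check for it before launching a run. The sketch below uses the standard-library importlib.metadata and assumes that the installed distribution is named pantheon-agents, as written above. The helper name is hypothetical.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pantheon-agents")
if version is None:
    print("pantheon-agents not found; the Pantheon backend would be unavailable")
else:
    print(f"pantheon-agents {version} is installed")
```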
The backend is model-provider agnostic at the Pantheon layer. The example
above uses openai/gpt-5.5, but any Pantheon-supported model string can be
passed with --model.