uchrom.auto_discovery
Multi-omics auto-discovery helpers for ChromData.
- class uchrom.auto_discovery.DiscoveryIdea(idea_title: str, biological_hypothesis: str, computable_parameter: str, analysis_plan: str, modalities: list[str], idea_markdown: str = '', cell_types: list[str] = <factory>, required_fields: list[str] = <factory>, validation_checks: list[str] = <factory>, expected_direction: str = '', complexity: int = 3, idea_id: str = '', metadata: dict[str, Any] = <factory>)
Bases: object
A computable multi-omics discovery idea.
- class uchrom.auto_discovery.DiscoveryRunConfig(h5cd_path: str | Path, output_dir: str | Path, max_ideas: int = 12, ideas_path: str | Path | None = None, max_complexity: int = 5, idea_source: str = 'pantheon', code_source: str = 'pantheon', model: str | None = None, reasoning_effort: str | None = None, llm_timeout: int = 420, idea_agent_count: int = 3, notebook_agent_concurrency: int = 3, generate_schematic_image: bool = False, schematic_image_model: str | None = None, schematic_image_model_args: dict[str, Any] | None = None, execute: bool = True, stop_on_error: bool = False, store_schema: bool = False, dataset_name: str | None = None)
Bases: object
Configuration for a local auto-discovery run.
- class uchrom.auto_discovery.DiscoveryRunResult(output_dir: str, n_generated: int, n_accepted: int, n_executed: int, n_verified: int, report_path: str, ideas_path: str, reviews_path: str, results_path: str, notebooks: list[str] = <factory>, agent_records_path: str | None = None)
Bases: object
Summary of an auto-discovery run.
- class uchrom.auto_discovery.EvidenceConclusion(notebook_status: str, hypothesis_status: str, direction_status: str, p_value: float | None, effect_size: float | None, summary: str)
Bases: object
Human-readable evidence classification for one explored idea.
- class uchrom.auto_discovery.IdeaReview(accepted: bool, errors: list[str] = <factory>, warnings: list[str] = <factory>, missing_fields: list[str] = <factory>)
Bases: object
Result of reviewing an idea against a discovery schema.
- uchrom.auto_discovery.build_discovery_schema(cdata, *, dataset_name: str | None = None, include_linked_adata: bool = True, max_catalog_items: int = 500) → dict[str, Any]
Build an agent-readable schema from a ChromData object.
The returned dict is JSON-serializable and can be persisted via pack_schema() under cd.uns['auto_discovery_schema'].
- uchrom.auto_discovery.classify_hypothesis_evidence(idea: DiscoveryIdea | Mapping[str, Any], verification: Mapping[str, Any] | None, *, alpha: float = 0.05) → EvidenceConclusion
Classify whether a verified notebook supports the biological hypothesis.
verification.status == "pass" means U-Chrom could rerun the notebook and validate required fields, finite outputs, and an explicit statistical test. This function separately classifies the hypothesis evidence from the p-value and effect direction, so that “verified” is not confused with “biologically true”.
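The split between "the notebook verified" and "the hypothesis is supported" can be sketched as below. This is a minimal illustration, not U-Chrom's implementation: the function name `classify_evidence_sketch`, the status labels, and the "positive"/"negative" convention for the expected direction are all assumptions made for the example.

```python
from __future__ import annotations

import math


def classify_evidence_sketch(
    verification_status: str,
    p_value: float | None,
    effect_size: float | None,
    expected_direction: str = "",
    alpha: float = 0.05,
) -> dict[str, str]:
    """Hypothetical classifier that keeps notebook status apart from hypothesis status."""
    # Notebook status only says the run could be reproduced and checked.
    notebook_status = "verified" if verification_status == "pass" else "unverified"

    # Direction status compares the sign of the effect with the expected sign.
    usable_effect = effect_size is not None and math.isfinite(effect_size) and effect_size != 0
    if not usable_effect or expected_direction not in ("positive", "negative"):
        direction_status = "unknown"
    else:
        observed = "positive" if effect_size > 0 else "negative"
        direction_status = "consistent" if observed == expected_direction else "opposite"

    # Hypothesis status depends on the p-value and direction, never on "pass" alone.
    if notebook_status != "verified" or p_value is None or not math.isfinite(p_value):
        hypothesis_status = "inconclusive"
    elif p_value >= alpha:
        hypothesis_status = "not_supported"
    elif direction_status == "opposite":
        hypothesis_status = "contradicted"
    else:
        hypothesis_status = "supported"

    return {
        "notebook_status": notebook_status,
        "direction_status": direction_status,
        "hypothesis_status": hypothesis_status,
    }


supported = classify_evidence_sketch("pass", 0.01, 0.8, "positive")
weak = classify_evidence_sketch("pass", 0.40, 0.8, "positive")
reversed_effect = classify_evidence_sketch("pass", 0.01, -0.8, "positive")
```

In this sketch a notebook can be verified while the hypothesis is still not supported or even contradicted, which is the distinction the function is described as preserving.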
- uchrom.auto_discovery.create_exploration_notebook(idea: DiscoveryIdea | Mapping[str, Any], output_path: str | Path, *, h5cd_path: str | Path | None = None, run_output_dir: str | Path | None = None, analysis_code: str | None = None, verification_code: str | None = None, kernel_name: str = 'python3') → Path
Create a standard exploration notebook for one idea.
The code agent is expected to edit and execute this notebook freely. The scaffold only defines the audit trail and verification contract.
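A scaffold of this kind can be written with nothing more than the notebook JSON format. The sketch below is an assumption about what such a scaffold might contain (a brief, an editable analysis cell, and a fixed verification cell); the helper name `write_scaffold_notebook_sketch` and the cell contents are hypothetical and do not reflect the cells U-Chrom actually emits.

```python
import json
import tempfile
from pathlib import Path


def write_scaffold_notebook_sketch(title, hypothesis, output_path, kernel_name="python3"):
    """Hypothetical scaffold: idea brief, editable analysis cell, verification cell."""

    def markdown(text):
        return {"cell_type": "markdown", "metadata": {}, "source": text}

    def code(text):
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": text,
        }

    notebook = {
        "cells": [
            markdown(f"# {title}\n\n**Hypothesis:** {hypothesis}"),
            code("# Analysis: the code agent may rewrite this cell freely.\nresult = None"),
            code("# Verification: the contract checked after the run.\nassert result is not None"),
        ],
        "metadata": {
            "kernelspec": {"name": kernel_name, "display_name": "Python 3", "language": "python"}
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    nb_path = write_scaffold_notebook_sketch(
        "Demo idea", "X rises with Y", Path(tmp) / "notebooks" / "demo.ipynb"
    )
    scaffold = json.loads(nb_path.read_text(encoding="utf-8"))
```

Keeping the verification cell separate from the analysis cell is one way to let an agent change the analysis without touching the contract it is checked against.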
- uchrom.auto_discovery.execute_notebook_python(notebook_path: str | Path, *, stop_on_error: bool = False) → dict[str, Any]
Execute Python code cells in a notebook JSON file.
This lightweight executor is intentionally small: it keeps a shared Python namespace across code cells, captures stdout/stderr/error text, writes outputs back into the notebook, and returns the final namespace entries commonly used by the auto-discovery runner. It is meant for deterministic smoke tests; Pantheon/Jupyter can still execute the same notebooks in richer interactive runs.
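The behaviour described above (one shared namespace, captured output written back into the notebook) can be sketched in a few lines. This is an illustrative stand-in rather than the library's executor: `execute_notebook_sketch` and the shape of its return value are assumptions, and only stdout and exceptions are captured here.

```python
import contextlib
import io
import json
import tempfile
import traceback
from pathlib import Path


def execute_notebook_sketch(notebook_path, stop_on_error=False):
    """Hypothetical executor: run code cells in order inside one shared namespace."""
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    namespace = {}
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # the notebook format also allows a list of lines
            source = "".join(source)
        stdout = io.StringIO()
        outputs = []
        failed = False
        try:
            with contextlib.redirect_stdout(stdout):
                exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            failed = True
            errors.append({"cell": index, "error": repr(exc)})
            outputs.append({
                "output_type": "error",
                "ename": type(exc).__name__,
                "evalue": str(exc),
                "traceback": traceback.format_exc().splitlines(),
            })
        if stdout.getvalue():
            outputs.insert(0, {"output_type": "stream", "name": "stdout", "text": stdout.getvalue()})
        cell["outputs"] = outputs
        if failed and stop_on_error:
            break
    # Write captured outputs back so the notebook doubles as an audit trail.
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return {"status": "error" if errors else "ok", "errors": errors, "namespace": namespace}


def _code_cell(source):
    return {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [], "source": source}


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.ipynb"
    demo = {"cells": [_code_cell("x = 2"), _code_cell("print(x * 21)")],
            "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    demo_path.write_text(json.dumps(demo), encoding="utf-8")
    run = execute_notebook_sketch(demo_path)
    saved = json.loads(demo_path.read_text(encoding="utf-8"))
```

The second cell can read `x` only because both cells share `namespace`, which is the property that makes such an executor suitable for deterministic smoke tests.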
- uchrom.auto_discovery.generate_openai_analysis_code(idea: DiscoveryIdea, schema: Mapping[str, Any], *, model: str | None = None, reasoning_effort: str | None = None, api_key: str | None = None, env_path: str | Path = '~/.env', timeout: int = 180) → str
Generate free-form notebook analysis code for one idea.
- uchrom.auto_discovery.generate_openai_ideas(schema: Mapping[str, Any], *, max_ideas: int = 8, model: str | None = None, reasoning_effort: str | None = None, api_key: str | None = None, env_path: str | Path = '~/.env', timeout: int = 120) → list[DiscoveryIdea]
Generate structured ideas with the OpenAI Responses API.
This function intentionally avoids a hard dependency on the OpenAI Python SDK so the package remains lightweight. It uses the API key from api_key, OPENAI_API_KEY, or ~/.env.
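One plausible reading of the key lookup is: explicit argument first, then the environment variable, then a dotenv-style file. The sketch below implements that order with the standard library only. The precedence, the file parsing rules, and the helper name `resolve_api_key_sketch` are assumptions; the documentation lists the three sources but does not spell out how they are combined.

```python
import os
import tempfile
from pathlib import Path


def resolve_api_key_sketch(api_key=None, env_path="~/.env", var_name="OPENAI_API_KEY"):
    """Hypothetical lookup: explicit argument, then environment, then a dotenv-style file."""
    if api_key:
        return api_key
    if os.environ.get(var_name):
        return os.environ[var_name]
    path = Path(env_path).expanduser()
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not NAME=value pairs.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == var_name:
                return value.strip().strip("'\"")
    return None


# A made-up variable name keeps the demo independent of any real key in the environment.
DEMO_VAR = "UCHROM_SKETCH_DEMO_KEY"
os.environ.pop(DEMO_VAR, None)
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text(f'# demo file\n{DEMO_VAR}="from-file"\n', encoding="utf-8")
    from_arg = resolve_api_key_sketch("from-arg", env_path=env_file, var_name=DEMO_VAR)
    from_file = resolve_api_key_sketch(env_path=env_file, var_name=DEMO_VAR)
    missing = resolve_api_key_sketch(env_path=Path(tmp) / "absent.env", var_name=DEMO_VAR)
```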
- async uchrom.auto_discovery.generate_pantheon_ideas(schema: Mapping[str, Any], *, output_dir: str | Path, max_ideas: int, model: str | None = None, timeout: int = 420, idea_agent_count: int = 3) → tuple[list[DiscoveryIdea], list[PantheonAgentRecord]]
Generate ideas by running multiple Pantheon idea agents in parallel.
Each idea agent receives only a file-access toolset. The schema and prompt are written to disk, and agents are instructed to read those files before returning DiscoveryIdea-compatible JSON.
- uchrom.auto_discovery.idea_to_markdown(idea: DiscoveryIdea | Mapping[str, Any]) → str
Return a readable Markdown brief for a discovery idea.
- uchrom.auto_discovery.pack_schema(schema: Mapping[str, Any]) → dict[str, str]
Pack a schema as an HDF5-friendly uns entry.
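Packing a nested schema into a flat mapping of strings is what makes it safe to store in an HDF5-backed `uns` slot. A minimal round-trip sketch follows; the key names `format` and `payload` and both helper names are assumptions for illustration, and the real packed layout may differ.

```python
import json


def pack_schema_sketch(schema):
    """Hypothetical packer: serialize the whole schema into one JSON string."""
    return {"format": "json", "payload": json.dumps(schema, sort_keys=True)}


def unpack_schema_sketch(raw):
    """Hypothetical unpacker: accept either the packed mapping or a bare JSON string."""
    if isinstance(raw, str):
        return json.loads(raw)
    return json.loads(raw["payload"])


example_schema = {"dataset": "demo", "modalities": ["hic", "rna"], "n_cells": 128}
packed = pack_schema_sketch(example_schema)
restored = unpack_schema_sketch(packed)
```

Because every value in `packed` is a plain string, the entry does not depend on how the storage layer handles nested dicts or lists.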
- uchrom.auto_discovery.review_idea_against_schema(idea: DiscoveryIdea | Mapping[str, Any], schema: Mapping[str, Any], *, min_complexity: int = 1, max_complexity: int = 5) → IdeaReview
Check whether an idea is compatible with a discovery schema.
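A compatibility check of this kind might compare an idea's `required_fields` and `complexity` against what the schema offers. The sketch below mirrors the `IdeaReview` fields listed above, but the review rules, the `fields` key in the schema, and the names `ReviewSketch` and `review_idea_sketch` are assumptions rather than the library's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewSketch:
    """Hypothetical stand-in with the same field names as IdeaReview."""
    accepted: bool
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    missing_fields: list = field(default_factory=list)


def review_idea_sketch(idea, schema, min_complexity=1, max_complexity=5):
    """Reject ideas that need unavailable fields or fall outside the complexity range."""
    errors = []
    warnings = []
    available = set(schema.get("fields", []))  # assumed schema layout
    missing = [name for name in idea.get("required_fields", []) if name not in available]
    if missing:
        errors.append("missing required fields: " + ", ".join(missing))
    complexity = idea.get("complexity", 3)
    if not min_complexity <= complexity <= max_complexity:
        errors.append(f"complexity {complexity} outside [{min_complexity}, {max_complexity}]")
    if not idea.get("expected_direction"):
        warnings.append("no expected_direction given")
    return ReviewSketch(accepted=not errors, errors=errors, warnings=warnings, missing_fields=missing)


demo_schema = {"fields": ["obs/cell_type", "layers/contact_map"]}
good = review_idea_sketch(
    {"required_fields": ["obs/cell_type"], "complexity": 2, "expected_direction": "positive"},
    demo_schema,
)
bad = review_idea_sketch({"required_fields": ["obs/age"], "complexity": 7}, demo_schema)
```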
- uchrom.auto_discovery.run_auto_discovery(config: DiscoveryRunConfig | dict[str, Any]) → DiscoveryRunResult
Run schema → idea → notebook → verification for one .h5cd file.
- async uchrom.auto_discovery.run_pantheon_notebook_agents(ideas: Sequence[DiscoveryIdea], *, schema: Mapping[str, Any], h5cd_path: str | Path, output_dir: str | Path, notebooks_dir: str | Path, model: str | None = None, timeout: int = 420, concurrency: int = 3, generate_schematic_image: bool = False, schematic_image_model: str | None = None, schematic_image_model_args: Mapping[str, Any] | None = None) → list[PantheonAgentRecord]
Run notebook exploration agents in parallel for accepted ideas.
The caller is responsible for creating scaffold notebooks first. Each Pantheon notebook agent receives file-access and notebook toolsets, edits its assigned notebook, executes exploration cells, and returns a JSON status summary.
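Running agents in parallel under a `concurrency` limit and a per-agent `timeout` is a standard asyncio pattern. The sketch below shows one way to do it with a semaphore; it uses a fake agent coroutine in place of a Pantheon agent, and `run_agents_sketch`, `fake_agent`, and the record shape are hypothetical.

```python
import asyncio


async def run_agents_sketch(idea_ids, agent, concurrency=3, timeout=420):
    """Hypothetical runner: at most `concurrency` agents active, each bounded by `timeout`."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(idea_id):
        async with semaphore:  # blocks while `concurrency` agents are already running
            try:
                content = await asyncio.wait_for(agent(idea_id), timeout=timeout)
                return {"idea_id": idea_id, "status": "ok", "content": content}
            except asyncio.TimeoutError:
                return {"idea_id": idea_id, "status": "timeout", "content": None}

    # gather returns results in the order the ideas were submitted.
    return await asyncio.gather(*(run_one(idea_id) for idea_id in idea_ids))


async def fake_agent(idea_id):
    """Stand-in for a notebook agent; sleeps briefly and returns a status summary."""
    await asyncio.sleep(0.01)
    return f"summary for {idea_id}"


async def slow_agent(idea_id):
    """Stand-in for an agent that exceeds its time budget."""
    await asyncio.sleep(1.0)
    return "never returned"


records = asyncio.run(run_agents_sketch(["idea-1", "idea-2", "idea-3", "idea-4"], fake_agent, concurrency=2))
timed_out = asyncio.run(run_agents_sketch(["idea-5"], slow_agent, timeout=0.05))
```

Turning a timeout into a record instead of raising means one slow agent does not discard the results of the others.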
- uchrom.auto_discovery.schema_to_agent_context(schema: Mapping[str, Any], *, max_items: int = 40) → str
Render a compact, prompt-ready schema summary.
- uchrom.auto_discovery.structured_conclusion_markdown(idea: DiscoveryIdea | Mapping[str, Any], verification: Mapping[str, Any] | None) → str
Build the standard final interpretation Markdown for a notebook.
- uchrom.auto_discovery.unpack_schema(raw: Any) → dict[str, Any]
Unpack a schema from cd.uns['auto_discovery_schema'].
- uchrom.auto_discovery.upsert_structured_conclusion(notebook_path: str | Path, idea: DiscoveryIdea | Mapping[str, Any], verification: Mapping[str, Any] | None) → Path
Insert or replace the notebook’s final interpretation with standard text.
- uchrom.auto_discovery.validate_discovery_schema(schema: Mapping[str, Any], cdata=None) → list[str]
Return validation issues for a discovery schema.
Schema
Agent-readable discovery schema for uchrom.core.ChromData.
The schema is intentionally stored inside cd.uns as a JSON payload so
it round-trips through .h5cd without requiring a new HDF5 layout.
- uchrom.auto_discovery.schema.build_discovery_schema(cdata, *, dataset_name: str | None = None, include_linked_adata: bool = True, max_catalog_items: int = 500) → dict[str, Any]
Build an agent-readable schema from a ChromData object.
The returned dict is JSON-serializable and can be persisted via pack_schema() under cd.uns['auto_discovery_schema'].
- uchrom.auto_discovery.schema.pack_schema(schema: Mapping[str, Any]) → dict[str, str]
Pack a schema as an HDF5-friendly uns entry.
- uchrom.auto_discovery.schema.schema_to_agent_context(schema: Mapping[str, Any], *, max_items: int = 40) → str
Render a compact, prompt-ready schema summary.
Ideas
Idea records and schema-based review for auto-discovery.
- class uchrom.auto_discovery.ideas.DiscoveryIdea(idea_title: str, biological_hypothesis: str, computable_parameter: str, analysis_plan: str, modalities: list[str], idea_markdown: str = '', cell_types: list[str] = <factory>, required_fields: list[str] = <factory>, validation_checks: list[str] = <factory>, expected_direction: str = '', complexity: int = 3, idea_id: str = '', metadata: dict[str, Any] = <factory>)
Bases: object
A computable multi-omics discovery idea.
- class uchrom.auto_discovery.ideas.IdeaReview(accepted: bool, errors: list[str] = <factory>, warnings: list[str] = <factory>, missing_fields: list[str] = <factory>)
Bases: object
Result of reviewing an idea against a discovery schema.
LLM idea generation
Optional LLM-backed idea generation.
- uchrom.auto_discovery.llm.generate_openai_analysis_code(idea: DiscoveryIdea, schema: Mapping[str, Any], *, model: str | None = None, reasoning_effort: str | None = None, api_key: str | None = None, env_path: str | Path = '~/.env', timeout: int = 180) → str
Generate free-form notebook analysis code for one idea.
- uchrom.auto_discovery.llm.generate_openai_ideas(schema: Mapping[str, Any], *, max_ideas: int = 8, model: str | None = None, reasoning_effort: str | None = None, api_key: str | None = None, env_path: str | Path = '~/.env', timeout: int = 120) → list[DiscoveryIdea]
Generate structured ideas with the OpenAI Responses API.
This function intentionally avoids a hard dependency on the OpenAI Python SDK so the package remains lightweight. It uses the API key from api_key, OPENAI_API_KEY, or ~/.env.
Notebooks
Notebook scaffolding for auto-discovery idea exploration.
- uchrom.auto_discovery.notebooks.create_exploration_notebook(idea: DiscoveryIdea | Mapping[str, Any], output_path: str | Path, *, h5cd_path: str | Path | None = None, run_output_dir: str | Path | None = None, analysis_code: str | None = None, verification_code: str | None = None, kernel_name: str = 'python3') → Path
Create a standard exploration notebook for one idea.
The code agent is expected to edit and execute this notebook freely. The scaffold only defines the audit trail and verification contract.
- uchrom.auto_discovery.notebooks.execute_notebook_python(notebook_path: str | Path, *, stop_on_error: bool = False) → dict[str, Any]
Execute Python code cells in a notebook JSON file.
This lightweight executor is intentionally small: it keeps a shared Python namespace across code cells, captures stdout/stderr/error text, writes outputs back into the notebook, and returns the final namespace entries commonly used by the auto-discovery runner. It is meant for deterministic smoke tests; Pantheon/Jupyter can still execute the same notebooks in richer interactive runs.
Runner
Runnable auto-discovery pipeline for ChromData.
- class uchrom.auto_discovery.runner.DiscoveryRunConfig(h5cd_path: str | Path, output_dir: str | Path, max_ideas: int = 12, ideas_path: str | Path | None = None, max_complexity: int = 5, idea_source: str = 'pantheon', code_source: str = 'pantheon', model: str | None = None, reasoning_effort: str | None = None, llm_timeout: int = 420, idea_agent_count: int = 3, notebook_agent_concurrency: int = 3, generate_schematic_image: bool = False, schematic_image_model: str | None = None, schematic_image_model_args: dict[str, Any] | None = None, execute: bool = True, stop_on_error: bool = False, store_schema: bool = False, dataset_name: str | None = None)
Bases: object
Configuration for a local auto-discovery run.
- class uchrom.auto_discovery.runner.DiscoveryRunResult(output_dir: str, n_generated: int, n_accepted: int, n_executed: int, n_verified: int, report_path: str, ideas_path: str, reviews_path: str, results_path: str, notebooks: list[str] = <factory>, agent_records_path: str | None = None)
Bases: object
Summary of an auto-discovery run.
- uchrom.auto_discovery.runner.run_auto_discovery(config: DiscoveryRunConfig | dict[str, Any]) → DiscoveryRunResult
Run schema → idea → notebook → verification for one .h5cd file.
PantheonOS backend
PantheonOS-backed auto-discovery orchestration.
- class uchrom.auto_discovery.pantheon.PantheonAgentRecord(agent_name: str, role: str, prompt_path: str, content: Any)
Bases: object
Audit record for one Pantheon agent call.
- class uchrom.auto_discovery.pantheon.PantheonSchematicRecord(idea_id: str, notebook: str, prompt_path: str, image_path: str | None, status: str, model: str | None = None, visual_qa: Any = None, error: str | None = None)
Bases: object
Audit record for one post-hoc Pantheon schematic generation.
- async uchrom.auto_discovery.pantheon.generate_pantheon_ideas(schema: Mapping[str, Any], *, output_dir: str | Path, max_ideas: int, model: str | None = None, timeout: int = 420, idea_agent_count: int = 3) → tuple[list[DiscoveryIdea], list[PantheonAgentRecord]]
Generate ideas by running multiple Pantheon idea agents in parallel.
Each idea agent receives only a file-access toolset. The schema and prompt are written to disk, and agents are instructed to read those files before returning DiscoveryIdea-compatible JSON.
- async uchrom.auto_discovery.pantheon.generate_pantheon_schematic_images_for_run(run_dir: str | Path, *, model: str | None = None, model_args: Mapping[str, Any] | None = None, timeout: int = 420, concurrency: int = 1, max_images: int | None = None, verified_only: bool = True, force: bool = False, visual_qa: bool = False) → list[PantheonSchematicRecord]
Generate and insert graphical abstracts for a completed discovery run.
This is intentionally separate from notebook execution. Image generation is slow and provider-dependent, so completed/verified notebooks can be exported quickly first and then decorated with generated schematics in a bounded pass.
- async uchrom.auto_discovery.pantheon.run_pantheon_notebook_agents(ideas: Sequence[DiscoveryIdea], *, schema: Mapping[str, Any], h5cd_path: str | Path, output_dir: str | Path, notebooks_dir: str | Path, model: str | None = None, timeout: int = 420, concurrency: int = 3, generate_schematic_image: bool = False, schematic_image_model: str | None = None, schematic_image_model_args: Mapping[str, Any] | None = None) → list[PantheonAgentRecord]
Run notebook exploration agents in parallel for accepted ideas.
The caller is responsible for creating scaffold notebooks first. Each Pantheon notebook agent receives file-access and notebook toolsets, edits its assigned notebook, executes exploration cells, and returns a JSON status summary.