Takei 2025 iterative auto-discovery

This notebook runs the iterative auto-discovery loop on the real, linked Takei 2025 cerebellum data. It reads the .h5cd/.h5ad pair, combines the earlier 20-idea Pantheon notebook-agent batch with a second, graph-aware 20-idea iteration, classifies the evidence, and plans follow-up directions for the next round.

What is implemented here:

  • h5cd-backed discovery schema, including linked AnnData context.

  • Idea graph artifacts from generated ideas, reviews, notebook runs, evidence results, and dataset facets across multiple run directories.

  • Graph-aware follow-up direction planning that uses supported, contradicted, not-supported, and uncovered directions to seed the next round.

  • A static interactive HTML viewer for ideas, evidence, notebook links, and browser-agent literature-claim nodes.

The second round in this artifact generated 20 new ideas. The h5cd schema/review gate accepted 13 of them, and Pantheon notebook agents executed and verified all 13. A post-hoc Pantheon file-tool schematic pass added graphical abstracts under strict coverage checking: 13/13 second-round notebooks and 33/33 combined exported notebooks contain a schematic_image cell. A browser-literature pass also contributed four source-grounded claims from the Takei 2025 paper record.

from pathlib import Path
import json
import os
os.environ.setdefault('MPLBACKEND', 'Agg')

import matplotlib
matplotlib.use('Agg', force=True)
import matplotlib.pyplot as plt
import pandas as pd

from uchrom.io import load_takei2025_cerebellum
from uchrom.auto_discovery import (
    build_idea_graph_from_runs,
    classify_hypothesis_evidence,
    directions_to_agent_context,
    ingest_literature_claims,
    plan_discovery_directions,
    write_direction_artifacts,
    write_idea_graph_artifacts,
)
from uchrom.auto_discovery.ideas import DiscoveryIdea


def repo_root() -> Path:
    for p in [Path.cwd(), *Path.cwd().parents]:
        if (p / 'pyproject.toml').exists() and (p / 'uchrom').exists():
            return p
    return Path.cwd()


ROOT = repo_root()
TAKEI_DIR = ROOT / 'example-data' / 'takei2025_cerebellum'
SEED_RUN_DIR = ROOT / 'tmp' / 'takei_auto_discovery_doc' / 'run_pantheon_20_ideas_verified_agg'
ITERATIVE_DIR = ROOT / 'tmp' / 'takei_auto_discovery_doc' / 'iterative_2x20_real'
SECOND_RUN_DIR = ITERATIVE_DIR / 'iteration_01'
CLAIMS_PATH = ROOT / 'docs' / 'source' / '_extra' / '_auto_discovery_claims' / 'takei2025_browser_claims.jsonl'
RUN_DIRS = [SEED_RUN_DIR]
if SECOND_RUN_DIR.exists():
    RUN_DIRS.append(SECOND_RUN_DIR)
GRAPH_OUTPUT_DIR = ITERATIVE_DIR if SECOND_RUN_DIR.exists() else SEED_RUN_DIR

print(f'ROOT = {ROOT}')
print(f'TAKEI_DIR exists = {TAKEI_DIR.exists()}')
print(f'CLAIMS_PATH exists = {CLAIMS_PATH.exists()}')
print('run dirs:')
for run_dir in RUN_DIRS:
    print(f'  {run_dir} exists={run_dir.exists()}')
ROOT = /Users/weizexu/Projects/U-Chrom
TAKEI_DIR exists = True
CLAIMS_PATH exists = True
run dirs:
  /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/run_pantheon_20_ideas_verified_agg exists=True
  /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/iterative_2x20_real/iteration_01 exists=True

1. Load the linked real dataset

The loader returns a ChromData object, cdata; the linked RNA data is exposed as an AnnData object through cdata.linked_adata. The schema handed to agents is built from this real h5cd/h5ad pair.

cdata = load_takei2025_cerebellum(
    replicate=1,
    data_dir=TAKEI_DIR,
    download=False,
)
adata = cdata.linked_adata
schema = cdata.build_discovery_schema(
    store=False,
    dataset_name='takei2025_cerebellum_rep1',
    max_catalog_items=80,
)

print(cdata)
print(f'linked_adata shape: {None if adata is None else adata.shape}')
print('schema modalities present:', [k for k, v in schema['modalities'].items() if v.get('present')])
print('cell types:', schema['catalogs']['cell_types']['counts'])
print('first genes:', schema['catalogs']['genes']['values'][:12])
print('first tracks:', schema['catalogs']['tracks']['values'][:12])
ChromData: n_spots=10912638, n_traces=59112, n_cells=1799
  spots:   ['chrom', 'start', 'end', 'trace_id', 'cell_id', 'name']
  cells:   ['leiden', 'cell_type', 'x_centroid', 'y_centroid', 'z_centroid', 'nuc_volume_um3', 'doublet', 'batch', 'n_transcripts', 'n_genes_by_counts'] (1799 cells)
  cellm:   {'umap': (1799, 2)}
  tracks:  ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1', 'RNAPIISer5-P', 'RYBP', 'HP1beta', 'RING1B', 'H2A.X', 'H3K4me1', 'H4K20me2', 'H3K27me2', 'JARID2', 'SF3A66', 'CBP', 'H2AK119u1', 'EZH2', 'H3K4me2', 'BRG1', 'HP1alpha', 'Fibrillarin', 'KAP1', 'H3K27ac', 'H3K4me3', 'H3K36ac', 'H3K14ac', 'H4K20me1', 'HP1gamma', 'H4K20me3', 'H3K27me3', 'mH2A1', 'CHD4', 'KAT3B_p300', 'H3K56ac', 'H3K36me3', 'HDAC1', 'SUZ12', 'H4K16ac', 'BRD4', 'SOX2', 'rDNA', 'MajSat', 'LINE1', 'SINEB1', 'Telomere', 'MinSat', 'Xist_RNA', 'ITS1_RNA', 'Rnu2_RNA', 'polyA_RNA', 'Malat1_RNA', 'dot_int', 'n_rad_score', 'n_per_dist(um)']
  traces:  ['dbscan_allele', 'dbscan_ldp_allele'] (59112 traces)
  uns:     ['allele_col', 'genome_assembly', 'keep_unclustered', 'source', 'voxel_xy_nm', 'voxel_z_nm', 'xyz_unit', 'zenodo_record', 'leiden_to_cell_type', 'linked_anndata']
  linked_adata: (1799, 60)
linked_adata shape: (1799, 60)
schema modalities present: ['chromatin_tracing', 'if_tracks', 'cell_metadata', 'rna_expression']
cell types: {'Granule': 1109, 'Other': 323, 'Bergmann': 192, 'MLI1': 90, 'Purkinje': 58, 'MLI2+PLI': 27}
first genes: ['Aldoc', 'Calb1', 'Cdh22', 'Drd3', 'Eomes', 'Ephb2', 'Foxj1', 'Gabra6', 'Gpr176', 'Grm1', 'Hspb1', 'Mrc1']
first tracks: ['CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3', 'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1']
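
Because the RNA data lives in a separate linked object, it is worth confirming that both tables describe the same cells before any cross-modal analysis. The sketch below is a minimal, standard-library illustration of such a check on two ID sequences. The helper name check_cell_alignment and the toy IDs are hypothetical; how the IDs are pulled out of cdata.cells and cdata.linked_adata.obs_names is left to the caller, since that access pattern is not shown in this notebook.

```python
def check_cell_alignment(cell_ids, obs_names):
    """Compare two ordered ID sequences and report how they differ.

    Hypothetical helper: it knows nothing about ChromData or AnnData and
    only compares the IDs that the caller extracts from each object.
    """
    cell_ids = [str(c) for c in cell_ids]
    obs_names = [str(o) for o in obs_names]
    cell_set, obs_set = set(cell_ids), set(obs_names)
    return {
        'same_order': cell_ids == obs_names,   # identical IDs in identical order
        'same_set': cell_set == obs_set,       # identical IDs, any order
        'only_in_cells': sorted(cell_set - obs_set),
        'only_in_adata': sorted(obs_set - cell_set),
    }


# Toy IDs standing in for the real tables (assumption, not real data).
report = check_cell_alignment(['c1', 'c2', 'c3'], ['c1', 'c3', 'c2'])
print(report)
```

If same_set is true but same_order is false, the two tables need to be reindexed to a common order before per-cell values are compared.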

2. Rebuild the idea graph from two completed iterations

This combines the original 20-idea Pantheon notebook-agent batch with a second graph-aware iteration. Each run contributes ideas.jsonl, reviews.jsonl, results.jsonl, and executed notebooks. The combined graph is written into the iterative run directory so later agents can consume a single prior graph.

required = ['ideas.jsonl', 'reviews.jsonl', 'results.jsonl']
for run_dir in RUN_DIRS:
    missing = [name for name in required if not (run_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f'Missing completed run artifacts in {run_dir}: {missing}')

run_claims = GRAPH_OUTPUT_DIR / 'browser' / 'claims.jsonl'  # optional run-level browser claims
CLAIMS_PATHS = [run_claims] if run_claims.exists() else []
graph = build_idea_graph_from_runs(RUN_DIRS, graph_id='takei2025_iterative_2x20', claims_paths=CLAIMS_PATHS)
n_claims = ingest_literature_claims(graph, CLAIMS_PATH) if CLAIMS_PATH.exists() else 0
graph_paths = write_idea_graph_artifacts(graph, GRAPH_OUTPUT_DIR)
summary = graph.summary()

print('graph artifacts:')
for key, value in graph_paths.items():
    print(f'  {key}: {value}')
print('\nsummary:')
print(json.dumps({
    'n_nodes': summary['n_nodes'],
    'n_edges': summary['n_edges'],
    'hypothesis_status': summary['hypothesis_status'],
    'node_kinds': summary['node_kinds'],
    'browser_claims_ingested': n_claims,
}, indent=2))
graph artifacts:
  graph_json: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/iterative_2x20_real/graph/idea_graph.json
  graph_summary: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/iterative_2x20_real/graph/graph_summary.md

summary:
{
  "n_nodes": 195,
  "n_edges": 698,
  "hypothesis_status": {
    "Borderline": 1,
    "Contradicted": 6,
    "Not supported": 14,
    "Supported": 12
  },
  "node_kinds": {
    "CellType": 3,
    "EvidenceResult": 33,
    "Gene": 7,
    "IFMarker": 23,
    "Idea": 40,
    "IdeaReview": 40,
    "LiteratureClaim": 4,
    "Modality": 4,
    "NotebookRun": 33,
    "ParameterFamily": 5,
    "Reference": 1,
    "Run": 2
  },
  "browser_claims_ingested": 4
}
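
The printed summary can be cross-checked against itself. The sketch below copies the numbers from the output above into a plain dict (the name printed_summary is only for illustration) and confirms that the node kinds add up to n_nodes and that the hypothesis statuses add up to the number of EvidenceResult nodes.

```python
# Values copied from the summary printed above.
printed_summary = {
    'n_nodes': 195,
    'hypothesis_status': {
        'Borderline': 1,
        'Contradicted': 6,
        'Not supported': 14,
        'Supported': 12,
    },
    'node_kinds': {
        'CellType': 3, 'EvidenceResult': 33, 'Gene': 7, 'IFMarker': 23,
        'Idea': 40, 'IdeaReview': 40, 'LiteratureClaim': 4, 'Modality': 4,
        'NotebookRun': 33, 'ParameterFamily': 5, 'Reference': 1, 'Run': 2,
    },
}

node_total = sum(printed_summary['node_kinds'].values())
status_total = sum(printed_summary['hypothesis_status'].values())

# Every node belongs to exactly one kind.
assert node_total == printed_summary['n_nodes']
# Every evidence result carries exactly one hypothesis status.
assert status_total == printed_summary['node_kinds']['EvidenceResult']

print(node_total, status_total)
```

The 40 Idea nodes versus 33 EvidenceResult nodes reflects the review gate: 20 + 20 ideas were generated, but only 20 + 13 were accepted and executed.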

Interactive idea graph viewer

Open the static HTML graph viewer here: Takei 2025 iterative 2x20 idea graph.

The viewer opens with a top-down Discovery tree rooted at cdata. Under the root are browser/literature claims, first-round ideas, direction-guided follow-up ideas, and handoff-gap nodes. Each idea expands to the notebook-agent conclusion, exported notebook, and any next-round ideas derived from that conclusion. This combined graph currently has 40 idea nodes, 33 evidence nodes, 11 follow-up direction nodes, and 4 browser/literature claim nodes extracted from the Takei 2025 paper record.

3. Evidence status is separate from notebook verification

Every accepted notebook in the combined graph was executed and verified by U-Chrom, but verification only means the analysis ran and was checked, not that the hypothesis held. The biological hypothesis is classified separately as Supported, Borderline, Contradicted, or Not supported, depending on the p-value and effect direction.

def read_jsonl(path):
    rows = []
    with open(path) as fh:
        for line in fh:
            if line.strip():
                rows.append(json.loads(line))
    return rows

ideas_by_id = {}
results = []
for run_dir in RUN_DIRS:
    ideas_by_id.update({row['idea_id']: DiscoveryIdea.from_dict(row) for row in read_jsonl(run_dir / 'ideas.jsonl')})
    for row in read_jsonl(run_dir / 'results.jsonl'):
        row = dict(row)
        row['run_dir'] = run_dir.name
        results.append(row)

evidence_rows = []
for result in results:
    idea = ideas_by_id[result['idea_id']]
    verification = result.get('verification') or {}
    conclusion = classify_hypothesis_evidence(idea, verification)
    evidence_rows.append({
        'iteration': result['run_dir'],
        'idea': idea.idea_title,
        'notebook_status': conclusion.notebook_status,
        'hypothesis_status': conclusion.hypothesis_status,
        'direction': conclusion.direction_status,
        'p_value': conclusion.p_value,
        'effect_size': conclusion.effect_size,
        'test_method': verification.get('test_method'),
    })

evidence_df = pd.DataFrame(evidence_rows)
status_order = {'Supported': 0, 'Borderline': 1, 'Contradicted': 2, 'Not supported': 3, 'Inconclusive': 4}
evidence_df['status_rank'] = evidence_df['hypothesis_status'].map(status_order).fillna(99)
evidence_df = evidence_df.sort_values(['status_rank', 'p_value']).drop(columns=['status_rank'])
print(evidence_df[['iteration', 'idea', 'hypothesis_status', 'direction', 'p_value', 'effect_size']].to_string(index=False, max_colwidth=52))
print('\nhypothesis status counts:')
print(evidence_df['hypothesis_status'].value_counts().to_string())
                         iteration                                                 idea hypothesis_status                direction  p_value  effect_size
run_pantheon_20_ideas_verified_agg Granule-cell HP1alpha heterochromatin clustering ...         Supported       Expected direction 0.001996     0.094115
run_pantheon_20_ideas_verified_agg Lamina-proximal local compaction across chromosom...         Supported       Expected direction 0.001996     0.023680
run_pantheon_20_ideas_verified_agg   Active chromatin assortativity between chromosomes         Supported       Expected direction 0.001996     0.165338
                      iteration_01 Chromosome-specific radial address strength acros...         Supported       Expected direction 0.001996     0.313709
run_pantheon_20_ideas_verified_agg         rDNA-marked inter-chromosomal hub compaction         Supported       Expected direction 0.003322    -0.206427
run_pantheon_20_ideas_verified_agg Pcp2 expression should align with per-cell H3K27a...         Supported       Expected direction 0.004995     0.858434
                      iteration_01 Purkinje-marker expression association with radia...         Supported Direction not classified 0.005994    -0.845196
                      iteration_01 Bergmann-cell lamina-associated telomere positioning         Supported       Expected direction 0.007984     0.137181
                      iteration_01 Cell-type dependence of global chromatin peripher...         Supported       Expected direction 0.010989     0.785185
run_pantheon_20_ideas_verified_agg         Xist-marked chrX inter-chromosomal isolation         Supported       Expected direction 0.011976     0.028334
                      iteration_01 LaminB1 intensity as a continuous radial-position...         Supported       Expected direction 0.031250     0.079299
                      iteration_01 Peripheral heterochromatin enrichment relative to...         Supported       Expected direction 0.033138     0.082089
run_pantheon_20_ideas_verified_agg Aldoc expression tracks lamina-associated chromat...        Borderline       Expected direction 0.070929     0.535570
run_pantheon_20_ideas_verified_agg        Radial enrichment of active H3K27ac chromatin      Contradicted       Opposite direction 0.001996    -0.524224
run_pantheon_20_ideas_verified_agg RNAPIISer2-P neighborhoods around polyA_RNA spots...      Contradicted       Opposite direction 0.001996     0.391011
                      iteration_01 Interchromosomal nucleolar convergence of rDNA- a...      Contradicted       Opposite direction 0.001996     0.492112
                      iteration_01 Active interchromosomal proximity among H3K27ac- ...      Contradicted       Opposite direction 0.001996     0.813810
run_pantheon_20_ideas_verified_agg Purkinje marker expression predicts chromosome-wi...      Contradicted       Opposite direction 0.001998    -0.861932
run_pantheon_20_ideas_verified_agg Chromosome-specific peripheral positioning by Lam...      Contradicted       Opposite direction 0.005988    -0.267949
                      iteration_01 Chromosome-specific radial asymmetry of chrX rela...     Not supported Direction not classified 0.134865     0.012325
                      iteration_01 Purkinje Pcp2 expression association with chrX in...     Not supported       Expected direction 0.154845     0.486506
run_pantheon_20_ideas_verified_agg Reln expression predicts spatial coupling of H3K4...     Not supported       Expected direction 0.415584    -0.101274
                      iteration_01 chrX insulation linked to Xist RNA and H3K27me3 e...     Not supported       Opposite direction 0.566866    -0.019586
                      iteration_01 Granule-cell perinuclear heterochromatin enrichme...     Not supported       Opposite direction 0.642857     0.013893
run_pantheon_20_ideas_verified_agg Gabra6 expression links to elongating RNA polymer...     Not supported       Opposite direction 0.782968    -0.299244
run_pantheon_20_ideas_verified_agg H3K27ac-high loci resist spurious compaction call...     Not supported       Opposite direction 0.874251    -0.014128
run_pantheon_20_ideas_verified_agg Bergmann-specific LaminB1 peripheral anchoring si...     Not supported       Opposite direction 0.988095     0.498362
run_pantheon_20_ideas_verified_agg Pcp2 expression predicts H3K27ac-marked chromatin...     Not supported       Opposite direction 0.998863     0.870301
run_pantheon_20_ideas_verified_agg Pcp2-linked active chromatin hub proximity across...     Not supported       Opposite direction 0.999001     0.878669
run_pantheon_20_ideas_verified_agg Purkinje-specific H3K27ac decompaction along trac...     Not supported       Opposite direction 1.000000    -0.076350
run_pantheon_20_ideas_verified_agg Lamina association of repetitive satellite-rich c...     Not supported       Opposite direction 1.000000     0.712108
run_pantheon_20_ideas_verified_agg H3K9me3 radial enrichment should be stronger than...     Not supported       Opposite direction 1.000000    -0.660783
                      iteration_01 Purkinje-specific active chromatin coalescence ar...     Not supported       Opposite direction 1.000000     0.051302

hypothesis status counts:
hypothesis_status
Not supported    14
Supported        12
Contradicted      6
Borderline        1
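
The table above is consistent with a simple rule that combines a p-value threshold with the effect direction. The sketch below is one such rule, not the implementation of classify_hypothesis_evidence: the function name sketch_status and the cutoffs 0.05 and 0.10 are assumptions chosen only because they reproduce the statuses shown above.

```python
def sketch_status(p_value, direction, alpha=0.05, borderline=0.10):
    """Illustrative status rule; thresholds are assumptions, not U-Chrom's."""
    if p_value is None:
        return 'Inconclusive'
    if direction == 'Opposite direction':
        # A significant effect in the wrong direction contradicts the idea;
        # a non-significant one simply fails to support it.
        return 'Contradicted' if p_value < alpha else 'Not supported'
    if p_value < alpha:
        return 'Supported'
    if p_value < borderline:
        return 'Borderline'
    return 'Not supported'


# Rows taken from the evidence table above: (p_value, direction).
examples = [
    (0.001996, 'Expected direction'),
    (0.070929, 'Expected direction'),
    (0.001996, 'Opposite direction'),
    (0.566866, 'Opposite direction'),
    (0.005994, 'Direction not classified'),
]
for p, direction in examples:
    print(f'{p:<10} {direction:<26} {sketch_status(p, direction)}')
```

Keeping this status separate from notebook_status means a notebook that ran cleanly can still record a contradicted or unsupported hypothesis.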

4. Coverage in the combined idea graph

Coverage is represented as graph nodes and edges, not as fixed buckets. The planner can see which cell types, genes, IF markers, modalities, and parameter families have already been used across both iterations.

coverage = summary['coverage']
coverage_rows = []
for facet, values in coverage.items():
    if not values:
        coverage_rows.append({'facet': facet, 'values': 'none'})
    else:
        coverage_rows.append({
            'facet': facet,
            'values': ', '.join(str(k) for k in list(values)[:18]),
        })
coverage_df = pd.DataFrame(coverage_rows)
print(coverage_df.to_string(index=False, max_colwidth=120))
             facet                                                                                                                   values
        cell_types                                                                                              Bergmann, Granule, Purkinje
             genes                                                                            Aldoc, Aqp4, Calb1, Cdh22, Gabra6, Pcp2, Reln
           markers BRD4, CBP, CPSF6, Fibrillarin, H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K9me3, H4K20me3, HP1alpha, LaminB1, MajSat, Ma...
        modalities                                                              cell_metadata, chromatin_tracing, if_tracks, rna_expression
parameter_families                      expression_association, inter_chromosomal, marker_stratification, radial_position, spatial_distance
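
One way to turn coverage into a list of gaps is a set difference between the schema catalog and the facets already used in the graph. The sketch below uses the first 12 tracks printed in section 1 and only the marker names visible in the truncated table above, so the covered set is incomplete; the function name uncovered_in_catalog_order is hypothetical and this is not the planner's actual logic.

```python
def uncovered_in_catalog_order(catalog, covered):
    """Return catalog entries that never appear in the covered set,
    preserving catalog order."""
    covered = set(covered)
    return [item for item in catalog if item not in covered]


# First 12 tracks from the schema catalog printed in section 1.
catalog_tracks = [
    'CPSF6', 'ATRX', 'H4K8ac', 'HDAC2', 'H3K9ac', 'H3K9me3',
    'H3K9me2', 'RNAPIISer2-P', 'H3', 'H3K36me2', 'UBTF', 'LaminB1',
]
# Markers visible in the coverage table (the printed row is truncated,
# so markers past 'MajSat' are missing from this set).
covered_markers = [
    'BRD4', 'CBP', 'CPSF6', 'Fibrillarin', 'H3K27ac', 'H3K27me3',
    'H3K36me3', 'H3K4me1', 'H3K9me3', 'H4K20me3', 'HP1alpha',
    'LaminB1', 'MajSat',
]

gaps = uncovered_in_catalog_order(catalog_tracks, covered_markers)
print(gaps[:5])
```

The first five gaps found this way are ATRX, H4K8ac, HDAC2, H3K9ac, and H3K9me2.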

5. Plan graph-aware follow-up directions for the next iteration

plan_discovery_directions() uses the graph and h5cd-backed schema to produce diverse, non-duplicative follow-up direction prompts for the next idea agents. The selected set deliberately includes more than one direction type: evidence extension, cross-modal bridge, coverage gap, and negative-result refinement when available.

directions = plan_discovery_directions(graph, schema, max_directions=8)
direction_paths = write_direction_artifacts(directions, GRAPH_OUTPUT_DIR)

direction_df = pd.DataFrame([{
    'direction_id': f.direction_id,
    'type': f.direction_type,
    'priority': f.priority,
    'title': f.title,
    'parents': ', '.join(f.parent_idea_ids) or 'none',
} for f in directions])

print('follow-up direction artifacts:')
for key, value in direction_paths.items():
    print(f'  {key}: {value}')
print('\nnext follow-up directions:')
print(direction_df.to_string(index=False, max_colwidth=70))
follow-up direction artifacts:
  directions_json: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/iterative_2x20_real/directions/next_directions.json
  directions_markdown: /Users/weizexu/Projects/U-Chrom/tmp/takei_auto_discovery_doc/iterative_2x20_real/directions/next_directions.md

next follow-up directions:
                  direction_id               type  priority                                                                  title                                                     parents
evidence_extension-932b464fa7 evidence_extension      88.0 Extend supported signal: Active chromatin assortativity between chr... active-chromatin-assortativity-between-chromosom-6fcdb55564
cross_modal_bridge-480d651e1b cross_modal_bridge      74.0                    Link RNA expression to chromatin tracing phenotypes                                                        none
      coverage_gap-6fd9b4c49b       coverage_gap      68.0                                        Explore under-tested IF markers                                                        none
evidence_extension-4dc0261d07 evidence_extension      88.0 Extend supported signal: Bergmann-cell lamina-associated telomere p... bergmann-cell-lamina-associated-telomere-positio-408336e9bb
evidence_extension-472214bcc6 evidence_extension      88.0 Extend supported signal: Chromosome-specific radial address strengt... chromosome-specific-radial-address-strength-acro-382d9561f1
evidence_extension-436e49b8f2 evidence_extension      88.0 Extend supported signal: Granule-cell HP1alpha heterochromatin clus... granule-cell-hp1alpha-heterochromatin-clustering-3401a4f59b
evidence_extension-0b5fb1e4f1 evidence_extension      88.0 Extend supported signal: Lamina-proximal local compaction across ch... lamina-proximal-local-compaction-across-chromoso-ee45672874
evidence_extension-828a97f053 evidence_extension      88.0 Extend supported signal: Pcp2 expression should align with per-cell... pcp2-expression-should-align-with-per-cell-h3k27-b0193ea9a5

6. Follow-up direction prompt context for the next idea agents

This Markdown is the handoff that a scheduler can give to the next wave of idea agents together with schema_context.md and idea_graph.json.

direction_context = directions_to_agent_context(directions)
print(direction_context[:5000])
if len(direction_context) > 5000:
    print('\n... truncated in notebook preview ...')
# Graph-derived discovery directions

## 1. Extend supported signal: Active chromatin assortativity between chromosomes

- direction_id: `evidence_extension-932b464fa7`
- type: `evidence_extension`
- priority: 88.0
- parent_idea_ids: active-chromatin-assortativity-between-chromosom-6fcdb55564
- target_facets: `{"cell_types": ["Bergmann", "Granule", "Purkinje"], "genes": [], "markers": ["H3K27ac"], "modalities": ["cell_metadata", "chromatin_tracing", "if_tracks"], "parameter_family": "inter_chromosomal"}`

**Rationale.** Prior notebook evidence for `Active chromatin assortativity between chromosomes` was classified as Supported; p=0.001996007984031936, effect=0.16533794015569558.

**Prompt.** Generate ideas that test whether this previously observed signal remains stable under a different cell type, marker, gene-expression bridge, chromosome subset, or normalization. Keep the parent signal recognizable, but do not repeat the same computable parameter.

**Novelty notes.**
- Must change at least one of cell type, marker/gene, or parameter family.
- Prefer an orthogonal validation or a negative-control contrast.

**Constraints.**
- The final idea must be schema-valid against the h5cd-backed discovery schema.
- The notebook agent must run code in a notebook, include a statistical test, and report p-value/effect size.
- The analysis should include a control, permutation, bootstrap, or matched comparison when feasible.

## 2. Link RNA expression to chromatin tracing phenotypes

- direction_id: `cross_modal_bridge-480d651e1b`
- type: `cross_modal_bridge`
- priority: 74.0
- parent_idea_ids: none
- target_facets: `{"genes": ["Drd3", "Eomes", "Ephb2", "Foxj1", "Gpr176"], "markers": ["CPSF6", "ATRX", "H4K8ac", "HDAC2", "H3K9ac", "H3K9me3", "H3K9me2", "RNAPIISer2-P"], "modalities": ["chromatin_tracing", "rna_expression", "if_tracks"]}`

**Rationale.** The h5cd has linked RNA observations, so the next iteration should not only test IF-marker geometry; it should also ask whether cell-level expression predicts spatial chromatin features.

**Prompt.** Generate ideas that use linked_adata gene expression as the cell-level variable and chromatin tracing or IF-track features as spatial responses.  Select genes from the schema, justify why the gene is biologically meaningful for the cell types, and require a permutation or rank-based hypothesis test.

**Novelty notes.**
- Prefer genes not already used by prior ideas.
- Pair expression with a spatial endpoint, not only another expression summary.

**Constraints.**
- The final idea must be schema-valid against the h5cd-backed discovery schema.
- The notebook agent must run code in a notebook, include a statistical test, and report p-value/effect size.
- The analysis should include a control, permutation, bootstrap, or matched comparison when feasible.
- Validate cell_id alignment between cdata.cells and cdata.linked_adata.obs_names.

## 3. Explore under-tested IF markers

- direction_id: `coverage_gap-6fd9b4c49b`
- type: `coverage_gap`
- priority: 68.0
- parent_idea_ids: none
- target_facets: `{"markers": ["ATRX", "H4K8ac", "HDAC2", "H3K9ac", "H3K9me2"], "modalities": ["chromatin_tracing", "if_tracks"]}`

**Rationale.** The graph has little or no tested coverage for these IF markers despite their presence in the h5cd-backed schema.

**Prompt.** Generate diverse ideas around under-tested IF markers: ATRX, H4K8ac, HDAC2, H3K9ac, H3K9me2. Combine them with available chromatin tracing geometry and, when relevant, IF tracks or linked RNA. Avoid repeating existing idea signatures in the graph.

**Novelty notes.**
- Use the graph coverage counts to avoid already explored facets.
- Prefer combinations that add a new modality or parameter family.

**Constraints.**
- The final idea must be schema-valid against the h5cd-backed discovery schema.
- The notebook agent must run code in a notebook, include a statistical test, and report p-value/effect size.
- The analysis should include a control, permutation, bootstrap, or matched comparison when feasible.

## 4. Extend supported signal: Bergmann-cell lamina-associated telomere positioning

- direction_id: `evidence_extension-4dc0261d07`
- type: `evidence_extension`
- priority: 88.0
- parent_idea_ids: bergmann-cell-lamina-associated-telomere-positio-408336e9bb
- target_facets: `{"cell_types": ["Bergmann", "Granule", "Purkinje"], "genes": [], "markers": ["LaminB1", "Telomere", "n_per_dist(um)"], "modalities": ["cell_metadata", "chromatin_tracing", "if_tracks"], "parameter_family": "radial_position"}`

**Rationale.** Prior notebook evidence for `Bergmann-cell lamina-associated telomere positioning` was classified as Supported; p=0.007984031936127744, effect=0.13718063336440595.

**Prompt.** Generate ideas that test whether this previously observed signal remains stable under a different cell type, marker, gene-expression bridge, chromosome subset, or normalization. Keep the parent signal recognizable, but do not repeat the same computable parameter.



... truncated in notebook preview ...

7. Quick visual summary

The plot below is generated directly from the combined evidence table (evidence_df) and the next-direction plan (direction_df).

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

status_counts = evidence_df['hypothesis_status'].value_counts().reindex(
    ['Supported', 'Borderline', 'Contradicted', 'Not supported', 'Inconclusive'],
    fill_value=0,
)
status_counts.plot(kind='bar', ax=axes[0], color=['#2ca25f', '#feb24c', '#de2d26', '#756bb1', '#969696'])
axes[0].set_title('Combined evidence')
axes[0].set_ylabel('idea count')
axes[0].tick_params(axis='x', rotation=35)

direction_counts = direction_df['type'].value_counts()
direction_counts.plot(kind='bar', ax=axes[1], color='#3182bd')
axes[1].set_title('Next iteration direction types')
axes[1].set_ylabel('direction count')
axes[1].tick_params(axis='x', rotation=35)

fig.tight_layout()
plt.show()

8. How to run another iteration

The scheduler can now launch another round by giving each idea agent three inputs:

  • the h5cd-backed discovery schema / schema_context.md;

  • the combined graph/idea_graph.json, which records previous ideas and evidence;

  • directions/next_directions.md, which specifies diverse next search directions.

The Pantheon idea-agent dispatcher uses those graph and follow-up direction records to avoid simple duplication and to create child ideas with metadata.parent_idea_ids or metadata.source_claim_ids. Accepted child ideas are then routed to notebook agents in parallel and folded back into the graph after verification.
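
Before launching another round, a scheduler could verify that all three inputs exist. The sketch below is a minimal pre-flight check using only the standard library. The helper name collect_handoff_inputs is hypothetical, and placing schema_context.md at the top of the run directory is an assumption; the graph and directions paths follow the artifact locations printed earlier in this notebook.

```python
from pathlib import Path
import tempfile

# Relative locations of the three handoff inputs.
# The schema_context.md location is an assumption.
HANDOFF_FILES = {
    'schema': Path('schema_context.md'),
    'graph': Path('graph') / 'idea_graph.json',
    'directions': Path('directions') / 'next_directions.md',
}


def collect_handoff_inputs(run_dir):
    """Return the handoff paths, or raise if any input is missing."""
    run_dir = Path(run_dir)
    paths = {key: run_dir / rel for key, rel in HANDOFF_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f'Missing handoff inputs: {missing}')
    return paths


# Demonstrate on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in HANDOFF_FILES.values():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text('placeholder')
    bundle = collect_handoff_inputs(root)
    print(sorted(bundle))
```

Failing fast here mirrors the artifact check in section 2, which raises FileNotFoundError when a run directory lacks ideas.jsonl, reviews.jsonl, or results.jsonl.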