End-to-end pipeline + Meta-Harness + reasoner activation — spec
Status: design of record (2026-06-16). Organizes the next phase. Supersedes the control-plane framing
in meta_harness_boundary.md (the RETE/FSM spine is demoted to harness H₀, see §2). Two reframes
drive it: (a) Meta-Harness (Lee et al. 2026, build/resources/2603.28052.pdf) — optimize the harness
(the program wrapping a frozen executor) via an outer-loop coding-agent proposer over a filesystem of
candidates; (b) the ontology IS the dynamic computational framework — a reasoner-backed (HermiT)
artifact we evolve in situ from domain inputs. Discipline (the lesson that produced this spec): adopt the
form, not the language — every new piece must ground out in concrete, measured computation.
0. The end-to-end flow (the complete workflow — keep this in view)
A filesystem-DAG. Each stage writes a dir with manifest.json + a run_id that hashes its inputs →
content-addressed lineage (no orchestration engine required; see §5).
S0 INPUT finepdfs-lab corpus (/raid/datasets/aegir-corpus-v1/finepdfs-lab/)
S1 COVERAGE ontology_coverage_audit.py → coverage_v1/<run>/ {topic_coverage.parquet,
topic_centroids.npy, manifest} [FinePDFs → topics → gap/borderline/covered]
S2 EVOLVE ◄── THE META-HARNESS. mediate.py (spine + ACP/Grok mint + ContractGate[+reasoner])
│ → evidence/meta_harness/<run>/ {trace, scorecard, candidates.candidate.json}
│ [gap topic → mint construct → gate (DeepOnto+polyglot+R1+novelty+schema+CONSISTENCY)
│ → promote]. THIS stage is what the OUTER LOOP (§2) optimizes.
S2' REVIEW charter: editorial review → promote .candidate → catalog family files (human-in-loop)
S3 DDL/SKOS ddl.py (template_to_table→render_ddl→validate_ddl/polyglot) + build_skos_vocab.py
→ DDL spine + 548-concept SKOS + Atlas rdbms_* projection
S4 CORPUS generate_chapter.py (ontology+DDL → chapters + verifiable JSON + reasoning traces)
→ chapters.parquet + raw.exchange
S5 VERIFY verify_chapters.py → raw.chapter_verification
S6 RELEASE build_atelier_release.py (columns/vocabulary/reference blind benchmark) + HF/GitHub
S7 PRETRAIN train_pretrain.py → byte model
S8 EVAL eval_cells_cta / eval_edge_probe / REALIZATION-CPA (§3c) → column/relational skill
Feedback edges (the loops): S2’s grown ontology → re-run S1 (coverage-close); S5/S8 scores → the OUTER LOOP reward (§2); S1 gaps → S2 targets. The convergence loop = S1→S2→S3→S4→(S7→S8)→back.
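The `manifest.json` + `run_id` scheme described at the top of this section can be sketched as follows. This is a minimal illustration, not the pipeline's actual driver: the function names `compute_run_id` and `write_manifest`, the manifest fields, and the 16-character id length are all assumptions made for the example. It assumes each stage's inputs are small files (for instance the upstream stages' `manifest.json` files), so hashing their bytes is cheap.

```python
import hashlib
import json
from pathlib import Path


def compute_run_id(stage: str, input_paths: list[Path], params: dict) -> str:
    """Hash stage name, parameters, and input file contents into a short id.

    Hypothetical helper: identical inputs always give the same id, so a run
    directory named by this id is content-addressed.
    """
    h = hashlib.sha256()
    h.update(stage.encode())
    h.update(json.dumps(params, sort_keys=True).encode())
    for p in sorted(input_paths):
        data = p.read_bytes()
        h.update(len(data).to_bytes(8, "big"))  # length prefix separates files
        h.update(data)
    return h.hexdigest()[:16]


def write_manifest(out_root: Path, stage: str, input_paths: list[Path], params: dict) -> Path:
    """Create <out_root>/<run_id>/manifest.json and return the run directory."""
    run_id = compute_run_id(stage, input_paths, params)
    run_dir = out_root / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "stage": stage,
        "run_id": run_id,
        "params": params,
        "inputs": [str(p) for p in sorted(input_paths)],
    }
    (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return run_dir
```

If a downstream stage passes the upstream `manifest.json` files as its `input_paths`, its own `run_id` changes whenever any ancestor changes, which is what gives lineage without an orchestration engine.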
1. The two frozen executors (the parallel)
| Meta-Harness (paper) | Our pipeline |
|---|---|
| frozen LLM M | frozen Grok (the minting model) AND frozen HermiT (the reasoner) |
| evolved harness H (a program) | the generation harness (S2) AND the ontology O (a reasoner-executed program) |
| outer-loop coding-agent proposer | the Meta-Harness loop (§2) |
| reward = task accuracy | R1/coverage-close + consistency + realization-accuracy (Pareto vs cost) |
We optimize TWO artifacts against TWO frozen executors: the harness (around Grok) and the ontology (around HermiT). The harness grows the ontology; the reasoner makes the ontology executable.
2. Meta-Harness outer loop (the FORM)
- A harness `H` = a single-file program: `run(topic, gate) -> (construct, signals)`. It builds the mint prompt (contract + topic salient terms + exemplars), calls Grok (frozen), gates, iterates. H₀ = the current inc-1 harness, refactored to one clean program (the RETE/FSM ceremony pruned to a minimal loop; let the proposer re-introduce structure only if it earns reward).
- Candidate filesystem `D` (the feedback channel): `candidates/{NNN}/{harness.py, traces/, scores.json}`. Full, uncompressed; NOT the scalar signal vector (the anti-pattern the paper beats).
- Proposer `P` = a coding agent (Claude Code/Opus, or Grok-as-coder) + a minimal skill (where to write harnesses, how to `grep`/`cat` prior code+traces, what it may edit). It diagnoses from raw traces and rewrites the harness (local edit → full rewrite).
- Eval / reward = run `H` on a SEARCH SET of gap topics → batch on-vs-shuffled R1 / coverage-close + cost (Grok tokens, iters) → Pareto frontier. Proposer never sees the HELD-OUT topic set.
- Loop (Algorithm 1): evaluate initial {H₀,…} → for N iters: P reads `D`, proposes k harnesses, interface-validate + evaluate + log → return Pareto frontier; final eval on held-out.
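The shape of the loop above can be sketched in a few lines. This is an illustration of the form only, under stated assumptions: `Scored`, `dominates`, `pareto_frontier`, and `outer_loop` are hypothetical names, the proposer is reduced to a callable that sees the scored history (the real proposer would read the candidate filesystem directly), and reward and cost are collapsed to one number each.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scored:
    name: str      # candidate id, e.g. the NNN directory name
    reward: float  # higher is better (stand-in for R1 / coverage-close)
    cost: float    # lower is better (stand-in for Grok tokens, iters)


def dominates(a: Scored, b: Scored) -> bool:
    """a is at least as good as b on both axes and strictly better on one."""
    return (a.reward >= b.reward and a.cost <= b.cost
            and (a.reward > b.reward or a.cost < b.cost))


def pareto_frontier(pool: list[Scored]) -> list[Scored]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in pool if not any(dominates(o, c) for o in pool)]


def outer_loop(
    initial: list[str],
    propose: Callable[[list[Scored], int], list[str]],
    validate: Callable[[str], bool],
    evaluate: Callable[[str], Scored],
    n_iters: int,
    k: int,
) -> list[Scored]:
    """Evaluate the initial harnesses, then iterate propose/validate/evaluate."""
    history = [evaluate(name) for name in initial]
    for _ in range(n_iters):
        for name in propose(history, k):
            if validate(name):  # interface check before spending eval budget
                history.append(evaluate(name))
    return pareto_frontier(history)
```

The held-out evaluation is deliberately outside `outer_loop`: only the returned frontier would be re-scored on the held-out topics, so the proposer never receives that signal.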
3. HermiT reasoner activation (make the ontology executable — the FORM, not the word)
HermiT is the sound-and-complete deductive KERNEL (hypertableau, full OWL 2 DL): sound = no false
entailments, complete = no missed ones. It is the only formally-guaranteed layer — so consistency,
classification, and realization are ground truth, not proxies (R1, verbalize, Grok, the model are the
heuristic/stochastic shell; HermiT is the arbiter). DeepOnto integrates it natively: Ontology(path, reasoner_type="hermit") (the DEFAULT — already instantiated on every probe_template load, just never
queried) exposes check_consistency(), get_inferred_super_entities/sub_entities(), get_instances().
So activation is calling the loaded reasoner, not wiring one. Three concrete, measured computations:
- (a) Consistency gate [beachhead]. After a construct passes the syntactic gates, HermiT consistency-checks the cumulative ontology (seed ∪ admitted ∪ candidate). New `ContractGate` signal `consistent`; reject if it makes `O` inconsistent. A deductive check nothing syntactic can do; it’s what keeps the in-situ-evolving ontology a coherent computation. Measure: rejections-for-inconsistency; `O` provably consistent as it grows.
- (b) Inferred hierarchy (classification). Coverage/structure read HermiT’s inferred subsumption closure, not the asserted SubClassOf chains.
- (c) Realization-as-CPA [re-homes G-rel]. Map the corpus’s verifiable-JSON rows → an OWL ABox → HermiT realize → the column/entity types & relations computed by the reasoner. CPA/CTA becomes inference, not a tiny-model probe (which floored → G-rel descoped). Eval = realization accuracy vs the held-out `reference.parquet`. The relational computation relocates to the reasoner; the model becomes a fast amortization of it, not the thing that must learn it.
- Caveats (real): OWL profile: generated complex-class constructs push expressivity; stay near OWL 2 EL only as a SPEED fallback if HermiT slows at batch scale (OWL 2 DL reasoning is 2NEXPTIME-complete in the worst case, but hypertableau is practically tractable on modular BFO/CCO ontologies); the ABox bridge (DDL/JSON → assertions) for realization is a genuine new pipeline piece. Reasoner already instantiated by DeepOnto (default `reasoner_type="hermit"`); activation = calling `check_consistency()`/`get_instances()`, not new wiring.
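The consistency gate in (a) can be illustrated with a toy stand-in for the reasoner. Everything here is an assumption made for the sketch: the `Reasoner` protocol, the `ToyReasoner` class, the tuple encoding of axioms, and the `consistency_gate` function are hypothetical, and the toy handles only SubClassOf, DisjointClasses, and class assertions, so it is neither HermiT nor complete for OWL 2 DL. In the real gate the `is_consistent` call would be answered by the already-loaded HermiT instance.

```python
from typing import Iterable, Protocol

# ("sub", A, B): A SubClassOf B | ("disjoint", A, B) | ("type", x, A): x is an A
Axiom = tuple[str, str, str]


class Reasoner(Protocol):
    def is_consistent(self, axioms: Iterable[Axiom]) -> bool: ...


class ToyReasoner:
    """Toy stand-in for HermiT covering a tiny fragment (assumption, see lead-in)."""

    def is_consistent(self, axioms: Iterable[Axiom]) -> bool:
        axioms = list(axioms)
        supers: dict[str, set[str]] = {}
        for kind, a, b in axioms:
            if kind == "sub":
                supers.setdefault(a, set()).add(b)

        def closure(cls: str) -> set[str]:
            # All classes reachable from cls via SubClassOf, including cls.
            seen, stack = {cls}, [cls]
            while stack:
                for s in supers.get(stack.pop(), ()):
                    if s not in seen:
                        seen.add(s)
                        stack.append(s)
            return seen

        types: dict[str, set[str]] = {}
        for kind, x, c in axioms:
            if kind == "type":
                types.setdefault(x, set()).update(closure(c))
        disjoint = [(a, b) for kind, a, b in axioms if kind == "disjoint"]
        # Inconsistent iff some individual falls under two disjoint classes.
        return not any(a in t and b in t for t in types.values() for a, b in disjoint)


def consistency_gate(reasoner: Reasoner, seed, admitted, candidate) -> dict:
    """Check seed ∪ admitted ∪ candidate and report the `consistent` signal."""
    cumulative = [*seed, *admitted, *candidate]
    return {"consistent": reasoner.is_consistent(cumulative)}
```

A candidate is checked against everything already admitted, not in isolation: a construct that is harmless on its own can still be rejected because of what it entails together with earlier admissions.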
4. How they compose
The outer loop (§2) optimizes the harness that grows the ontology; the reasoner (§3) makes the ontology self-consistent and executable; the end-to-end DAG (§0) is where both live. One sentence: a coding agent evolves the program that grows a reasoner-backed, domain-adaptive ontology, judged by what the reasoner and the corpus compute.
5. Orchestration stance (the Airflow question)
- Now: the filesystem-DAG (§0) + thin drivers (`just` recipes + small Python runners) + `manifest.json`/`run_id` content-addressed lineage. This carries the whole-workflow understanding (legibility) without runtime complexity, and matches the Meta-Harness grain (filesystem + agent, not a DAG engine). The candidate filesystem `D` (§2) is the same substrate.
- NOT Airflow now: it’s a runtime orchestrator for stable/recurrent/scheduled flows; ours is in flux, and Airflow’s scheduler/DB/webserver ceremony would ossify a flow we’re still discovering, and over-orchestrate the part the proposer should navigate.
- Later (convergence-loop maturity): a lightweight orchestrator — Metaflow (the Gaius precedent) or
OpenLineage→Atlas (the project’s existing provenance direction) — when S1→S2→…→S8 runs recurrently and
lineage/scheduling pays off.
`components/` (cldr/signals) holds Airflow if we ever need it; default no.
6. Increment ladder
- inc-2a (beachhead): HermiT consistency gate in `ContractGate` (`consistent` signal over the cumulative ontology) + a seed rule. Smallest real reasoner computation; immediately makes `O` coherent.
- inc-2b: H₀-clean. Refactor the spine+mint+gate into a single-file harness program with a `run(topic, gate)` interface; stand up `candidates/{NNN}/` + interface validation.
- inc-2c: the Meta-Harness outer loop. Proposer + minimal skill + search/eval/Pareto over the candidate filesystem; reward = batch R1/coverage-close vs cost on the search set.
- inc-2d: realization-as-CPA — the ABox bridge + HermiT realize + the symbolic-CPA eval vs the held-out reference (G-rel re-homed).
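The interface validation named in inc-2b could look roughly like this. It is a sketch under assumptions: `validate_harness` is a hypothetical helper, and the only contract it checks is that the candidate file imports cleanly and exposes a callable `run` whose first two parameters are named `topic` and `gate`. Importing a candidate executes its top-level code, so a real validator would likely run this in a sandboxed subprocess.

```python
import importlib.util
import inspect
from pathlib import Path


def validate_harness(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the interface check passed."""
    spec = importlib.util.spec_from_file_location(f"harness_{path.parent.name}", path)
    if spec is None or spec.loader is None:
        return [f"cannot load {path}"]
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)  # runs the candidate's top-level code
    except Exception as exc:
        return [f"import failed: {exc!r}"]
    run = getattr(module, "run", None)
    if not callable(run):
        return ["missing callable run"]
    params = list(inspect.signature(run).parameters)
    if params[:2] != ["topic", "gate"]:
        return [f"run must take (topic, gate), got {params}"]
    return []
```

Returning a list of problems rather than a boolean means a failed check can be logged alongside the candidate, so the proposer can read why a harness was rejected before it was ever evaluated.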
7. Reward / decision rules (measurement, so this stays form not language)
- Harness search reward: batch on-vs-shuffled R1 / coverage-close on the search set, Pareto vs Grok cost; a discovered harness must beat H₀’s frontier on held-out topics to be adopted.
- Reasoner: consistency-gate must reject ≥1 genuinely-inconsistent construct (instrument validity) and keep `O` consistent as it grows; realization-CPA valid iff accuracy > control on the v0.3 backbone-free symbolic path, CI-clean vs the held-out reference.
- Every increment chains to one of these numbers or it does not ship (the standing rule).
Verification
1. Control plane unchanged where reused; H₀ run reproduces inc-1 (t124 R1 ≈0.39, promote).
2. inc-2a: consistency gate rejects a hand-crafted inconsistent construct; passes the t124 construct; `O` stays consistent across a batch.
3. inc-2c: a discovered harness beats H₀ on held-out coverage-close.
4. inc-2d: realization-CPA selectivity CI-clean vs reference.

Artifacts under `evidence/` per stage.