Ontology Author's Guide
This guide is written for an ontology engineer who wants to extend the sdg ontology
independently — add classes, definitions, roles, properties — and have a reasonable expectation
of passing every quantitative and qualitative gate, and ideally improving on them. It documents
the full metric suite, the formal quality gate, the disposal membranes, and the pre-registered
objectives, with the exact formulas, bands, and thresholds the tooling enforces. Nothing here is
aspirational: every number is the value the code checks.
The governing principle is propose / dispose. You (or an agent) propose axioms; a stack of deterministic membranes disposes — admitting only what is well-formed, logically consistent under the reasoner, and ontologically clean. Rigor is enforced, not asserted, and the two strongest membranes (HermiT and OntoClean) are un-fakeable: you cannot talk your way past a contradiction or an anti-rigidity violation. If your extension passes, it is genuinely rigorous; if it fails, the gate returns the reason and you refine.
1. What you are extending
sdg-ontology is a BFO 2020 / CCO-grounded domain ontology, content-derived from FinePDFs and
realized to a HermiT-validated OWL artifact at corpora/ontology/sdg-ontology.{omn,owl} with a
consistency certificate at corpora/ontology/HERMIT_CERTIFICATE.md.
Its purpose is to be the annotation vocabulary for Column Type / Column Property Annotation
(CTA/CPA) over wide relational tables. That reframes what most of the classes are: they are not
leaf terms but intermediate-depth subsumers — the property-bearing classes a heterogeneous-but-
coherent column belongs to. A driver_stops_schedule.stops_addresses column holds a mix (origin +
destination, residential + business shipping addresses, each bearing an avg-time-on-site); no leaf
type fits — the right annotation is the least common subsumer that is still property-bearing,
e.g. Address ⊓ ∃has-shipping-role ⊓ ∃avg-time-on-site. Defining these intermediate classes well
is building the annotation vocabulary. When you extend the ontology, you are extending that
vocabulary, and the gates exist to keep every term a coherent, grounded annotation target.
Namespaces
| prefix | IRI base | use |
|---|---|---|
| bfo: | http://purl.obolibrary.org/obo/BFO_ | upper categories (numeric IRIs, e.g. bfo:0000040) |
| cco: | https://www.commoncoreontologies.org/ | mid-level genera (numeric IRIs, e.g. cco:ont00000713 = Vehicle); note the https scheme |
| fhir: | http://hl7.org/fhir/ | clinical/record types, bridged to cco:InformationContentEntity |
| iao: | http://purl.obolibrary.org/obo/IAO_ | annotation properties (iao:0000115 = definition) |
| sdg: | https://signals360.example.org/sdg# | our own classes/properties |
| skos:, rdfs:, owl: | standard | labels, definitions, structure |
BFO categories you will reach for
bfo:0000040 material entity · bfo:0000004 independent continuant · bfo:0000002 continuant ·
bfo:0000015 process · bfo:0000031 generically dependent continuant (ICE) · bfo:0000019 quality ·
bfo:0000020 specifically dependent continuant · bfo:0000023 role · bfo:0000016 disposition ·
bfo:0000034 function · bfo:0000017 realizable entity.
Realizable-machinery properties: bfo:0000055 realizes · bfo:0000052 inheres-in · bfo:0000053
bearer-of · bfo:0000054 realized-in.
2. How you author
Classes are authored as catalog templates — a Manchester-syntax skeleton with typed slots — that
the realizer renders, grounds, and validates into the OWL artifact. The seven family JSON files live
in src/aegir/ontology/catalog/; FinePDFs-derived intermediate classes accrete in
08_derived.json. Edit the family .json files, never combined.json (regenerated).
The slot DSL
{name:Type} e.g. {X:Class}, {p:ObjectProperty}, {Y:Class}
{name:Type:Bound} subtype constraint: {X:Class:bfo:Continuant}
Type ∈ {Class, ObjectProperty, DataProperty, Individual}. A CatalogTemplate carries
manchester_template, slot_types, verbal_template (an NL gloss → becomes the definition
annotation), bfo_anchor_path, and provenance. Three canonical shapes:
Class: {X:Class} SubClassOf: {Y:Class} # primitive (a kind, undefined)
Class: {X:Class} SubClassOf: {p:ObjectProperty} some {Y:Class} # existential restriction
Class: {X:Class} EquivalentTo: {Y:Class} and {p:ObjectProperty} some {Z:Class} # DEFINED (genus + differentia)
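The slot grammar is regular enough to sketch a parser for it. The following is a minimal illustration, not the realizer's actual code: parse_slots and render are hypothetical helper names, and the regex assumes that slot names are plain identifiers while the optional bound may itself contain a colon (as in bfo:Continuant).

```python
import re
from typing import NamedTuple, Optional

# {name:Type} or {name:Type:Bound}; the bound may contain ':' (e.g. bfo:Continuant).
SLOT_RE = re.compile(
    r"\{(\w+):(Class|ObjectProperty|DataProperty|Individual)(?::([^}]+))?\}"
)


class Slot(NamedTuple):
    name: str
    type: str
    bound: Optional[str]


def parse_slots(template: str) -> list:
    """Return the distinct slots of a template, in order of first appearance."""
    seen = {}
    for name, type_, bound in SLOT_RE.findall(template):
        seen.setdefault(name, Slot(name, type_, bound or None))
    return list(seen.values())


def render(template: str, bindings: dict) -> str:
    """Substitute each slot with its bound entity; KeyError on a missing binding."""
    return SLOT_RE.sub(lambda m: bindings[m.group(1)], template)


template = (
    "Class: {X:Class} EquivalentTo: {Y:Class} "
    "and {p:ObjectProperty} some {Z:Class:bfo:Continuant}"
)
slots = parse_slots(template)
axiom = render(template, {"X": "sdg:A", "Y": "sdg:B", "p": "sdg:hasPart", "Z": "sdg:C"})
print(axiom)
```

The sketch only substitutes strings; checking that a binding actually satisfies its bound (Z is a bfo:Continuant) is left to the membranes.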
Manchester conventions the membranes enforce
- Prefixes are lowercase only: cco: bfo: fhir: sdg:, never CCO: / BFO:.
- Every property is prefixed: sdg:hasMeasurement some X, never bare hasMeasurement.
- Coined classes/properties use sdg: (camelCase). Do not invent cco: / bfo: names; those are numeric IRIs you must look up (see “Grounding: choosing a genus” below). No # comments (they break the OMN parser).
- The genus must be a broader class: a real BFO/CCO/sdg parent, never the class itself.
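Most of these conventions can be checked before an axiom ever reaches the parse membrane. The sketch below is a hypothetical pre-flight lint (lint_axiom is not part of the tooling) that works on a rendered axiom, i.e. one with no {slot} placeholders left; it flags uppercase prefixes, bare properties in front of a restriction keyword, and # comments.

```python
import re

# A prefix is a run of letters/digits followed by ':' and then directly by a local name.
# Frame keywords such as "Class:" are followed by a space, so they do not match.
PREFIXED_NAME = re.compile(r"\b([A-Za-z][A-Za-z0-9]*):(?=\w)")
# The token just before a restriction keyword is the property.
RESTRICTION = re.compile(r"(\S+)\s+(?:some|only|min|max|exactly|value)\b")
COMMENT = re.compile(r"(?:^|\s)#")


def lint_axiom(axiom: str) -> list:
    """Return convention violations for one rendered Manchester axiom."""
    problems = []
    for prefix in PREFIXED_NAME.findall(axiom):
        if prefix != prefix.lower():
            problems.append(f"uppercase prefix: {prefix}:")
    for prop in RESTRICTION.findall(axiom):
        if ":" not in prop:
            problems.append(f"bare property: {prop}")
    if COMMENT.search(axiom):
        problems.append("# comment present")
    return problems


good = "Class: sdg:Reading SubClassOf: sdg:hasMeasurement some sdg:Measurement"
bad = "Class: sdg:Reading SubClassOf: CCO:ont00000713 and hasMeasurement some sdg:Measurement # note"
print(lint_axiom(good))
print(lint_axiom(bad))
```

A clean lint does not imply the axiom parses; the OWLAPI parse remains the authority.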
EquivalentTo vs SubClassOf — the single most important authoring choice
A class with SubClassOf: is primitive (necessary conditions only). A class with
EquivalentTo: genus and differentia is defined (necessary and sufficient): anything that is
the genus and satisfies the differentia is an instance. Prefer EquivalentTo wherever the
differentia are genuinely sufficient — this is the definitional_completeness lever and the IOF
discipline (the IOF defines ~55% of its terms this way). Do not force it: a genuine natural kind
whose essence is not captured by the stated relations should stay primitive. Reserve EquivalentTo
for kinds; model roles with the realizable pattern (see “Roles and the realizable machinery”), not as defined subclasses.
Grounding: choosing a genus
Every class must chain to a BFO category — directly, or through CCO/FHIR. Look up the real IRI with the grounding-anchor retriever rather than inventing one:
uv run --no-sync python scripts/grounding_anchors.py query "shipping address"
# 0.74 [cco] Mailing Address cco:ont00000xxx
# 0.59 [fhir] Address fhir:Address
# 0.55 [sdg] StopLocation sdg:StopLocation (reuse our own)
The index spans CCO (1431 BFO-aligned genera), FHIR R5 (210 record types, bridged to
cco:InformationContentEntity), and our own grounded classes (the index accretes — each class
you ground becomes a reusable anchor). Prefer, in order: an existing sdg: class (reuse), a CCO/FHIR
genus, then a bare BFO category as a last resort. A generic bfo:0000040 placeholder where you meant
“Patient” is grounded but shallow — find the real genus.
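The preference order can be expressed as a sort key over retriever hits. This is a sketch under assumptions: the Hit record, choose_genus, and the min_score cut-off are all hypothetical, and the retriever's real output format may differ from the three-column listing shown above.

```python
from dataclasses import dataclass
from typing import Optional

# Lower rank = preferred: reuse sdg first, then a CCO/FHIR genus, bare BFO last.
SOURCE_RANK = {"sdg": 0, "cco": 1, "fhir": 1, "bfo": 2}


@dataclass(frozen=True)
class Hit:
    score: float  # retriever similarity
    source: str   # "sdg", "cco", "fhir" or "bfo"
    iri: str


def choose_genus(hits: list, min_score: float = 0.5) -> Optional[Hit]:
    """Pick the preferred anchor among hits scoring at least min_score (assumed threshold)."""
    usable = [h for h in hits if h.score >= min_score and h.source in SOURCE_RANK]
    if not usable:
        return None
    # Best source rank first; within a rank, the highest score wins.
    return min(usable, key=lambda h: (SOURCE_RANK[h.source], -h.score))


hits = [
    Hit(0.74, "cco", "cco:ont00000xxx"),  # placeholder IRI, as in the listing above
    Hit(0.59, "fhir", "fhir:Address"),
    Hit(0.55, "sdg", "sdg:StopLocation"),
]
print(choose_genus(hits))
print(choose_genus(hits, min_score=0.6))
```

The ranking only orders candidates; whether a lower-scoring sdg: class really is the intended genus is still the author's judgment.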
Roles and the realizable machinery
A role is anti-rigid and relational (supplier/operator/origin-address: the bearer could stop being it and still exist). Model it as a BFO role, never a rigid subclass:
Class: {OperatorRole:Class} SubClassOf: bfo:0000023,
bfo:0000052 some {Operator:Class}, bfo:0000054 some {OperationProcess:Class}
The inheres-in (bfo:0000052) and realized-in (bfo:0000054) restrictions are what the
realizable_machinery metric counts and what BFO discipline requires.
3. The quantitative metrics
All metrics are computed by scripts/ontology_metrology.py::compute() (pure rdflib, JVM-free) over
the realized .owl, with CCO’s subClassOf backbone merged so that cco:-grounded chains resolve
to BFO. Run:
uv run --no-sync python scripts/ontology_metrology.py corpora/ontology/sdg-ontology.owl # or --json
n = number of sdg: named classes. Each metric below lists its formula, its IOF/field
target, and the authoring lever that moves it.
3.1 IOF-derived rigor dimensions (what field-standard suites miss)
| metric | formula | target | lever |
|---|---|---|---|
| definitional_completeness | ‖{c : c owl:equivalentClass …}‖ / n | IOF ≈ 0.55 | write EquivalentTo (genus+differentia) for definable kinds |
| bfo_grounded | ‖{c : subClassOf/≡-genus chain reaches a BFO IRI}‖ / n | 1.0 | ground every class to a BFO/CCO/FHIR genus |
| realizable_machinery | count of restrictions on realizes/inheres/bearer/realized props or some role/disposition/function | IOF ≥ 14 | model roles/dispositions/functions with the realizable pattern |
| def_annotation_coverage | ‖{c : rdfs:comment ∨ iao:0000115 ∨ skos:definition}‖ / n | 1.0 (IAO req) | supply a verbal_template; the realizer emits iao:0000115 + rdfs:comment |
These are the discriminators: an LLM (or a hasty author) recovers taxonomy + existentials
(structure) but not sufficiency, full grounding, or BFO role discipline (rigor). They are where the
FunctionalAdequacy gate floor lives.
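To make the two ratio formulas concrete, here is a small sketch that evaluates definitional_completeness and def_annotation_coverage over a toy triple set. It is an illustration only: the real compute() works on an rdflib graph of the realized .owl, whereas this version uses plain string tuples with prefixed names, and rigor_ratios is a hypothetical name.

```python
DEFINITION_PROPS = {"rdfs:comment", "iao:0000115", "skos:definition"}


def rigor_ratios(triples: set) -> dict:
    """Compute the two ratio metrics over (subject, predicate, object) string triples."""
    # n = number of sdg: named classes
    classes = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:Class" and s.startswith("sdg:")
    }
    n = len(classes)
    if n == 0:
        return {"definitional_completeness": 0.0, "def_annotation_coverage": 0.0}
    defined = {s for s, p, _ in triples if p == "owl:equivalentClass" and s in classes}
    annotated = {s for s, p, _ in triples if p in DEFINITION_PROPS and s in classes}
    return {
        "definitional_completeness": len(defined) / n,
        "def_annotation_coverage": len(annotated) / n,
    }


toy = {
    ("sdg:A", "rdf:type", "owl:Class"),
    ("sdg:B", "rdf:type", "owl:Class"),
    ("sdg:C", "rdf:type", "owl:Class"),
    ("sdg:D", "rdf:type", "owl:Class"),
    ("sdg:A", "owl:equivalentClass", "_:genus_and_differentia"),
    ("sdg:A", "iao:0000115", "An A is a B that ..."),
    ("sdg:B", "rdfs:comment", "A kind of thing."),
    ("sdg:C", "skos:definition", "Another kind."),
    ("cco:X", "rdfs:comment", "Not an sdg: class, so it is ignored."),
}
ratios = rigor_ratios(toy)
print(ratios)
```

One defined class out of four gives 0.25, which is exactly the score-3 breakpoint for definitional_completeness in the bands of §4.1.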
3.2 Field-standard structural metrics (OntoQA / OQuaRE)
| metric | formula | reading |
|---|---|---|
| rr relationship richness | n_∃some / (n_subClassOf + n_∃some) | non-taxonomic richness; a pure tree → 0 |
| ir inheritance richness | n_subClassOf / n | subclasses per class |
| ar attribute richness | n_DatatypeProperty / n | typed data attributes per class |
| aronto axiomatic strength | (n_∃some + n_∀only + n_card) / n | restrictions per class |
| dit depth | longest subClassOf chain | taxonomic depth (more developed = deeper) |
| tm tangledness (inverted) | ‖{c : >1 named parent}‖ / n | multiple-inheritance load; lower is better |
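The four count-based ratios in the table can be worked through with plain arithmetic. The function below is a sketch with hypothetical parameter names; it takes the raw axiom counts as inputs rather than deriving them from the ontology, and it leaves out dit and tm, which need the class graph.

```python
def structural_ratios(n, n_subclassof, n_some, n_only, n_card, n_datatype_props):
    """rr, ir, ar and aronto from raw counts; n is the number of sdg: named classes."""
    if n == 0:
        raise ValueError("n must be positive")
    links = n_subclassof + n_some
    return {
        "rr": n_some / links if links else 0.0,   # a pure tree has n_some == 0
        "ir": n_subclassof / n,
        "ar": n_datatype_props / n,
        "aronto": (n_some + n_only + n_card) / n,
    }


# Toy counts, chosen only to make the arithmetic easy to follow.
example = structural_ratios(
    n=40, n_subclassof=60, n_some=20, n_only=2, n_card=2, n_datatype_props=12
)
print(example)  # rr = 20/80, ir = 60/40, ar = 12/40, aronto = 24/40
```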
3.3 OntoClean taxonomic-correctness proxies (un-gameable)
Reasoner-invisible defects that a generic LLM cannot fake (it names meta-properties at ~96% but
cannot operationalize them). Computed via the OntoClean classifier (src/aegir/ontology/ontoclean.py).
| metric | formula | target |
|---|---|---|
| subsumption_cycles | classes reachable from themselves via subClassOf (OOPS! P06) | 0 (hard) |
| ontoclean_violations | subClassOf edges where an anti-rigid (role) parent subsumes a non-anti-rigid (rigid) child | 0 |
| sibling_disjointness | fraction of same-parent sibling pairs asserted owl:disjointWith (OOPS! P10) | → 1.0 |
| orphan_rate | fraction of sdg: classes with no parent (OOPS! P04, islands) | → 0 |
| taxonomic_cleanliness | 1 − (subsumption_cycles + ontoclean_violations) / n_subClassOf | 1.0 |
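The cycle, violation, and cleanliness formulas can be sketched over a child-to-parents map. This is an illustration, not the classifier in ontoclean.py: the function names are hypothetical, and the set of anti-rigid classes is passed in as an input, whereas the real tool assigns rigidity itself.

```python
def subsumption_cycles(parents: dict) -> set:
    """Classes that can reach themselves by following subClassOf edges."""
    def reaches_self(start):
        stack, seen = list(parents.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False
    return {c for c in parents if reaches_self(c)}


def ontoclean_violations(parents: dict, anti_rigid: set) -> list:
    """(child, parent) edges where an anti-rigid parent subsumes a non-anti-rigid child."""
    return sorted(
        (c, p) for c, ps in parents.items() for p in ps
        if p in anti_rigid and c not in anti_rigid
    )


def taxonomic_cleanliness(parents: dict, anti_rigid: set) -> float:
    n_edges = sum(len(ps) for ps in parents.values())
    if n_edges == 0:
        return 1.0
    defects = len(subsumption_cycles(parents)) + len(ontoclean_violations(parents, anti_rigid))
    return 1 - defects / n_edges


parents = {
    "sdg:Vehicle": {"bfo:0000040"},
    "sdg:OperatorRole": {"bfo:0000023"},   # role under role: fine
    "sdg:Truck": {"sdg:OperatorRole"},     # rigid kind under a role: violation
    "sdg:A": {"sdg:B"},                    # A and B form a cycle
    "sdg:B": {"sdg:A"},
}
anti_rigid = {"bfo:0000023", "sdg:OperatorRole"}
print(subsumption_cycles(parents))
print(ontoclean_violations(parents, anti_rigid))
print(taxonomic_cleanliness(parents, anti_rigid))
```

In the toy graph, two cyclic classes plus one violation over five subClassOf edges give a cleanliness of 1 − 3/5 = 0.4.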
3.4 Consistency
HermiT over the realized ontology with CCO imported. Consumed by the gate from the certificate
(isConsistent: true). Zero unsatisfiable classes is the real bar — an ontology can be
isConsistent yet contain unsatisfiable classes (classes that can have no instances); both must be
clean for a publish.
4. The OQuaRE quality gate
scripts/ontology_oquare.py is the formal publish gate. OQuaRE (Duque-Ramos et al. 2011) adapts
ISO/IEC 25000 (SQuaRE) to ontologies: each metric is normalized to [1,5] against fixed,
IOF-anchored bands, then aggregated into six characteristics and one holistic score.
uv run --no-sync python scripts/ontology_oquare.py corpora/ontology/sdg-ontology.owl \
--certificate corpora/ontology/HERMIT_CERTIFICATE.md
4.1 Normalization bands (fixed a priori — a stable distance-to-IOF)
Piecewise-linear interpolation between breakpoints (value, score), clamped to [1,5]:
| metric | →1 | →3 | →5 |
|---|---|---|---|
| definitional_completeness | 0.00 | 0.25 | 0.55 |
| bfo_grounded | 0.50 | 0.85 | 1.00 |
| realizable_machinery | 0 | 5 | 14 |
| def_annotation_coverage | 0.00 | 0.70 | 1.00 |
| rr | 0.00 | 0.25 | 0.50 |
| ir | 0.00 | 1.00 | 3.00 |
| ar | 0.00 | 0.30 | 1.00 |
| aronto | 0.00 | 0.60 | 1.50 |
| dit | 1 | 3 | 8 |
| tm (inverted) | 0.50 | 0.15 | 0.00 |
| consistent | inconsistent | unknown | consistent |
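The normalization can be sketched as piecewise-linear interpolation over the breakpoints in the table. The breakpoint values below are copied from the table; the BANDS layout and the normalize function are assumptions about how one might code it, not the contents of ontology_oquare.py.

```python
# (value, score) breakpoints per metric, taken from the table above.
BANDS = {
    "definitional_completeness": [(0.00, 1), (0.25, 3), (0.55, 5)],
    "bfo_grounded": [(0.50, 1), (0.85, 3), (1.00, 5)],
    "realizable_machinery": [(0, 1), (5, 3), (14, 5)],
    "def_annotation_coverage": [(0.00, 1), (0.70, 3), (1.00, 5)],
    "rr": [(0.00, 1), (0.25, 3), (0.50, 5)],
    "ir": [(0.00, 1), (1.00, 3), (3.00, 5)],
    "ar": [(0.00, 1), (0.30, 3), (1.00, 5)],
    "aronto": [(0.00, 1), (0.60, 3), (1.50, 5)],
    "dit": [(1, 1), (3, 3), (8, 5)],
    "tm": [(0.50, 1), (0.15, 3), (0.00, 5)],  # inverted: lower tangledness scores higher
}
CONSISTENT_SCORE = {"inconsistent": 1.0, "unknown": 3.0, "consistent": 5.0}


def normalize(metric: str, value: float) -> float:
    """Map a raw metric value to [1, 5] by interpolating between breakpoints."""
    pts = sorted(BANDS[metric])  # ascending in value, whatever the score direction
    if value <= pts[0][0]:
        return float(pts[0][1])   # clamp below the first breakpoint
    if value >= pts[-1][0]:
        return float(pts[-1][1])  # clamp above the last breakpoint
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    raise AssertionError("unreachable: value lies between the clamped ends")


print(normalize("definitional_completeness", 0.40))  # halfway between 0.25 and 0.55
print(normalize("tm", 0.075))                        # halfway between 0.15 and 0.00
```

Sorting the breakpoints by value lets the same interpolation handle the inverted tm band without a special case.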
4.2 Characteristics (which metrics feed each)
| characteristic | constituent metric scores |
|---|---|
| Structural | aronto, dit, tm, bfo_grounded, rr |
| FunctionalAdequacy | definitional_completeness, realizable_machinery, def_annotation_coverage |
| Reliability | bfo_grounded, consistent, tm |
| Operability | def_annotation_coverage, rr |
| Maintainability | tm, dit, ir |
| Transferability | bfo_grounded, def_annotation_coverage |
aggregate = mean of the six characteristics.
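The characteristic map and the aggregate can be sketched as two nested means. One assumption is labeled here because the table does not state it: each characteristic is taken to be the unweighted mean of its constituent metric scores. The input values are invented normalized scores, not measurements of sdg-ontology.

```python
from statistics import mean

# Constituent metrics per characteristic, copied from the table above.
CHARACTERISTICS = {
    "Structural": ["aronto", "dit", "tm", "bfo_grounded", "rr"],
    "FunctionalAdequacy": [
        "definitional_completeness", "realizable_machinery", "def_annotation_coverage",
    ],
    "Reliability": ["bfo_grounded", "consistent", "tm"],
    "Operability": ["def_annotation_coverage", "rr"],
    "Maintainability": ["tm", "dit", "ir"],
    "Transferability": ["bfo_grounded", "def_annotation_coverage"],
}


def aggregate(scores: dict):
    """Return (per-characteristic scores, aggregate) from normalized [1, 5] metric scores."""
    # Assumption: a characteristic is the unweighted mean of its metric scores.
    per_char = {
        name: mean(scores[m] for m in metrics)
        for name, metrics in CHARACTERISTICS.items()
    }
    return per_char, mean(per_char.values())


# Invented, already-normalized scores for illustration.
scores = {
    "aronto": 3.0, "dit": 4.0, "tm": 5.0, "bfo_grounded": 5.0, "rr": 3.0, "ir": 3.0,
    "definitional_completeness": 2.0, "realizable_machinery": 3.0,
    "def_annotation_coverage": 4.0, "consistent": 5.0,
}
per_char, total = aggregate(scores)
print(per_char)
print(total)
```

Because bfo_grounded, tm, and def_annotation_coverage each feed three characteristics, they carry more weight in the aggregate than metrics that appear once.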
4.3 The gate (GREEN requires all three)
| check | floor |
|---|---|
| oquare_aggregate | ≥ 3.5 |
| functional_adequacy | ≥ 3.0 |
| hermit_consistent | == true |
The aim is 3.9, the published OQuaRE class of Brick (3.93) and RealEstateCore (3.91). The
FunctionalAdequacy ≥ 3.0 floor is deliberate: it forces definitional rigor and BFO discipline,
not structural/grounding gains alone. The gate is hard-wired into aegir.lineup.sync._gate():
sync --push of the ontology is refused below GREEN, so a regression cannot be published.
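The three-way check reduces to a small predicate. The sketch below is hypothetical (the actual logic lives in sync._gate() and ontology_oquare.py); the floor values are the ones in the table, and the function returns the failing checks as reasons in the same spirit as the membranes.

```python
# Floors from the gate table; the dict layout is an assumption.
FLOORS = {"oquare_aggregate": 3.5, "functional_adequacy": 3.0}


def gate(oquare_aggregate: float, functional_adequacy: float, hermit_consistent: bool):
    """Return (is_green, reasons); GREEN requires all three checks to pass."""
    reasons = []
    if oquare_aggregate < FLOORS["oquare_aggregate"]:
        reasons.append(
            f"oquare_aggregate {oquare_aggregate:.2f} < {FLOORS['oquare_aggregate']:.2f}"
        )
    if functional_adequacy < FLOORS["functional_adequacy"]:
        reasons.append(
            f"functional_adequacy {functional_adequacy:.2f} < {FLOORS['functional_adequacy']:.2f}"
        )
    if not hermit_consistent:
        reasons.append("hermit_consistent is not true")
    return (not reasons, reasons)


print(gate(3.6, 2.9, True))   # a strong aggregate does not rescue a weak FunctionalAdequacy
print(gate(3.5, 3.0, True))   # exactly on both floors is GREEN
```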
5. The disposal membranes (what rejects your extension, and why)
Your axioms pass through these in order. Each returns a reason, so a failure is a repair instruction, not a dead end (this is the agent-mediated feedback loop; a human author reads the same reasons).
- Parse membrane (evolve_rigor.validate_detailed): renders the axiom standalone and parses it under OWLAPI. Rejects malformed Manchester: uppercase prefixes, bare properties, undeclared entities, # comments. Reason: the parser error or “0 classes.”
- Reasoning-authority membrane (build_realized_ontology.consistency_check): imports CCO and runs HermiT, so your grounding is validated against CCO’s disjointness axioms. A class grounded to a CCO-disjoint or BFO-incompatible genus (e.g. a Plant placed under cco:Vehicle, or a continuant genus where a process is required) is unsatisfiable and rejected. Reason: “genus X is incompatible; re-ground to a compatible parent.” This is un-fakeable.
- OntoClean meta-property membrane (src/aegir/ontology/ontoclean.py): assigns Rigidity / Identity / Unity / Dependence and enforces the OntoClean constraint that an anti-rigid property cannot subsume a rigid one (a role cannot be the parent of a kind). Surfaces as ontoclean_violations. Also un-fakeable: reasoner-invisible yet checkable.
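The ordered, reason-returning structure of the membranes can be sketched as a short pipeline. Everything here is a stand-in: dispose and the three stub checks are hypothetical and use toy string tests, whereas the real membranes call OWLAPI, HermiT, and the OntoClean classifier.

```python
import re
from typing import Optional

ANTI_RIGID = {"sdg:OperatorRole"}  # toy rigidity assignment, an assumption


def parse_stub(axiom: str) -> Optional[str]:
    """Stand-in for the parse membrane: only rejects '#' comments."""
    return "# comments are not allowed" if "#" in axiom else None


def reasoner_stub(axiom: str) -> Optional[str]:
    """Stand-in for HermiT: knows a single toy incompatibility."""
    if "sdg:Plant" in axiom and "cco:Vehicle" in axiom:
        return "genus cco:Vehicle is incompatible; re-ground to a compatible parent"
    return None


def ontoclean_stub(axiom: str) -> Optional[str]:
    """Stand-in for OntoClean: an anti-rigid parent may not subsume a rigid child."""
    m = re.search(r"Class:\s*(\S+)\s+SubClassOf:\s*(\S+)", axiom)
    if m and m.group(2) in ANTI_RIGID and m.group(1) not in ANTI_RIGID:
        return f"anti-rigid {m.group(2)} cannot subsume rigid {m.group(1)}"
    return None


MEMBRANES = [("parse", parse_stub), ("reasoning", reasoner_stub), ("ontoclean", ontoclean_stub)]


def dispose(axiom: str, membranes=MEMBRANES):
    """Run the membranes in order; stop at the first rejection and return its reason."""
    for name, check in membranes:
        reason = check(axiom)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "admitted"


print(dispose("Class: sdg:Truck SubClassOf: sdg:Vehicle"))
print(dispose("Class: sdg:Truck SubClassOf: sdg:OperatorRole"))
print(dispose("Class: sdg:Plant SubClassOf: cco:Vehicle # oops"))
```

Stopping at the first rejection means the last call reports only the parse failure; the grounding problem surfaces once the comment is removed and the axiom is re-proposed.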
A self-check before you propose:
LD_LIBRARY_PATH=$(pwd)/build/jvm-libs uv run --no-sync python scripts/build_realized_ontology.py --strict-grounding
uv run --no-sync python src/aegir/ontology/ontoclean.py src/aegir/ontology/catalog/08_derived.json
just check-ontology-schema # TTL parses, labels/definitions present, BFO ancestry, SPARQL totality
6. The pre-registered objectives (EVIDENCE.md)
Two standing objectives define “good enough to publish” and “rigorous”:
- OQ-Structure: bfo_grounded ≥ 0.95 ∧ def_annotation_coverage ≥ 0.90 ∧ ar > 0 ∧ oquare_aggregate ≥ 3.5. Gate: the sync._gate publish gate.
- OQ-Rigor: definitional_completeness ≥ 0.45 ∧ realizable_machinery > 0. Gate: the OQuaRE FunctionalAdequacy ≥ 3.0 floor.
The bar to clear is an extension that holds or raises both objectives. The standing rule: no
sync --push of the ontology Data Product until OQuaRE is GREEN.
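Both objectives are conjunctions of thresholds, so they can be evaluated directly from a metrics dictionary. The sketch assumes the dictionary keys match the metric names used in this guide; the objectives function is hypothetical and the sample values are invented.

```python
def objectives(m: dict) -> dict:
    """Evaluate the two pre-registered objectives over a metrics dictionary."""
    return {
        "OQ-Structure": (
            m["bfo_grounded"] >= 0.95
            and m["def_annotation_coverage"] >= 0.90
            and m["ar"] > 0
            and m["oquare_aggregate"] >= 3.5
        ),
        "OQ-Rigor": (
            m["definitional_completeness"] >= 0.45
            and m["realizable_machinery"] > 0
        ),
    }


# Invented values: structurally sound, but too few defined classes.
sample = {
    "bfo_grounded": 0.97, "def_annotation_coverage": 0.92, "ar": 0.10,
    "oquare_aggregate": 3.6, "definitional_completeness": 0.30,
    "realizable_machinery": 6,
}
print(objectives(sample))
```

The sample shows the failure mode the rigor objective is designed to catch: OQ-Structure holds while OQ-Rigor does not.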
7. Worked example — authoring an intermediate class end-to-end
Goal: a class for the stops_addresses column — shipping addresses (origin + destination) bearing an
avg-time-on-site.
(1) Decide the modeling. A kind (an Address is rigidly an address) → define it with
EquivalentTo. The shipping/origin/destination facet is a role the address bears, not a rigid
parent — so it enters as a realizes-style differentia, keeping the genus an Address.
(2) Ground the genus. grounding_anchors.py query "mailing address" → reuse sdg:PostalAddress
if present, else cco:ont… (Mailing Address). Coin sdg: only for genuinely new differentia.
(3) Author.
Class: {ShippingStopAddress:Class} EquivalentTo:
cco:ont00000xxx
and sdg:bearsShippingRole some {ShippingRole:Class}
and sdg:hasAverageTimeOnSite some xsd:duration
Annotations: rdfs:label "shipping stop address",
iao:0000115 "A mailing address that bears a shipping role (origin or destination) on a driver
stop schedule and has an associated average time on site."
(4) Dispose. Parse (prefixes lowercase, properties prefixed → admitted). Realize → HermiT (the
genus cco:…Address is a material/ICE entity; no disjointness violated → satisfiable). OntoClean
(the genus is a rigid kind, not a role → no violation).
(5) Measure. Re-run the metrology + OQuaRE gate. This class raises definitional_completeness
(an ≡), holds bfo_grounded (real CCO genus), adds def_annotation_coverage (the iao:0000115),
and — because the role is modeled with a realizable differentia — nudges realizable_machinery.
(6) Iterate. If HermiT marks it unsatisfiable, the reason names the offending genus; pick a
compatible one and re-dispose. If ontoclean_violations rises, you placed a role as a rigid parent —
re-model it as a borne role.
8. How to improve upon the gates
Passing is the floor; the AIM is 3.9 and the IOF frontier beyond it. To raise each lever:
- Definitional completeness toward 0.55+: convert primitive kinds to EquivalentTo wherever the differentia are sufficient; define the referenced intermediate classes (the subsumers a column needs), not just the heads. This is the highest-leverage dimension and the one shallow extensions miss.
- Realizable machinery toward 14+: wherever a relational/anti-rigid concept appears, model it as a BFO role/disposition/function with inheres/realizes differentiae rather than a subclass.
- OntoClean to a clean sheet: push sibling_disjointness up (assert disjointWith between identity-incompatible siblings) and keep ontoclean_violations / subsumption_cycles at 0. These are the un-gameable signals; a clean OntoClean profile is the field’s blind spot and your differentiator.
- Annotation rigor: supply genus-differentia definitions (not vacuous label-glosses); the iao:0000115 should be a real sufficient definition, mirroring the EquivalentTo.
- Contribute patterns: recurring genus-differentia or role shapes belong in src/aegir/ontology/axiom_patterns.json (DOSDP-style: the defining axiom lives in the pattern, you fill slots). Reserve equivalentClass for kinds; emit roles via the realizable pattern and do not conflate them (Neuhaus 2025: roles resist ≡).
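For the definitional-completeness lever it helps to know how much work a target implies. With d defined classes out of n, converting k primitives needs (d + k) / n ≥ t, while adding k new defined classes needs (d + k) / (n + k) ≥ t, i.e. k ≥ (t·n − d) / (1 − t). The sketch below works this out with exact fractions; the function names and the example counts are hypothetical.

```python
import math
from fractions import Fraction

TARGET = Fraction(55, 100)  # the IOF-anchored 0.55 breakpoint


def conversions_needed(n: int, defined: int, target: Fraction = TARGET) -> int:
    """Primitives to convert to EquivalentTo so that (defined + k) / n >= target."""
    return max(0, math.ceil(target * n - defined))


def additions_needed(n: int, defined: int, target: Fraction = TARGET) -> int:
    """New defined classes to add so that (defined + k) / (n + k) >= target."""
    return max(0, math.ceil((target * n - defined) / (1 - target)))


# Invented counts: 200 sdg: classes, 60 of them defined (0.30).
print(conversions_needed(200, 60))  # convert existing primitives
print(additions_needed(200, 60))    # or add new defined classes instead
```

Fractions avoid the off-by-one that floating-point rounding of 0.55 × n can introduce at the threshold. Converting existing primitives is always the cheaper route, because adding classes also grows the denominator.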
The two reasoners are the discipline you cannot circumvent: HermiT rejects any grounding that contradicts CCO’s disjointness, and OntoClean rejects any anti-rigid-over-rigid subsumption. Build with them, not around them, and your extension is rigorous by construction.
9. Reference — commands & files
# metrics + gate
uv run --no-sync python scripts/ontology_metrology.py corpora/ontology/sdg-ontology.owl [--json]
uv run --no-sync python scripts/ontology_oquare.py corpora/ontology/sdg-ontology.owl \
--certificate corpora/ontology/HERMIT_CERTIFICATE.md [--json]
# membranes / realize (LD_LIBRARY_PATH bootstraps the JVM for HermiT/DeepOnto)
LD_LIBRARY_PATH=$(pwd)/build/jvm-libs uv run --no-sync python scripts/build_realized_ontology.py --strict-grounding
uv run --no-sync python src/aegir/ontology/ontoclean.py src/aegir/ontology/catalog/08_derived.json
just check-ontology-schema
# grounding + agent-assisted authoring
uv run --no-sync python scripts/grounding_anchors.py query "<concept>"
LD_LIBRARY_PATH=$(pwd)/build/jvm-libs uv run --no-sync python scripts/define_intermediate_classes.py --rounds 4
uv run --no-sync python scripts/evolve_rigor.py --batch 12 # convert primitives → ≡ / roles
| file | role |
|---|---|
| src/aegir/ontology/catalog/*.json | the seven family catalogs (edit these, not combined.json) |
| src/aegir/ontology/SLOT_DSL.md | the slot grammar |
| src/aegir/ontology/axiom_patterns.json | DOSDP-style genus-differentia + role patterns |
| scripts/ontology_metrology.py | every metric (compute()); the single source of truth |
| scripts/ontology_oquare.py | the [1,5] bands, characteristic map, FLOORS, the publish gate |
| src/aegir/ontology/ontoclean.py | the OntoClean classifier + meta-property membrane |
| scripts/build_realized_ontology.py | render → CCO import → HermiT → .owl + certificate |
| scripts/grounding_anchors.py | CCO+FHIR+accretive genus retrieval |
| corpora/ontology/sdg-ontology.{omn,owl} | the realized artifact (+ HERMIT_CERTIFICATE.md) |
Citations. OQuaRE: Duque-Ramos et al. 2011. IOF/BFO signature: Smith et al. 2019. OntoClean: Guarino & Welty. CCO: Common Core Ontologies (CC0). FHIR: HL7 FHIR R5.