Ontology Authors Guide

This guide is written for an ontology engineer who wants to extend the sdg ontology independently — add classes, definitions, roles, properties — and have a reasonable expectation of passing every quantitative and qualitative gate, and ideally improving on them. It documents the full metric suite, the formal quality gate, the disposal membranes, and the pre-registered objectives, with the exact formulas, bands, and thresholds the tooling enforces. Nothing here is aspirational: every number is the value the code checks.

The governing principle is propose / dispose. You (or an agent) propose axioms; a stack of deterministic membranes disposes — admitting only what is well-formed, logically consistent under the reasoner, and ontologically clean. Rigor is enforced, not asserted, and the two strongest membranes (HermiT and OntoClean) are un-fakeable: you cannot talk your way past a contradiction or an anti-rigidity violation. If your extension passes, it is genuinely rigorous; if it fails, the gate returns the reason and you refine.


1. What you are extending

sdg-ontology is a BFO 2020 / CCO-grounded domain ontology, content-derived from FinePDFs and realized to a HermiT-validated OWL artifact at corpora/ontology/sdg-ontology.{omn,owl} with a consistency certificate at corpora/ontology/HERMIT_CERTIFICATE.md.

Its purpose is to be the annotation vocabulary for Column Type / Column Property Annotation (CTA/CPA) over wide relational tables. That reframes what most of the classes are: they are not leaf terms but intermediate-depth subsumers, the property-bearing classes a heterogeneous-but-coherent column belongs to. A driver_stops_schedule.stops_addresses column holds a mix (origin + destination, residential + business shipping addresses, each bearing an avg-time-on-site); no leaf type fits. The right annotation is the least common subsumer that is still property-bearing, e.g. Address ⊓ ∃has-shipping-role ⊓ ∃avg-time-on-site. Defining these intermediate classes well is building the annotation vocabulary. When you extend the ontology, you are extending that vocabulary, and the gates exist to keep every term a coherent, grounded annotation target.

Namespaces

| prefix | IRI base | use |
|---|---|---|
| bfo: | http://purl.obolibrary.org/obo/BFO_ | upper categories (numeric IRIs, e.g. bfo:0000040) |
| cco: | https://www.commoncoreontologies.org/ | mid-level genera (numeric IRIs, e.g. cco:ont00000713 = Vehicle); note https |
| fhir: | http://hl7.org/fhir/ | clinical/record types, bridged to cco:InformationContentEntity |
| iao: | http://purl.obolibrary.org/obo/IAO_ | annotation properties (iao:0000115 = definition) |
| sdg: | https://signals360.example.org/sdg# | our own classes/properties |
| skos:, rdfs:, owl: | standard | labels, definitions, structure |

BFO categories you will reach for

bfo:0000040 material entity · bfo:0000004 independent continuant · bfo:0000002 continuant · bfo:0000015 process · bfo:0000031 generically dependent continuant (ICE) · bfo:0000019 quality · bfo:0000020 specifically dependent continuant · bfo:0000023 role · bfo:0000016 disposition · bfo:0000034 function · bfo:0000017 realizable entity. Realizable-machinery properties: bfo:0000055 realizes · bfo:0000052 inheres-in · bfo:0000053 bearer-of · bfo:0000054 realized-in.


2. How you author

Classes are authored as catalog templates — a Manchester-syntax skeleton with typed slots — that the realizer renders, grounds, and validates into the OWL artifact. The seven family JSON files live in src/aegir/ontology/catalog/; FinePDFs-derived intermediate classes accrete in 08_derived.json. Edit the family .json files, never combined.json (regenerated).

The slot DSL

{name:Type}                  e.g. {X:Class}, {p:ObjectProperty}, {Y:Class}
{name:Type:Bound}            subtype constraint: {X:Class:bfo:Continuant}

Type ∈ {Class, ObjectProperty, DataProperty, Individual}. A CatalogTemplate carries manchester_template, slot_types, verbal_template (an NL gloss → becomes the definition annotation), bfo_anchor_path, and provenance. Three canonical shapes:

Class: {X:Class} SubClassOf: {Y:Class}                              # primitive (a kind, undefined)
Class: {X:Class} SubClassOf: {p:ObjectProperty} some {Y:Class}      # existential restriction
Class: {X:Class} EquivalentTo: {Y:Class} and {p:ObjectProperty} some {Z:Class}   # DEFINED (genus + differentia)
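
The slot grammar above is small enough to check mechanically. The following is a minimal sketch, not the realizer's actual parser: the names SLOT_RE, Slot, and parse_slots are hypothetical, and the only assumption taken from the text is that a slot is {name:Type} or {name:Type:Bound}, where the bound may itself contain a colon (bfo:Continuant).

```python
import re
from typing import NamedTuple, Optional

SLOT_TYPES = ("Class", "ObjectProperty", "DataProperty", "Individual")

# {name:Type} or {name:Type:Bound}; the bound may contain a colon (bfo:Continuant).
SLOT_RE = re.compile(r"\{(\w+):(" + "|".join(SLOT_TYPES) + r")(?::([^{}]+))?\}")


class Slot(NamedTuple):
    name: str
    type: str
    bound: Optional[str]


def parse_slots(template: str) -> list[Slot]:
    """Return the slots of a template in order of first appearance, de-duplicated by name."""
    seen: dict[str, Slot] = {}
    for name, type_, bound in SLOT_RE.findall(template):
        seen.setdefault(name, Slot(name, type_, bound or None))
    return list(seen.values())


defined = "Class: {X:Class} EquivalentTo: {Y:Class} and {p:ObjectProperty} some {Z:Class}"
bounded = "Class: {X:Class:bfo:Continuant} SubClassOf: {Y:Class}"

for slot in parse_slots(defined) + parse_slots(bounded):
    print(slot)
```

A slot whose type is not one of the four listed types simply does not match, so a typo such as {X:Klass} shows up as a missing slot rather than a silent wildcard.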

Manchester conventions the membranes enforce

  1. Prefixes are lowercase only: cco: bfo: fhir: sdg:, never CCO:/BFO:.
  2. Every property is prefixed: sdg:hasMeasurement some X, never bare hasMeasurement.
  3. Coined classes/properties use sdg: (camelCase). Do not invent cco:/bfo: names; those are numeric IRIs you must look up (see “Grounding: choosing a genus” below). No # comments (they break the OMN parser).
  4. The genus must be a broader class — a real BFO/CCO/sdg parent, never the class itself.
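
Conventions 1 to 3 can be pre-checked with a few regular expressions before an axiom ever reaches the parse membrane. This is a rough sketch under stated assumptions, not the membrane itself: lint_axiom is a hypothetical helper, it expects a rendered axiom with no annotation strings and no full IRIs in angle brackets, and it treats the token immediately before a restriction keyword as the property.

```python
import re

# Any prefixed name, e.g. sdg:Stop or CCO:ont00000713. Manchester keywords such as
# "Class:" are followed by whitespace, so they never match the trailing \w.
PREFIXED_NAME = re.compile(r"\b(\w+):\w")
# The token immediately before a restriction keyword is taken to be the property.
PROPERTY_BEFORE_KEYWORD = re.compile(r"(\S+)\s+(?:some|only|min|max|exactly|value)\b")


def lint_axiom(axiom: str) -> list[str]:
    """Return a list of convention problems; an empty list means nothing was flagged."""
    problems = []
    if "#" in axiom:
        problems.append("contains '#': comments are not allowed")
    for match in PREFIXED_NAME.finditer(axiom):
        prefix = match.group(1)
        if prefix != prefix.lower():
            problems.append(f"uppercase prefix '{prefix}:'")
    for match in PROPERTY_BEFORE_KEYWORD.finditer(axiom):
        prop = match.group(1).lstrip("(")
        if ":" not in prop:
            problems.append(f"bare property '{prop}'")
    return problems


print(lint_axiom("Class: sdg:Stop SubClassOf: sdg:hasMeasurement some sdg:Reading"))
print(lint_axiom("Class: sdg:Stop SubClassOf: hasMeasurement some CCO:ont00000713"))
```

A clean lint result says nothing about undeclared entities or satisfiability; those are still decided by the membranes.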

EquivalentTo vs SubClassOf — the single most important authoring choice

A class with SubClassOf: is primitive (necessary conditions only). A class with EquivalentTo: genus and differentia is defined (necessary and sufficient): anything that is the genus and satisfies the differentia is an instance. Prefer EquivalentTo wherever the differentia are genuinely sufficient; this is the definitional_completeness lever and the IOF discipline (the IOF defines ~55% of its terms this way). Do not force it: a genuine natural kind whose essence is not captured by the stated relations should stay primitive. Reserve EquivalentTo for kinds; model roles with the realizable pattern (see “Roles and the realizable machinery” below), not as defined subclasses.

Grounding: choosing a genus

Every class must chain to a BFO category — directly, or through CCO/FHIR. Look up the real IRI with the grounding-anchor retriever rather than inventing one:

uv run --no-sync python scripts/grounding_anchors.py query "shipping address"
#   0.74 [cco]  Mailing Address     cco:ont00000xxx
#   0.59 [fhir] Address             fhir:Address
#   0.55 [sdg]  StopLocation        sdg:StopLocation   (reuse our own)

The index spans CCO (1431 BFO-aligned genera), FHIR R5 (210 record types, bridged to cco:InformationContentEntity), and our own grounded classes (the index accretes — each class you ground becomes a reusable anchor). Prefer, in order: an existing sdg: class (reuse), a CCO/FHIR genus, then a bare BFO category as a last resort. A generic bfo:0000040 placeholder where you meant “Patient” is grounded but shallow — find the real genus.
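
The preference order can be expressed as a sort key over the retriever's hits. The sketch below is illustrative only: Candidate, SOURCE_RANK, pick_genus, and the min_score cut-off are hypothetical (the text does not say how a weak sdg: match should be weighed against a strong CCO match), and the hits are the ones from the sample query output above.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    score: float
    source: str  # "sdg", "cco", "fhir", or "bfo"
    label: str
    iri: str


# Lower rank is preferred: reuse sdg: first, then a CCO/FHIR genus, bare BFO last.
SOURCE_RANK = {"sdg": 0, "cco": 1, "fhir": 1, "bfo": 2}


def pick_genus(candidates, min_score: float = 0.5) -> Optional[Candidate]:
    """Pick the preferred genus among hits that clear an (assumed) similarity cut-off."""
    usable = [c for c in candidates if c.score >= min_score and c.source in SOURCE_RANK]
    if not usable:
        return None
    # Within the same rank, the higher-scoring hit wins.
    return min(usable, key=lambda c: (SOURCE_RANK[c.source], -c.score))


hits = [
    Candidate(0.74, "cco", "Mailing Address", "cco:ont00000xxx"),
    Candidate(0.59, "fhir", "Address", "fhir:Address"),
    Candidate(0.55, "sdg", "StopLocation", "sdg:StopLocation"),
]
print(pick_genus(hits))
print(pick_genus(hits, min_score=0.6))
```

Raising the cut-off is one way to avoid reusing an sdg: class that is only loosely related; whatever is picked still has to survive the reasoner.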

Roles and the realizable machinery

A role is anti-rigid and relational (supplier/operator/origin-address: the bearer could stop being it and still exist). Model it as a BFO role, never a rigid subclass:

Class: {OperatorRole:Class} SubClassOf: bfo:0000023,
   bfo:0000052 some {Operator:Class}, bfo:0000054 some {OperationProcess:Class}

The inheres-in (bfo:0000052) and realized-in (bfo:0000054) restrictions are what the realizable_machinery metric counts and what BFO discipline requires.
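
To see what the role template looks like once its slots are bound, here is a minimal rendering sketch. It is not the realizer: render is a hypothetical helper that does plain slot substitution, and the sdg: names in the bindings are invented for illustration.

```python
import re

ROLE_TEMPLATE = (
    "Class: {OperatorRole:Class} SubClassOf: bfo:0000023, "
    "bfo:0000052 some {Operator:Class}, bfo:0000054 some {OperationProcess:Class}"
)

# A slot is {name:...}; only the name is needed to look up a binding.
SLOT_RE = re.compile(r"\{(\w+):[^{}]+\}")


def render(template: str, bindings: dict[str, str]) -> str:
    """Replace every slot with its bound name; fail loudly on an unbound slot."""
    def fill(match: re.Match) -> str:
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"unbound slot: {name}")
        return bindings[name]

    return SLOT_RE.sub(fill, template)


axiom = render(ROLE_TEMPLATE, {
    "OperatorRole": "sdg:VehicleOperatorRole",   # hypothetical class names
    "Operator": "sdg:Driver",
    "OperationProcess": "sdg:VehicleOperation",
})
print(axiom)
```

The rendered axiom carries one inheres-in and one realized-in restriction, which is the shape the realizable_machinery metric is described as counting.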


3. The quantitative metrics

All metrics are computed by scripts/ontology_metrology.py::compute() (pure rdflib, JVM-free) over the realized .owl, with CCO’s subClassOf backbone merged so that cco:-grounded chains resolve to BFO. Run:

uv run --no-sync python scripts/ontology_metrology.py corpora/ontology/sdg-ontology.owl   # or --json

n = number of sdg: named classes. Each metric below lists its formula, its IOF/field target, and the authoring lever that moves it.

3.1 IOF-derived rigor dimensions (what field-standard suites miss)

| metric | formula | target | lever |
|---|---|---|---|
| definitional_completeness | ‖{c : c owl:equivalentClass …}‖ / n | IOF ≈ 0.55 | write EquivalentTo (genus+differentia) for definable kinds |
| bfo_grounded | ‖{c : subClassOf/≡-genus chain reaches a BFO IRI}‖ / n | 1.0 | ground every class to a BFO/CCO/FHIR genus |
| realizable_machinery | count of restrictions on realizes/inheres/bearer/realized props or some role/disposition/function | IOF ≥ 14 | model roles/dispositions/functions with the realizable pattern |
| def_annotation_coverage | ‖{c : rdfs:comment ∨ iao:0000115 ∨ skos:definition}‖ / n | 1.0 (IAO req) | supply a verbal_template; the realizer emits iao:0000115 + rdfs:comment |

These are the discriminators: an LLM (or a hasty author) recovers taxonomy + existentials (structure) but not sufficiency, full grounding, or BFO role discipline (rigor). They are where the FunctionalAdequacy gate floor lives.
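
The three ratio metrics in 3.1 can be illustrated on a handful of toy triples. This sketch is pure Python and is not ontology_metrology.py: the triples, the cco:Genus placeholder, and the simplification that an owl:equivalentClass object is the genus itself (in real OWL it is an anonymous intersection) are all assumptions made for the example.

```python
SUBCLASS, EQUIV, TYPE = "rdfs:subClassOf", "owl:equivalentClass", "rdf:type"
DEFINITION_PROPS = {"rdfs:comment", "iao:0000115", "skos:definition"}

# Toy triples; "cco:Genus" and the sdg: names are placeholders, not real IRIs.
TRIPLES = [
    ("sdg:PostalAddress", TYPE, "owl:Class"),
    ("sdg:PostalAddress", SUBCLASS, "cco:Genus"),
    ("cco:Genus", SUBCLASS, "bfo:0000031"),
    ("sdg:ShippingStopAddress", TYPE, "owl:Class"),
    ("sdg:ShippingStopAddress", EQUIV, "sdg:PostalAddress"),  # genus of the definition
    ("sdg:ShippingStopAddress", "iao:0000115", "A mailing address that bears a shipping role."),
    ("sdg:OperatorRole", TYPE, "owl:Class"),
    ("sdg:OperatorRole", SUBCLASS, "bfo:0000023"),
    ("sdg:OperatorRole", "rdfs:comment", "The role of operating a vehicle."),
    ("sdg:Island", TYPE, "owl:Class"),
]


def sdg_classes(triples):
    return {s for s, p, o in triples if p == TYPE and o == "owl:Class" and s.startswith("sdg:")}


def definitional_completeness(triples):
    classes = sdg_classes(triples)
    defined = {s for s, p, _ in triples if p == EQUIV and s in classes}
    return len(defined) / len(classes)


def def_annotation_coverage(triples):
    classes = sdg_classes(triples)
    annotated = {s for s, p, _ in triples if p in DEFINITION_PROPS and s in classes}
    return len(annotated) / len(classes)


def bfo_grounded(triples):
    parents = {}
    for s, p, o in triples:
        if p in (SUBCLASS, EQUIV):
            parents.setdefault(s, set()).add(o)

    def reaches_bfo(start):
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node.startswith("bfo:"):
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    classes = sdg_classes(triples)
    return sum(reaches_bfo(c) for c in classes) / len(classes)


print(definitional_completeness(TRIPLES), def_annotation_coverage(TRIPLES), bfo_grounded(TRIPLES))
```

In this toy set one of four classes is defined, two carry a definition annotation, and three chain to BFO; sdg:Island is the class that drags all three ratios down.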

3.2 Field-standard structural metrics (OntoQA / OQuaRE)

| metric | formula | reading |
|---|---|---|
| rr (relationship richness) | n_∃some / (n_subClassOf + n_∃some) | non-taxonomic richness; a pure tree → 0 |
| ir (inheritance richness) | n_subClassOf / n | subclasses per class |
| ar (attribute richness) | n_DatatypeProperty / n | typed data attributes per class |
| aronto (axiomatic strength) | (n_∃some + n_∀only + n_card) / n | restrictions per class |
| dit (depth) | longest subClassOf chain | taxonomic depth (more developed = deeper) |
| tm (tangledness, inverted) | ‖{c : >1 named parent}‖ / n | multiple-inheritance load; lower is better |
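
The structural formulas above are simple ratios over a few counts. The following sketch computes them for a toy hierarchy; it is not the metrology script. structural_metrics is a hypothetical helper, the class names are placeholders, the restriction counts are passed in by hand, and dit is counted in edges, which is an assumption (the text does not say whether the chain is measured in edges or nodes).

```python
from functools import lru_cache


def structural_metrics(parents, n_some, n_only=0, n_card=0, n_datatype_props=0):
    """parents maps each sdg: class to its named parents (its subClassOf edges).

    Assumes a non-empty, acyclic hierarchy with at least one edge or restriction.
    """
    n = len(parents)
    n_subclass = sum(len(ps) for ps in parents.values())

    @lru_cache(maxsize=None)
    def chain(cls):
        # Longest subClassOf chain above cls, counted in edges.
        return max((1 + chain(p) for p in parents.get(cls, ())), default=0)

    return {
        "rr": n_some / (n_subclass + n_some),
        "ir": n_subclass / n,
        "ar": n_datatype_props / n,
        "aronto": (n_some + n_only + n_card) / n,
        "dit": max(chain(c) for c in parents),
        "tm": sum(1 for ps in parents.values() if len(ps) > 1) / n,
    }


toy = {
    "sdg:Address": ["cco:Genus"],
    "sdg:PostalAddress": ["sdg:Address"],
    "sdg:StopLocation": ["sdg:Address", "cco:Genus"],  # two named parents: tangled
    "sdg:Island": [],
}
metrics = structural_metrics(toy, n_some=4, n_datatype_props=1)
print(metrics)
```

With four subClassOf edges and four existential restrictions, rr sits at 0.5; removing every restriction would send it to 0, which is the "pure tree" reading in the table.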

3.3 OntoClean taxonomic-correctness proxies (un-gameable)

Reasoner-invisible defects that a generic LLM cannot fake (it names meta-properties at ~96% but cannot operationalize them). Computed via the OntoClean classifier (src/aegir/ontology/ontoclean.py).

| metric | formula | target |
|---|---|---|
| subsumption_cycles | classes reachable from themselves via subClassOf (OOPS! P06) | 0 (hard) |
| ontoclean_violations | subClassOf edges where an anti-rigid (role) parent subsumes a non-anti-rigid (rigid) child | 0 |
| sibling_disjointness | fraction of same-parent sibling pairs asserted owl:disjointWith (OOPS! P10) | → 1.0 |
| orphan_rate | fraction of sdg: classes with no parent (OOPS! P04: islands) | → 0 |
| taxonomic_cleanliness | 1 − (subsumption_cycles + ontoclean_violations) / n_subClassOf | 1.0 |
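
The proxies above are graph checks over the subClassOf edges plus a rigidity tag per class. The sketch below reproduces the formulas on a toy hierarchy. It is not the OntoClean classifier: the rigidity tags are supplied by hand here (assigning them is the classifier's job), the class names are placeholders, and returning 1.0 when there are no sibling pairs is an assumption.

```python
from itertools import combinations


def ancestors(start, parents):
    """Every class reachable from start by following subClassOf edges upward."""
    stack, seen = list(parents.get(start, ())), set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def subsumption_cycles(parents):
    return sum(1 for c in parents if c in ancestors(c, parents))


def ontoclean_violations(parents, anti_rigid):
    # An edge violates when the parent is anti-rigid and the child is not.
    return sum(1 for child, ps in parents.items() for p in ps
               if p in anti_rigid and child not in anti_rigid)


def sibling_disjointness(parents, disjoint_pairs):
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    pairs = {frozenset(pair) for sibs in children.values()
             for pair in combinations(sorted(sibs), 2)}
    return len(pairs & disjoint_pairs) / len(pairs) if pairs else 1.0


def orphan_rate(parents):
    return sum(1 for ps in parents.values() if not ps) / len(parents)


def taxonomic_cleanliness(parents, anti_rigid):
    n_subclass = sum(len(ps) for ps in parents.values())
    defects = subsumption_cycles(parents) + ontoclean_violations(parents, anti_rigid)
    return 1 - defects / n_subclass


toy = {
    "sdg:Address": ["cco:Genus"],
    "sdg:HomeAddress": ["sdg:Address"],
    "sdg:BusinessAddress": ["sdg:Address"],
    "sdg:OperatorRole": ["bfo:0000023"],
    "sdg:Person": ["sdg:OperatorRole"],  # a rigid kind placed under a role: the defect
    "sdg:Island": [],
}
anti_rigid = {"bfo:0000023", "sdg:OperatorRole"}
disjoint = {frozenset({"sdg:HomeAddress", "sdg:BusinessAddress"})}

print(subsumption_cycles(toy), ontoclean_violations(toy, anti_rigid))
print(sibling_disjointness(toy, disjoint), orphan_rate(toy), taxonomic_cleanliness(toy, anti_rigid))
```

The single misplaced edge (sdg:Person under sdg:OperatorRole) is invisible to a reasoner but costs one fifth of taxonomic_cleanliness in this five-edge toy.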

3.4 Consistency

HermiT over the realized ontology with CCO imported. Consumed by the gate from the certificate (isConsistent: true). Zero unsatisfiable classes is the real bar — an ontology can be isConsistent yet contain unsatisfiable classes (classes that can have no instances); both must be clean for a publish.


4. The OQuaRE quality gate

scripts/ontology_oquare.py is the formal publish gate. OQuaRE (Duque-Ramos et al. 2011) adapts ISO/IEC 25000 (SQuaRE) to ontologies: each metric is normalized to [1,5] against fixed, IOF-anchored bands, then aggregated into six characteristics and one holistic score.

uv run --no-sync python scripts/ontology_oquare.py corpora/ontology/sdg-ontology.owl \
    --certificate corpora/ontology/HERMIT_CERTIFICATE.md

4.1 Normalization bands (fixed a priori — a stable distance-to-IOF)

Piecewise-linear interpolation between breakpoints (value, score), clamped to [1,5]:

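| metric | →1 | →3 | →5 |
|---|---|---|---|
| definitional_completeness | 0.00 | 0.25 | 0.55 |
| bfo_grounded | 0.50 | 0.85 | 1.00 |
| realizable_machinery | 0 | 5 | 14 |
| def_annotation_coverage | 0.00 | 0.70 | 1.00 |
| rr | 0.00 | 0.25 | 0.50 |
| ir | 0.00 | 1.00 | 3.00 |
| ar | 0.00 | 0.30 | 1.00 |
| aronto | 0.00 | 0.60 | 1.50 |
| dit | 1 | 3 | 8 |
| tm (inverted) | 0.50 | 0.15 | 0.00 |
| consistent | inconsistent | unknown | consistent |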
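
The piecewise-linear normalization can be sketched directly from the breakpoints above. This is an illustration, not the code in ontology_oquare.py: BANDS, normalize, and normalize_consistent are hypothetical names, and the inverted tm band is handled by negating the value and its breakpoints so that one increasing-band routine covers both directions.

```python
# (value at score 1, value at score 3, value at score 5), copied from the bands above.
BANDS = {
    "definitional_completeness": (0.00, 0.25, 0.55),
    "bfo_grounded": (0.50, 0.85, 1.00),
    "realizable_machinery": (0, 5, 14),
    "def_annotation_coverage": (0.00, 0.70, 1.00),
    "rr": (0.00, 0.25, 0.50),
    "ir": (0.00, 1.00, 3.00),
    "ar": (0.00, 0.30, 1.00),
    "aronto": (0.00, 0.60, 1.50),
    "dit": (1, 3, 8),
    "tm": (0.50, 0.15, 0.00),  # inverted: lower is better
}
CONSISTENT_SCORE = {"inconsistent": 1.0, "unknown": 3.0, "consistent": 5.0}


def normalize(metric: str, value: float) -> float:
    """Map a raw metric value to [1, 5] by piecewise-linear interpolation, clamped."""
    v1, v3, v5 = BANDS[metric]
    if v1 > v5:  # inverted band: negate so the breakpoints increase
        value, v1, v3, v5 = -value, -v1, -v3, -v5
    if value <= v1:
        return 1.0
    if value >= v5:
        return 5.0
    if value <= v3:
        return 1.0 + 2.0 * (value - v1) / (v3 - v1)
    return 3.0 + 2.0 * (value - v3) / (v5 - v3)


def normalize_consistent(status: str) -> float:
    return CONSISTENT_SCORE[status]


for metric, raw in [("definitional_completeness", 0.40), ("bfo_grounded", 0.95),
                    ("dit", 5), ("tm", 0.15), ("tm", 0.60)]:
    print(metric, raw, round(normalize(metric, raw), 3))
```

For example, a definitional_completeness of 0.40 lies halfway between the 0.25 and 0.55 breakpoints, so it scores halfway between 3 and 5.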

4.2 Characteristics (which metrics feed each)

| characteristic | constituent metric scores |
|---|---|
| Structural | aronto, dit, tm, bfo_grounded, rr |
| FunctionalAdequacy | definitional_completeness, realizable_machinery, def_annotation_coverage |
| Reliability | bfo_grounded, consistent, tm |
| Operability | def_annotation_coverage, rr |
| Maintainability | tm, dit, ir |
| Transferability | bfo_grounded, def_annotation_coverage |

aggregate = mean of the six characteristics.
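
A short sketch of the aggregation follows. The characteristic map is copied from the table above; treating each characteristic as the unweighted mean of its constituent scores is an assumption (the text only states that the aggregate is the mean of the six characteristics), and the input scores are invented to show a structurally strong but definitionally weak ontology.

```python
from statistics import mean

CHARACTERISTICS = {
    "Structural": ["aronto", "dit", "tm", "bfo_grounded", "rr"],
    "FunctionalAdequacy": ["definitional_completeness", "realizable_machinery",
                           "def_annotation_coverage"],
    "Reliability": ["bfo_grounded", "consistent", "tm"],
    "Operability": ["def_annotation_coverage", "rr"],
    "Maintainability": ["tm", "dit", "ir"],
    "Transferability": ["bfo_grounded", "def_annotation_coverage"],
}


def characteristic_scores(scores):
    """Assumed: each characteristic is the unweighted mean of its metric scores."""
    return {name: mean(scores[m] for m in metrics)
            for name, metrics in CHARACTERISTICS.items()}


def oquare_aggregate(scores):
    return mean(characteristic_scores(scores).values())


# Hypothetical normalized scores: perfect structure, no definitions, no role machinery.
scores = {m: 5.0 for metrics in CHARACTERISTICS.values() for m in metrics}
scores["definitional_completeness"] = 1.0
scores["realizable_machinery"] = 1.0

by_characteristic = characteristic_scores(scores)
print({k: round(v, 2) for k, v in by_characteristic.items()})
print(round(oquare_aggregate(scores), 2))
```

Under these assumptions the aggregate stays well above 3.5 while FunctionalAdequacy drops to about 2.33, which is the situation a separate floor on that characteristic catches.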

4.3 The gate (GREEN requires all three)

| check | floor |
|---|---|
| oquare_aggregate | ≥ 3.5 |
| functional_adequacy | ≥ 3.0 |
| hermit_consistent | == true |

The AIM is 3.9, the published OQuaRE class of Brick (3.93) / RealEstateCore (3.91). The FunctionalAdequacy ≥ 3.0 floor is deliberate: it forces definitional rigor and BFO discipline, not structural/grounding gains alone. The gate is wired HARD into aegir.lineup.sync._gate(): sync --push of the ontology is refused below GREEN. You will not publish a regression.
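
The three-way check can be written as a small function that also reports why it failed. This is a sketch of the rule as stated above, not the code behind aegir.lineup.sync._gate(); the function name and return shape are hypothetical.

```python
AGGREGATE_FLOOR = 3.5
FUNCTIONAL_ADEQUACY_FLOOR = 3.0


def gate(oquare_aggregate: float, functional_adequacy: float, hermit_consistent: bool):
    """Return (green, reasons); green only when all three checks hold."""
    reasons = []
    if oquare_aggregate < AGGREGATE_FLOOR:
        reasons.append(f"oquare_aggregate {oquare_aggregate:.2f} < {AGGREGATE_FLOOR}")
    if functional_adequacy < FUNCTIONAL_ADEQUACY_FLOOR:
        reasons.append(
            f"functional_adequacy {functional_adequacy:.2f} < {FUNCTIONAL_ADEQUACY_FLOOR}")
    if hermit_consistent is not True:
        reasons.append("hermit_consistent is not true")
    return (not reasons, reasons)


print(gate(3.93, 3.4, True))
print(gate(4.56, 2.33, True))   # strong aggregate, weak definitional rigor
print(gate(3.9, 3.2, False))
```

The second call shows why the floor is a separate check: a high aggregate does not compensate for a FunctionalAdequacy score under 3.0.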


5. The disposal membranes (what rejects your extension, and why)

Your axioms pass through these in order. Each returns a reason, so a failure is a repair instruction, not a dead end (this is the agent-mediated feedback loop; a human author reads the same reasons).

  1. Parse membrane (evolve_rigor.validate_detailed) — renders the axiom standalone and parses it under OWLAPI. Rejects malformed Manchester: uppercase prefixes, bare properties, undeclared entities, # comments. Reason: the parser error or “0 classes.”
  2. Reasoning-authority membrane (build_realized_ontology.consistency_check) — imports CCO and runs HermiT, so your grounding is validated against CCO’s disjointness axioms. A class grounded to a CCO-disjoint or BFO-incompatible genus (e.g. a Plant placed under cco:Vehicle, or a continuant genus where a process is required) is unsatisfiable and rejected. Reason: “genus X is incompatible — re-ground to a compatible parent.” This is un-fakeable.
  3. OntoClean meta-property membrane (src/aegir/ontology/ontoclean.py) — assigns Rigidity / Identity / Unity / Dependence and enforces the OntoClean constraint that an anti-rigid property cannot subsume a rigid one (a role cannot be the parent of a kind). Surfaces as ontoclean_violations. Also un-fakeable — reasoner-invisible yet checkable.
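
The ordered, reason-returning behaviour of the membranes can be sketched as a short pipeline. The three membrane functions below are stand-ins written for this example only: the real ones call OWLAPI, HermiT over CCO, and the OntoClean classifier. INCOMPATIBLE_GENERA and ROLE_PARENTS are hypothetical lookup sets, not anything the tooling exposes.

```python
from typing import Callable, Optional

# A membrane returns a rejection reason, or None to admit the axiom.
Membrane = Callable[[str], Optional[str]]

INCOMPATIBLE_GENERA = {"cco:Vehicle"}   # stand-in for HermiT + CCO disjointness
ROLE_PARENTS = {"sdg:OperatorRole"}     # stand-in for the OntoClean rigidity tags


def parse_membrane(axiom: str) -> Optional[str]:
    if "#" in axiom:
        return "'#' comments break the OMN parser"
    if not axiom.startswith("Class:"):
        return "0 classes"
    return None


def reasoning_membrane(axiom: str) -> Optional[str]:
    for genus in sorted(INCOMPATIBLE_GENERA):
        if genus in axiom:
            return f"genus {genus} is incompatible; re-ground to a compatible parent"
    return None


def ontoclean_membrane(axiom: str) -> Optional[str]:
    for role in sorted(ROLE_PARENTS):
        if f"SubClassOf: {role}" in axiom:
            return f"{role} is anti-rigid and cannot be the parent of a kind"
    return None


MEMBRANES = [("parse", parse_membrane), ("reasoning", reasoning_membrane),
             ("ontoclean", ontoclean_membrane)]


def dispose(axiom: str, membranes=MEMBRANES):
    """Run the membranes in order and stop at the first rejection."""
    for name, membrane in membranes:
        reason = membrane(axiom)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "admitted"


print(dispose("Class: sdg:Plant SubClassOf: cco:Vehicle"))
print(dispose("Class: sdg:Person SubClassOf: sdg:OperatorRole"))
print(dispose("Class: sdg:StopLocation SubClassOf: sdg:Address"))
```

Because the first rejection short-circuits the rest, an author (or agent) repairs one reason at a time and re-proposes.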

A self-check before you propose:

LD_LIBRARY_PATH=$(pwd)/build/jvm-libs uv run --no-sync python scripts/build_realized_ontology.py --strict-grounding
uv run --no-sync python src/aegir/ontology/ontoclean.py src/aegir/ontology/catalog/08_derived.json
just check-ontology-schema      # TTL parses, labels/definitions present, BFO ancestry, SPARQL totality

6. The pre-registered objectives (EVIDENCE.md)

Two standing objectives define “good enough to publish” and “rigorous”:

  • OQ-Structure: bfo_grounded ≥ 0.95, def_annotation_coverage ≥ 0.90, ar > 0, oquare_aggregate ≥ 3.5. Gate: the sync._gate publish gate.
  • OQ-Rigor: definitional_completeness ≥ 0.45, realizable_machinery > 0. Gate: the OQuaRE FunctionalAdequacy ≥ 3.0 floor.

An extension that holds or raises both objectives is the bar to clear. The standing rule: no sync --push of the ontology Data Product until OQuaRE is GREEN.
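
The two objectives reduce to a handful of threshold comparisons. The sketch below encodes only the thresholds listed above; check_objectives is a hypothetical helper and the metric values are invented, not measurements of the real ontology.

```python
def check_objectives(m: dict) -> dict:
    """Evaluate OQ-Structure and OQ-Rigor against a dict of metric values."""
    return {
        "OQ-Structure": (
            m["bfo_grounded"] >= 0.95
            and m["def_annotation_coverage"] >= 0.90
            and m["ar"] > 0
            and m["oquare_aggregate"] >= 3.5
        ),
        "OQ-Rigor": (
            m["definitional_completeness"] >= 0.45
            and m["realizable_machinery"] > 0
        ),
    }


# Hypothetical measurements: well grounded and annotated, but few defined classes.
measured = {
    "bfo_grounded": 0.97,
    "def_annotation_coverage": 0.93,
    "ar": 0.12,
    "oquare_aggregate": 3.7,
    "definitional_completeness": 0.31,
    "realizable_machinery": 6,
}
print(check_objectives(measured))
```

Comparing the result before and after an extension shows whether the extension holds or raises both objectives.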


7. Worked example — authoring an intermediate class end-to-end

Goal: a class for the stops_addresses column — shipping addresses (origin + destination) bearing an avg-time-on-site.

(1) Decide the modeling. A kind (an Address is rigidly an address) → define it with EquivalentTo. The shipping/origin/destination facet is a role the address bears, not a rigid parent — so it enters as a realizes-style differentia, keeping the genus an Address.

(2) Ground the genus. grounding_anchors.py query "mailing address" → reuse sdg:PostalAddress if present, else cco:ont… (Mailing Address). Coin sdg: only for genuinely new differentia.

(3) Author.

Class: {ShippingStopAddress:Class} EquivalentTo:
   cco:ont00000xxx
   and sdg:bearsShippingRole some {ShippingRole:Class}
   and sdg:hasAverageTimeOnSite some xsd:duration
Annotations: rdfs:label "shipping stop address",
   iao:0000115 "A mailing address that bears a shipping role (origin or destination) on a driver
   stop schedule and has an associated average time on site."

(4) Dispose. Realize → HermiT (the genus cco:…Address is a material/ICE entity; no disjointness violated → satisfiable). OntoClean (the genus is a rigid kind, not a role → no violation). Parse (prefixes lowercase, properties prefixed → admitted).

(5) Measure. Re-run the metrology + OQuaRE gate. This class raises definitional_completeness (an EquivalentTo axiom), holds bfo_grounded (real CCO genus), adds def_annotation_coverage (the iao:0000115), and, because the role is modeled with a realizable differentia, nudges realizable_machinery.

(6) Iterate. If HermiT marks it unsatisfiable, the reason names the offending genus; pick a compatible one and re-dispose. If ontoclean_violations rises, you placed a role as a rigid parent — re-model it as a borne role.
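
The effect described in step (5) is plain ratio arithmetic: definitional_completeness is defined classes over all sdg: classes, so every new class enters the denominator and only a defined one enters the numerator. The counts below (40 classes, 16 defined) are invented for illustration, and completeness_after is a hypothetical helper.

```python
def completeness_after(n_classes, n_defined, added_defined=0, added_primitive=0):
    """definitional_completeness after adding new defined and/or primitive classes."""
    n_after = n_classes + added_defined + added_primitive
    return (n_defined + added_defined) / n_after


before = completeness_after(40, 16)                         # 16 / 40
with_defined = completeness_after(40, 16, added_defined=1)  # 17 / 41
with_primitive = completeness_after(40, 16, added_primitive=1)  # 16 / 41

print(round(before, 4), round(with_defined, 4), round(with_primitive, 4))
```

Adding ShippingStopAddress as a defined class moves the ratio up; adding the same class as a primitive would move it down, which is why the EquivalentTo choice matters even for a single class.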


8. How to improve upon the gates

Passing is the floor; the AIM is 3.9 and the IOF frontier beyond it. To raise each lever:

  • Definitional completeness toward 0.55+ — convert primitive kinds to EquivalentTo wherever the differentia are sufficient; define the referenced intermediate classes (the subsumers a column needs), not just the heads. This is the highest-leverage dimension and the one shallow extensions miss.
  • Realizable machinery toward 14+ — wherever a relational/anti-rigid concept appears, model it as a BFO role/disposition/function with inheres/realizes differentiae rather than a subclass.
  • OntoClean to a clean sheet — push sibling_disjointness up (assert disjointWith between identity-incompatible siblings) and keep ontoclean_violations/subsumption_cycles at 0. These are the un-gameable signals; a clean OntoClean profile is the field’s blind spot and your differentiator.
  • Annotation rigor — supply genus-differentia definitions (not vacuous label-glosses); the iao:0000115 should be a real sufficient definition, mirroring the EquivalentTo.
  • Contribute patterns: recurring genus-differentia or role shapes belong in src/aegir/ontology/axiom_patterns.json (DOSDP-style: the defining axiom lives in the pattern, you fill slots). Reserve equivalentClass for kinds; emit roles via the realizable pattern and do not conflate them (Neuhaus 2025: roles resist ≡).

The two reasoners are the discipline you cannot circumvent: HermiT rejects any grounding that contradicts CCO’s disjointness, and OntoClean rejects any anti-rigid-over-rigid subsumption. Build with them, not around them, and your extension is rigorous by construction.


9. Reference — commands & files

# metrics + gate
uv run --no-sync python scripts/ontology_metrology.py corpora/ontology/sdg-ontology.owl [--json]
uv run --no-sync python scripts/ontology_oquare.py corpora/ontology/sdg-ontology.owl \
    --certificate corpora/ontology/HERMIT_CERTIFICATE.md [--json]
# membranes / realize (LD_LIBRARY_PATH bootstraps the JVM for HermiT/DeepOnto)
LD_LIBRARY_PATH=$(pwd)/build/jvm-libs uv run --no-sync python scripts/build_realized_ontology.py --strict-grounding
uv run --no-sync python src/aegir/ontology/ontoclean.py src/aegir/ontology/catalog/08_derived.json
just check-ontology-schema
# grounding + agent-assisted authoring
uv run --no-sync python scripts/grounding_anchors.py query "<concept>"
LD_LIBRARY_PATH=$(pwd)/build/jvm-libs uv run --no-sync python scripts/define_intermediate_classes.py --rounds 4
uv run --no-sync python scripts/evolve_rigor.py --batch 12       # convert primitives → ≡ / roles

| file | role |
|---|---|
| src/aegir/ontology/catalog/*.json | the seven family catalogs (edit these, not combined.json) |
| src/aegir/ontology/SLOT_DSL.md | the slot grammar |
| src/aegir/ontology/axiom_patterns.json | DOSDP-style genus-differentia + role patterns |
| scripts/ontology_metrology.py | every metric (compute()); the single source of truth |
| scripts/ontology_oquare.py | the [1,5] bands, characteristic map, FLOORS, the publish gate |
| src/aegir/ontology/ontoclean.py | the OntoClean classifier + meta-property membrane |
| scripts/build_realized_ontology.py | render → CCO import → HermiT → .owl + certificate |
| scripts/grounding_anchors.py | CCO+FHIR+accretive genus retrieval |
| corpora/ontology/sdg-ontology.{omn,owl} | the realized artifact (+ HERMIT_CERTIFICATE.md) |

Citations. OQuaRE: Duque-Ramos et al. 2011. IOF/BFO signature: Smith et al. 2019. OntoClean: Guarino & Welty. CCO: Common Core Ontologies (CC0). FHIR: HL7 FHIR R5.