
Bases Feature Store

Bases is an entity-centric feature store backed by Apache Kudu (accessed through a PostgreSQL foreign data wrapper, FDW). It offers a fluent query API, grounding in the Basic Formal Ontology (BFO), and query guardrails, and it abstracts multiple storage backends behind a unified interface.

Core Concepts

A Base is a named, typed view over features and entities. Bases hide the underlying storage backend (PostgreSQL, Iceberg, Kudu FDW) behind a consistent query interface.

Three base types determine query semantics and backend routing:

Type          Semantics                         Backend
SNAPSHOT      Latest value per entity           Kudu via FDW (PostgreSQL stub)
HISTORICAL    Event-sourced with time-travel    Apache Iceberg
REGISTRY      Metadata queries                  PostgreSQL
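The routing in the table above can be pictured as a lookup from base type to backend. This is a minimal sketch under assumptions: the `BaseType` enum, the backend labels, and `backend_for` / `supports_time_travel` are hypothetical names for illustration, not the Bases API.

```python
from enum import Enum


class BaseType(Enum):
    """Hypothetical enum mirroring the three documented base types."""
    SNAPSHOT = "snapshot"
    HISTORICAL = "historical"
    REGISTRY = "registry"


# Hypothetical routing table; the string labels are illustrative only.
BACKENDS = {
    BaseType.SNAPSHOT: "kudu_fdw",     # Kudu reached through a PostgreSQL FDW
    BaseType.HISTORICAL: "iceberg",    # event-sourced tables in Apache Iceberg
    BaseType.REGISTRY: "postgresql",   # metadata tables
}


def backend_for(base_type: BaseType) -> str:
    """Return the backend a query against this base type is routed to."""
    return BACKENDS[base_type]


def supports_time_travel(base_type: BaseType) -> bool:
    # Per the table, only historical bases keep event history to travel through.
    return base_type is BaseType.HISTORICAL
```

Keeping the mapping in one table means adding a backend touches a single place rather than every query path.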

Fluent Query API

The primary query interface uses Kudu SDK-style method chaining:

from gaius.bases import Base, col, term

results = await (
    Base("events")
    .where(col("age") > 30)
    .where(col("status").isin("active", "pending"))
    .select("name", "email")
    .order_by("created_at", desc=True)
    .limit(100)
    .scan()
)
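Method chaining like this is commonly built on an immutable query object: each call returns a new object with one more piece of state, and nothing runs until a terminal call such as `.scan()`. The sketch below illustrates that pattern under stated assumptions. `Query`, `Col`, and `Pred` are hypothetical stand-ins for `Base` and `col`, execution is omitted, and treating repeated `.where()` calls as AND is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pred:
    """One filter predicate: column, operator, value."""
    column: str
    op: str
    value: object


class Col:
    """Hypothetical column handle whose operators build Pred objects."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, other: object) -> Pred:
        return Pred(self.name, ">", other)

    def __eq__(self, other: object) -> Pred:  # type: ignore[override]
        return Pred(self.name, "=", other)

    def isin(self, *values: object) -> Pred:
        return Pred(self.name, "IN", values)


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query spec; every method returns a copy."""
    base: str
    predicates: tuple = ()
    columns: tuple = ()
    order: tuple | None = None
    row_limit: int | None = None

    def where(self, pred: Pred) -> Query:
        # Assumed semantics: predicates accumulate and are combined with AND.
        return replace(self, predicates=self.predicates + (pred,))

    def select(self, *columns: str) -> Query:
        return replace(self, columns=columns)

    def order_by(self, column: str, desc: bool = False) -> Query:
        return replace(self, order=(column, desc))

    def limit(self, n: int) -> Query:
        return replace(self, row_limit=n)


q = (
    Query("events")
    .where(Col("age") > 30)
    .where(Col("status").isin("active", "pending"))
    .select("name", "email")
    .order_by("created_at", desc=True)
    .limit(100)
)
```

Because each step returns a copy, a partially built query can be reused as a template without later calls leaking into it.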

Ontology-grounded queries resolve BFO terms to column names via the base’s @context:

results = await (
    Base("events")
    .where(term("BFO:material_entity") == "ENT-12345")
    .scan()
)
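Resolving a term amounts to finding the column whose `@context` entry points at the same IRI. The sketch below shows one way to do that, assuming a hypothetical label table that maps `BFO:material_entity` to `BFO_0000040` (the BFO identifier for material entity) and treating `@vocab` as a plain prefix, which simplifies full JSON-LD IRI expansion. `resolve_term` is an illustrative name, not the Bases implementation.

```python
BFO_LABELS = {"BFO:material_entity": "BFO_0000040"}  # hypothetical label table


def expand(iri: str, vocab: str) -> str:
    # Simplified expansion: relative ids are prefixed with @vocab.
    return iri if "://" in iri else vocab + iri


def resolve_term(term: str, context: dict) -> str:
    """Return the column whose @context @id matches the term's IRI."""
    vocab = context.get("@vocab", "")
    target = expand(BFO_LABELS[term], vocab)
    for column, spec in context.items():
        if column.startswith("@"):
            continue  # keywords such as @vocab are not columns
        if isinstance(spec, dict) and expand(spec.get("@id", ""), vocab) == target:
            return column
    raise KeyError(f"no column in @context is grounded to {term}")


context = {
    "@vocab": "https://purl.obolibrary.org/obo/",
    "entity_id": {"@id": "BFO_0000040"},
}
```

With this context, `term("BFO:material_entity") == "ENT-12345"` would compile to a filter on `entity_id`.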

Time-travel queries on historical bases:

results = await (
    Base("events")
    .as_of("2026-01-01T00:00:00Z")
    .where(col("entity_id") == "user-42")
    .scan()
)
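For an event-sourced base, one way to read `.as_of(t)` is "replay events up to `t` and keep the latest one per entity". The sketch below assumes that interpretation and assumes events carry the `entity_id` and `event_time` columns from the schema shown later; `parse_as_of` and `state_as_of` are hypothetical helpers, and whether Bases instead uses Iceberg snapshot time travel is not stated here.

```python
from __future__ import annotations

from datetime import datetime


def parse_as_of(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        raise ValueError("as_of timestamp must include a timezone")
    return dt


def state_as_of(events: list[dict], as_of: datetime) -> dict[str, dict]:
    """Replay events in time order; keep the latest event per entity up to as_of."""
    state: dict[str, dict] = {}
    for event in sorted(events, key=lambda e: e["event_time"]):
        if event["event_time"] > as_of:
            break  # sorted input: every remaining event is later still
        state[event["entity_id"]] = event
    return state
```

Requiring an explicit timezone avoids silently mixing naive and UTC timestamps, which Python refuses to compare anyway.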

Base Definition (.base YAML)

Bases are defined in .base YAML files, with JSON-LD-style semantic grounding in the @context block:

"@context":
  "@vocab": "https://purl.obolibrary.org/obo/"
  entity_id:
    "@id": "BFO_0000040"

kudu:
  table: "gaius.events"
  primary_key: [entity_id, event_time]

schema:
  - name: entity_id
    type: STRING
  - name: event_time
    type: TIMESTAMP

Query Guardrails

All queries pass through guardrails that enforce resource limits:

Guardrail                  Default       Maximum
Result limit               1,000 rows    10,000 rows
Query timeout              30 seconds    120 seconds
Time range (historical)    7 days        90 days

Historical bases require a time constraint, either .as_of() or a filter on the time column; unbounded historical scans are rejected.
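The guardrail table and the historical-scan rule can be expressed as a single check. This is a sketch under assumptions: values above a maximum are rejected rather than clamped, and when only `.as_of()` is given the 7-day default window is applied; the source states neither. `enforce` and `GuardrailError` are hypothetical names.

```python
from __future__ import annotations

from datetime import timedelta

# Defaults and maxima copied from the guardrail table.
DEFAULT_ROWS, MAX_ROWS = 1_000, 10_000
DEFAULT_TIMEOUT_S, MAX_TIMEOUT_S = 30, 120
DEFAULT_RANGE, MAX_RANGE = timedelta(days=7), timedelta(days=90)


class GuardrailError(ValueError):
    """Raised when a query would exceed a guardrail."""


def _bounded(requested, default, maximum, what):
    # Unspecified -> default; above maximum -> reject (assumed, not clamped).
    if requested is None:
        return default
    if requested > maximum:
        raise GuardrailError(f"{what} {requested} exceeds maximum {maximum}")
    return requested


def enforce(limit=None, timeout_s=None, *, historical=False,
            has_as_of=False, time_range=None):
    """Return the effective (limit, timeout_s, time_range) or raise GuardrailError."""
    limit = _bounded(limit, DEFAULT_ROWS, MAX_ROWS, "limit")
    timeout_s = _bounded(timeout_s, DEFAULT_TIMEOUT_S, MAX_TIMEOUT_S, "timeout")
    if not historical:
        return limit, timeout_s, None
    if not has_as_of and time_range is None:
        raise GuardrailError("unbounded historical scan: add .as_of() or a time filter")
    # Assumption: with only .as_of(), the 7-day default window applies.
    time_range = _bounded(time_range, DEFAULT_RANGE, MAX_RANGE, "time range")
    return limit, timeout_s, time_range
```

Applying defaults and maxima in one helper keeps all three guardrails behaving the same way.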

MCP Tools

Tool                    Operation
bases_list              List available bases with metadata
bases_query             Execute fluent queries against bases
bases_entity_history    Get event-sourced history for an entity
bases_health            Check service health
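MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request carrying the tool `name` and an `arguments` object. The sketch below builds such a message; the argument schemas for the Bases tools are not documented here, so the empty arguments for `bases_list` are an assumption (a client can discover real schemas with `tools/list`). `tool_call` is a hypothetical helper.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Assumed: bases_list takes no required arguments.
message = tool_call(1, "bases_list", {})
```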

Architecture

Fluent API (Base/col/term) ──> Parser ──> Compiler (SQLGlot) ──> Executor
                                              |                      |
                                              v                      v
                                    Guardrail Enforcer         PostgreSQL / Iceberg

The DQL Query Language provides the text-based query syntax parsed by the fluent expression parser.

Guru Meditation Codes

Code                         Meaning
#BASES.00000001.NOPOOL       Database pool not configured
#BASES.00000002.NOICEBERG    Iceberg catalog unavailable
#FLUENT.00000001.BADAST      Invalid query expression
#FLUENT.00000002.UNSAFEOP    Unsafe operation attempted
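The codes share a visible pattern: `#SUBSYSTEM.NUMBER.MNEMONIC`. The sketch below splits a code into those parts and looks up its meaning. The exact format (uppercase subsystem, 8-digit number) is inferred from the four codes above and is an assumption; `parse_guru` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Meanings copied from the table above.
MEANINGS = {
    "#BASES.00000001.NOPOOL": "Database pool not configured",
    "#BASES.00000002.NOICEBERG": "Iceberg catalog unavailable",
    "#FLUENT.00000001.BADAST": "Invalid query expression",
    "#FLUENT.00000002.UNSAFEOP": "Unsafe operation attempted",
}

# Assumed format, inferred from the examples: #SUBSYSTEM.8-digit-number.MNEMONIC
_GURU = re.compile(r"#(?P<subsystem>[A-Z]+)\.(?P<number>\d{8})\.(?P<mnemonic>[A-Z]+)")


def parse_guru(code: str) -> tuple[str, int, str]:
    """Split a code into (subsystem, number, mnemonic)."""
    m = _GURU.fullmatch(code)
    if m is None:
        raise ValueError(f"not a guru meditation code: {code!r}")
    return m["subsystem"], int(m["number"]), m["mnemonic"]
```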