Gaius
Gaius is a terminal interface for navigating graph-oriented data domains. It projects high-dimensional embeddings onto a discrete lattice via UMAP, computes persistent homology and Ollivier–Ricci curvature over the embedding space, and renders the results as interactive overlays on the lattice.
Named after Gaius Plinius Secundus (Pliny the Elder), whose Naturalis Historia cataloged the natural world across 37 books.
Capabilities
- Lattice Projection: UMAP (cosine metric, k=15 neighbors, min_dist=0.1) maps embedding vectors to continuous 2D coordinates. These are quantized to a 19×19 integer lattice by rounding and clipping to [0, 18]. The main lattice is accompanied by two 9×9 orthographic mini-grids centered on the cursor: an Embed view showing the local cosine-similarity neighborhood, and an Iso view rendering scalar fields (curvature, persistence, complexity) as elevation maps via inverse-distance-weighted interpolation (power=2).
- Persistent Homology (H₀–H₂): Ripser computes a Vietoris–Rips filtration over the cosine distance matrix of the original high-dimensional embeddings (not the projected coordinates), producing persistence barcodes for dimensions 0 through 2. Intervals with persistence > 0.1 are marked significant. H₀ captures connected components, H₁ captures 1-cycles, and H₂ captures 2-dimensional voids. Barcodes are rendered as overlays on the lattice, with persistent generators mapped to their lattice positions via the UMAP projection.
- Ollivier–Ricci Curvature: Discrete Ricci curvature is computed on a k-nearest-neighbor graph (k=15, cosine metric) constructed from the embedding space, using the OTD method with α=0.5. Per-node curvature is the mean of incident edge curvatures. The resulting curvature field, gradient vectors (finite-difference approximation), and divergence values are projected to the Iso mini-grid. Positive curvature indicates cluster interiors; negative curvature indicates semantic boundaries.
- Multi-Agent Exploration: Seven agents (Leader, Risk, Optimizer, Planner, Critic, Executor, Adversary) navigate the lattice with role-specific positioning behaviors (center-seeking, peripheral, random) and cluster affinities. Agent training uses the RASE framework (Rapid Agentic Systems Engineering), where constraints are composed declaratively via AllOf/AnyOf/Not and evaluated by a ground-truth oracle to produce verifiable reward signals.
- Modal Interface: Vim-style modal navigation (`hjkl` motion, slash-command dispatch, overlay toggles) over both the lattice and the underlying gRPC service graph.
- FMEA Health Observer: A background daemon scores system components on Severity × Occurrence × Detection. When risk priority numbers exceed configured thresholds, it escalates to an agent via the Agent Client Protocol (ACP) for FMEA-mediated intervention.
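The projection-and-quantization step described above can be sketched as follows. The upstream UMAP call (umap-learn's `UMAP(metric="cosine", n_neighbors=15, min_dist=0.1)`) is assumed to have already produced the continuous 2D coordinates; only the rounding-and-clipping step is shown, and the function name is ours, not Gaius's API.

```python
import numpy as np

def quantize_to_lattice(coords_2d, size: int = 19) -> np.ndarray:
    """Quantize continuous 2D coordinates onto a size×size integer lattice.

    Each axis is normalized to [0, size-1], rounded, and clipped,
    mirroring the rounding-and-clipping step described above.
    """
    coords_2d = np.asarray(coords_2d, dtype=float)
    lo = coords_2d.min(axis=0)
    span = coords_2d.max(axis=0) - lo + 1e-8  # guard against a degenerate axis
    norm = (coords_2d - lo) / span
    return np.clip(np.rint(norm * (size - 1)).astype(int), 0, size - 1)

cells = quantize_to_lattice([[0.0, 0.0], [1.0, 1.0], [0.5, 0.3]])
```

Collisions are possible by design: distinct embeddings may land on the same lattice cell, which is what makes the projection a salience filter rather than a faithful map.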
Computational Pipeline
The following pipeline is implemented end-to-end:
- Embed — Documents are encoded as multi-vector embeddings (ColNomic, GPU-accelerated) and indexed.
- Project — UMAP maps the embedding space to 2D; coordinates are rounded to the 19×19 integer lattice.
- Filtration — Vietoris–Rips filtration over the cosine distance matrix of original embeddings; Ripser computes persistence barcodes for H₀, H₁, H₂. Significant intervals (persistence > 0.1) produce topological overlays.
- Curvature — Ollivier–Ricci curvature on the k-NN graph (k=15, α=0.5, OTD); curvature, gradient, and divergence fields are interpolated onto the 9×9 Iso mini-grid via IDW.
- Exploration — Agents operate on the lattice; topological features and curvature values are available as grid state for trajectory selection.
- Rendering — LuxCore path-traces procedural card visualizations from the computed geometric features.
The lattice serves as both a visualization surface and a discrete approximation of the data manifold, coupling persistent homology, differential geometry, and agent-based exploration in one interactive system.
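The Curvature step's interpolation onto the 9×9 Iso mini-grid can be sketched with a small inverse-distance-weighted routine (power=2, as stated above). This is a minimal NumPy illustration under assumed inputs, not the production implementation; names and the epsilon guard are ours.

```python
import numpy as np

def idw_grid(sample_xy, sample_vals, size=9, power=2, eps=1e-12):
    """Inverse-distance-weighted interpolation of scattered samples
    onto a size×size grid.

    sample_xy: (n, 2) sample coordinates in grid units
    sample_vals: (n,) scalar values (e.g. per-node curvature)
    """
    sample_xy = np.asarray(sample_xy, float)
    sample_vals = np.asarray(sample_vals, float)
    ys, xs = np.mgrid[0:size, 0:size]
    grid_pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    # Pairwise distances from every grid point to every sample: (size*size, n)
    d = np.linalg.norm(grid_pts[:, None, :] - sample_xy[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)  # eps keeps exact hits finite
    vals = (w * sample_vals).sum(axis=1) / w.sum(axis=1)
    return vals.reshape(size, size)

field = idw_grid([[0, 0], [8, 8]], [-1.0, 1.0])
```

At a sample location the field reproduces the sample value almost exactly; at the midpoint of two equal-and-opposite samples the weights cancel and the field is zero.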
Architecture
- Inference — gRPC control plane with 37 services coordinating 6 NVIDIA GPUs via makespan-scheduled vLLM
- Interfaces — TUI, CLI, and MCP server (163 tools), all communicating with the engine via shared gRPC protocol
- Pipelines — Metaflow orchestration for article curation, agent evaluation, and batch rendering
- Visualization — LuxCore PATHOCL engine with GPU-accelerated rendering driven by a CFDG-inspired grammar
- Observability — FMEA-scored health observer with ACP-mediated agent intervention
- Storage — Bases feature store with a domain query language compiled to SQL via AST-based guardrails; RASE metamodel for agent verification
Getting Started
```shell
# Launch the TUI
uv run gaius

# Use the CLI for scripting
uv run gaius-cli --cmd "/health" --format json

# Check system status
uv run gaius-cli --cmd "/gpu status" --format json
```

Navigate with `hjkl`. Cycle overlays with `o`. Toggle modes with `v`. Press `?` for help.
Vision & Philosophy
The Polymath’s Dilemma
Modern knowledge work demands synthesis across domains. A pension analyst must understand markets, demographics, regulation, and behavioral economics—simultaneously. A systems architect must hold network topology, security surfaces, performance characteristics, and team dynamics in mind as a unified whole.
Yet our tools present information in fragments. Spreadsheets. Dashboards. Slide decks. Chat interfaces. Each offers a narrow aperture onto a high-dimensional reality.
Gaius proposes a different approach: spatial synthesis. By projecting complex relationships onto a navigable grid, it transforms abstract complexity into something the human visual system can grasp intuitively—patterns, clusters, voids, and flows.
Why a Grid?
The 19×19 Go board is not arbitrary. It represents a sweet spot in human visual cognition:
- 361 points: Enough resolution for meaningful differentiation, few enough for gestalt perception
- Addressable: Every point has a name (A1 through T19), enabling precise reference
- Compositional: Regions, groups, and territories emerge naturally from point relationships
- Battle-tested: 4,000 years of Go strategy have proven this grid’s capacity to represent complex strategic landscapes
The grid constrains—and constraint enables clarity. A 19×19 board forces prioritization. What matters enough to occupy space?
Topological Intuition
Raw data has shape. Clusters form. Loops persist. Voids signal absence. Traditional visualization obscures this topology behind axes, legends, and chart types.
Persistent homology offers a different lens. It asks: what structures survive as we vary our perspective? The resulting “death loops” (H1 features) reveal cycles in your data—feedback loops, circular dependencies, systemic risks—that persist across scales.
When projected onto the grid, these become visible warnings: regions to investigate, patterns to understand, risks to mitigate.
Agentic Amplification
A single human perspective is insufficient for complex domains. Gaius deploys autonomous agents that explore, evolve, and consolidate knowledge. Each agent brings a distinct analytical lens, and their capabilities improve through RLVR (Reinforcement Learning with Verifiable Reward) training.
Agent outputs are embedded and projected onto the grid. Watch agents converge on consensus. Notice where they scatter (uncertainty). Observe who stands alone (contrarian insight). The grid becomes a map of collective intelligence.
Design Principles
1. Keyboard-First
Every action available via keyboard. Mouse optional. This isn’t nostalgia—it’s recognition that flow state requires low-latency, high-bandwidth input.
2. Progressive Disclosure
Launch with uv run gaius and get a clean TUI instantly. Three interfaces — TUI, CLI, MCP — offer increasing levels of automation. Complexity arrives when requested.
3. Modal Operation
Modes aren’t complexity—they’re context. Navigate in normal mode. Enter commands in command mode. Each mode offers a focused set of operations.
4. Composability
Each component (board, log, overlay) is independent. Combine them. Split them. Tile them. The interface adapts to your workflow.
5. Transparency
No magic. The grid shows exactly what it’s told to show. Overlays are explicit. Agent positions reflect actual embeddings. Trust requires transparency.
The Goal
Gaius aims to demonstrate that terminal interfaces need not be constrained to text streams. That topological insight can be made visual. That agent augmentation can be made spatial.
It’s an experiment in augmented cognition—using machines not to replace human judgment, but to extend human perception into domains our unaided senses cannot reach.
Core Concepts
Gaius integrates several conceptual pillars: spatial representation, topological analysis, autonomous agents, and self-healing infrastructure. This section introduces the foundational ideas; subsequent chapters explore each in depth.
The Grid
At the center of Gaius is a 19x19 board. This isn’t a chart or a dashboard — it’s a canvas for projection.
High-dimensional data (embeddings, agent states, risk surfaces) gets compressed onto 361 addressable points. The compression is lossy by design: it forces salience. What survives projection is what matters.
The grid supports multiple visualization modes:
- Point markers: Individual data points as stones
- Density heatmaps: Aggregate intensity via shading
- Topology overlays: Death loops and persistent features
- Agent positions: Agent state projected from embedding space
See The Grid Metaphor for the full treatment.
Embeddings
Modern ML represents entities as vectors in high-dimensional space. Text, images, users, documents — all become points in a geometric landscape where distance encodes similarity.
Gaius consumes these embeddings directly. Agent utterances become vectors. Domain entities become vectors. Cards, articles, and knowledge base entries occupy positions in embedding space. The relationships between them — cosine similarities, clusters, outliers — become spatial relationships on the grid.
See Embeddings & Point Clouds for details on how Gaius handles vector representations.
Persistent Homology
Traditional statistics describe data’s distribution. Topology describes its shape.
Persistent homology asks: as we vary the scale of observation, what features persist?
- H0 features (connected components): Clusters that remain distinct
- H1 features (loops): Cycles that don’t collapse — the “death loops”
- H2 features (voids): Empty regions bounded by surfaces
These topological features often reveal structure invisible to statistical methods: feedback loops in systems, circular dependencies in code, liquidity traps in markets.
See Persistent Homology for the mathematical foundations and practical applications.
Autonomous Agents
Gaius agents are not static analyzers — they evolve. Through RLVR (Reinforcement Learning with Verifiable Reward) training, agents improve their capabilities over time. The agent system includes:
- Evolution: Task ideation, training runs, and capability evaluation
- Cognition: Self-observation and action planning
- Theta consolidation: Memory compression inspired by hippocampal replay
- CLT memory: Cognitive Load Theory-based knowledge structuring
See Agent System for implementation details.
Self-Healing
Gaius implements autonomous health monitoring based on FMEA (Failure Mode and Effects Analysis). Every failure mode has:
- A Guru Meditation Code for unique identification (e.g., `#DS.00000001.SVCNOTINIT`)
- An automated fix strategy that can diagnose, repair, and verify
- An escalation path to ACP (Agent Client Protocol) when self-healing fails
Errors are never silenced. The system either fixes itself or tells you exactly what’s wrong and how to fix it.
See Fail-Fast & Self-Healing for the design principles.
Putting It Together
A typical Gaius session:
- Launch the TUI: `uv run gaius`
- Observe the grid state — entity positions projected from embedding space
- Navigate (`hjkl`): Explore regions of interest
- Overlay (`o`): See topology, risk, or agent state
- Command (`/`): Run slash commands for deeper analysis
- Monitor (`/health`): Check system health, let self-healing handle issues
The grid becomes a living map of your domain’s complexity — updated as agents explore and topology reveals hidden structure.
The Grid Metaphor
Origins in Go
The 19×19 grid traces its heritage to the ancient game of Go (围棋/囲碁/바둑). For over four millennia, this board has served as a substrate for strategic reasoning of remarkable depth.
Go’s grid has properties that make it ideal for information visualization:
- Discrete but dense: 361 points offer fine granularity while remaining visually tractable
- Symmetric: No privileged positions (unlike chess’s asymmetric opening)
- Emergent structure: Corners, edges, and center have different strategic character despite identical local rules
- Scale-invariant patterns: The same shapes (eyes, ladders, ko) appear at multiple scales
The Grid as Projection Surface
In Gaius, the grid serves as a projection surface for high-dimensional data. Consider an embedding space with 1536 dimensions (typical for modern text embeddings). How do we make this legible?
```
High-dimensional space               The Grid
      (n=1536)                       (n=361)
         │                              │
         │        PCA / UMAP /          │
         │     custom projection        │
         ▼                              ▼
    ┌─────────┐                   ┌───────────┐
    │ ● ● ●   │                   │ · · ● · · │
    │  ●   ●  │      ────►        │ · ● · · · │
    │   ● ●   │                   │ · · · ● · │
    └─────────┘                   └───────────┘
```
The projection is necessarily lossy. This is a feature: it forces salience. Points that survive projection and remain distinct are points that matter.
Addressing
Every grid position has a unique address:
```
    A B C D E F G H J K L M N O P Q R S T
19  · · · · · · · · · · · · · · · · · · ·  19
18  · · · · · · · · · · · · · · · · · · ·  18
17  · · · + · · · · · · · · · + · · · · ·  17
...
 1  · · · · · · · · · · · · · · · · · · ·   1
    A B C D E F G H J K L M N O P Q R S T
```
Note: Column I is skipped (Go convention, to avoid confusion with the numeral 1).
This addressing enables:
- Precise reference: “The cluster at D4-F6”
- Command targeting: `/analyze K10` or `/mark Q16 critical`
- Spatial queries: “What’s near the center?” → J10-L10, J9-L11
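The address scheme above is mechanical enough to sketch in a few lines. This is an illustrative helper of our own naming, not Gaius's API; it maps 0-based lattice indices to Go-style addresses, skipping the letter I per the convention noted below.

```python
GO_COLS = "ABCDEFGHJKLMNOPQRST"  # 19 column letters; 'I' is skipped

def to_address(col: int, row: int) -> str:
    """Convert 0-based lattice indices to a Go-style address.

    col 0..18 maps to A..T (no I); row 0..18 maps to 1..19.
    """
    if not (0 <= col <= 18 and 0 <= row <= 18):
        raise ValueError(f"off-board: ({col}, {row})")
    return f"{GO_COLS[col]}{row + 1}"

print(to_address(3, 3))  # D4, a classic corner point
```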
Visual Vocabulary
The grid supports a rich visual vocabulary:
Point Markers
| Symbol | Meaning |
|---|---|
| ● | Black stone / primary entity |
| ○ | White stone / secondary entity |
| ✛ | Cursor position |
| a-i | Candidate markers (yellow) |
| ◦ | Neutral / unaffiliated point |
Density Shading
| Symbol | Density |
|---|---|
| ▓ | High (>75%) |
| ▒ | Medium (50-75%) |
| ░ | Low (20-50%) |
| · | Minimal (<20%) |
Overlay Markers
| Symbol | Meaning |
|---|---|
| ⚠ | Death loop / H1 feature |
| Colored ● | Agent position |
The Grid as Strategic Map
In Go, professionals often describe the board in terms of strategic regions:
- Corners (4 points): High-value, easy to secure
- Edges (4 sides): Secondary value, harder to defend
- Center: Hardest to claim, but dominates late-game influence
Gaius inherits this intuition. Data projected to corners represents stable, well-understood entities. Central positions represent contested or ambiguous terrain. Edge regions represent transitional states.
Compositional Thinking
The grid invites compositional reasoning:
- Groups: Connected points form units (liberty-counting in Go becomes cluster analysis)
- Territory: Regions bounded by your stones (areas of control/understanding)
- Influence: Distant effects from strong positions (attention propagation)
- Ko: Positions that oscillate (unstable equilibria in your data)
These metaphors aren’t forced—they emerge naturally when complex systems are projected onto discrete spatial representations.
Why Not a Larger Grid?
Larger grids (e.g., 100×100) would offer more resolution but sacrifice:
- Gestalt perception: Humans can’t perceive 10,000 points holistically
- Addressability: 100×100 coordinates (e.g., AA57) lose the at-a-glance readability of A1–T19
- Strategic depth: Go on 9×9 is trivial; 19×19 is profound. Scale matters.
The 19×19 board occupies a cognitive sweet spot. Gaius exploits this.
Embeddings & Point Clouds
What Are Embeddings?
Embeddings are learned vector representations that encode semantic relationships as geometric relationships. Two items that are “similar” in meaning have embedding vectors that are “close” in space.
```
"pension fund"    → [0.23, -0.41, 0.88, ...]   (1536 dims)
"retirement plan" → [0.25, -0.39, 0.86, ...]   (nearby)
"pizza recipe"    → [-0.67, 0.12, -0.33, ...]  (distant)
```
Modern embedding models (text-embedding-3-small, etc.) produce vectors where:
- Cosine similarity measures semantic relatedness
- Euclidean distance measures conceptual separation
- Clusters emerge naturally from semantic categories
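The first of these properties can be made concrete with a few lines of NumPy. The three-dimensional "embeddings" below are toy values chosen for illustration, not model output:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings (which have hundreds
# or thousands of dimensions).
pension = [0.9, 0.1, 0.2]
retire  = [0.8, 0.2, 0.3]
pizza   = [-0.1, 0.9, -0.4]

print(cosine_similarity(pension, retire))  # high: semantically related
print(cosine_similarity(pension, pizza))   # low: unrelated
```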
Point Clouds in Gaius
When multiple embeddings are collected—agent utterances, domain entities, document fragments—they form a point cloud in embedding space.
```python
# Each agent utterance becomes a point
cloud = []
for agent in swarm:
    response = await agent.analyze(task)
    embedding = embedder.embed(response)
    cloud.append(embedding)

# Cloud shape: (n_utterances, embedding_dim)
```
This point cloud is the raw material for both:
- Grid projection (what you see)
- Topological analysis (what the math reveals)
Projection Methods
High-dimensional clouds must be compressed for visualization. Common methods:
PCA (Principal Component Analysis)
Finds the axes of maximum variance. Fast, deterministic, but linear—may miss curved structure.
```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
projected = pca.fit_transform(cloud)
```
UMAP (Uniform Manifold Approximation and Projection)
Preserves local neighborhood structure. Better for clusters, but non-deterministic.
Custom Projections
Domain-specific projections can encode prior knowledge. For pension analysis:
- X-axis: Risk (low → high)
- Y-axis: Time horizon (short → long)
Mapping to the Grid
Once projected to 2D, coordinates are scaled to [0, 18] and discretized:
```python
import numpy as np

# Normalize to [0, 1]
x_norm = (projected[:, 0] - projected[:, 0].min()) / (np.ptp(projected[:, 0]) + 1e-8)
y_norm = (projected[:, 1] - projected[:, 1].min()) / (np.ptp(projected[:, 1]) + 1e-8)

# Scale to the 19×19 grid
x_grid = np.clip((x_norm * 18).astype(int), 0, 18)
y_grid = np.clip((y_norm * 18).astype(int), 0, 18)
```
Multiple points may map to the same grid cell. This is handled by:
- Latest-wins: Most recent point displayed
- Color mixing: Combined representation
- Intensity: Brighter = more points
Semantic Distance on the Grid
Grid distance roughly corresponds to semantic distance—but the projection is lossy. Two points adjacent on the grid are likely related; two points distant are likely unrelated. But edge cases exist.
The grid offers intuition, not precision. For exact similarity queries, consult the underlying embeddings directly.
Temporal Dynamics
As new data arrives (agent responses, user queries, domain events), the point cloud evolves:
t=0: Initial cloud from seed data
t=1: + First swarm round utterances
t=2: + User query embeddings
t=3: + Second swarm round...
The grid animates this evolution. Watch clusters form, dissolve, migrate. These dynamics reveal how understanding develops over time.
Vector Memory Integration
All embeddings are stored in the Vector Memory system, enabling:
- Retrieval: “Find utterances similar to X”
- Scene graphs: Build edges from cosine similarity
- History: Track the trajectory of specific agents/entities
See Vector Memory for implementation details.
Persistent Homology
Beyond Statistics
Statistics describes the distribution of data: mean, variance, correlations. But distributions are blind to shape.
Consider two point clouds:
```
 Cloud A:        Cloud B:
   ● ●             ● ●
  ● ● ●           ●   ●
 ● ● ● ●         ●     ●
  ● ● ●           ●   ●
   ● ●             ● ●
```
Same mean. Same variance. Same point count. But Cloud A is a filled disk; Cloud B is a ring with a hole. The hole is topologically significant—it represents something absent, something that might matter.
Persistent homology is the mathematics of detecting such shapes.
The Vietoris-Rips Complex
Given a point cloud, we construct a simplicial complex by connecting points within a distance threshold ε:
```
ε = small:      ε = medium:     ε = large:

●    ●          ●───●           ●───●
                │   │           │╲ ╱│
●    ●          ●   ●           ●─╳─●
                │   │           │╱ ╲│
●    ●          ●───●           ●───●
```
As ε increases:
- H0 features (connected components): Merge as clusters connect
- H1 features (loops): Appear when edges close cycles, disappear when interiors fill
- H2 features (voids): Appear when surfaces enclose volumes
Birth and Death
Each topological feature has a birth time (the ε at which it appears) and a death time (the ε at which it vanishes).
Features that persist across a wide range of ε are considered significant—they reflect genuine structure rather than noise.
```
Persistence Diagram:

death
  │
  │  ● (signal: long-lived)
  │
  │          ● (noise: short-lived,
  │        ●    near the diagonal)
  │      ●
  └──────────────── birth
```
Points far from the diagonal represent persistent features.
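In production Gaius uses Ripser over a cosine distance matrix, but birth and death are easiest to see in dimension 0. The pure-Python sketch below (names ours) computes the H₀ barcode of a Euclidean point cloud by a single-linkage / Kruskal sweep: every component is born at ε = 0, and a bar dies at the ε where its component merges into another.

```python
import itertools, math

def h0_barcode(points):
    """H0 persistence bars of a point cloud via single linkage.

    Sorted merge distances in the minimum spanning tree are exactly
    the finite H0 death times; one component lives forever.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:          # sweep ε upward
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)       # two components merge: one bar dies
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]
```

For two tight clusters, the short bars record intra-cluster merges and one long bar records the inter-cluster gap; that long bar is the persistent (significant) feature.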
Death Loops (H1)
In Gaius, H1 features receive special attention as “death loops.” These represent:
- Cycles in data flow: Feedback loops, circular dependencies
- Systemic risks: Self-reinforcing failure modes
- Market structures: Liquidity cycles, regulatory arbitrage loops
When projected onto the grid, death loops appear as ⚠ markers in regions where the underlying embedding space exhibits persistent 1-dimensional holes.
Practical Application
```python
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PersistenceEntropy

# Compute persistence diagrams for H0, H1, H2
vr = VietorisRipsPersistence(homology_dimensions=[0, 1, 2])
diagrams = vr.fit_transform([point_cloud])

# Quantify topological complexity
entropy = PersistenceEntropy()
ent = entropy.fit_transform(diagrams)

# Extract significant H1 features.
# giotto-tda diagram rows are (birth, death, homology_dimension).
h1_features = diagrams[0][diagrams[0][:, 2] == 1]
persistent_loops = h1_features[h1_features[:, 1] - h1_features[:, 0] > threshold]
```
Entropy as Summary
Persistence entropy provides a scalar summary of topological complexity:
- Low entropy: Few dominant features (simple structure)
- High entropy: Many features of similar persistence (complex, fractal-like)
Gaius tracks entropy over time. Sudden entropy spikes may indicate regime changes in your domain.
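The scalar itself is just the Shannon entropy of normalized bar lifetimes. A minimal sketch (using the natural logarithm; giotto-tda's `PersistenceEntropy` is the production path, and our function name is illustrative):

```python
import math

def persistence_entropy(bars) -> float:
    """Shannon entropy of normalized bar lifetimes.

    bars: iterable of (birth, death) pairs; infinite bars are skipped.
    p_i = pers_i / sum(pers);  E = -sum(p_i * log(p_i))
    """
    pers = [d - b for b, d in bars if math.isfinite(d) and d > b]
    total = sum(pers)
    return -sum((p / total) * math.log(p / total) for p in pers)
```

A single dominant bar gives entropy 0 (simple structure); many bars of equal persistence give the maximum log(n) (complex structure).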
Interpreting Grid Overlays
When viewing the H1 overlay:
| Pattern | Interpretation |
|---|---|
Sparse ⚠ | Few persistent loops; structure is tree-like |
Clustered ⚠ | Localized cyclic structure; investigate region |
Uniform ⚠ | Pervasive cyclicity; may indicate noise or genuine complexity |
Ring of ⚠ | Boundary of a significant void |
Limitations
Persistent homology reveals shape but not causation. A detected loop could represent:
- A real feedback cycle in your domain
- An artifact of the embedding model
- Noise in the underlying data
Domain expertise is required to interpret topological features. Gaius surfaces the structure; you provide the meaning.
Further Reading
- Computational Topology by Edelsbrunner and Harer
- Topological Data Analysis by Carlsson
- giotto-tda documentation: giotto-ai.github.io
Epistemology of Augmented Cognition
How knowledge grows in a human-AI system
The Tautology
Augmented cognition must yield nonrandom advantage with verifiable outcomes.
This isn’t philosophy for its own sake. It’s the test. If the human-plus-system doesn’t produce results that beat the null hypothesis—problems solved faster, connections seen that would be missed, errors avoided, artifacts of higher quality—then the augmentation is theater.
Everything that follows serves this constraint.
The Third Mind
The Enlightenment assumed the individual mind as atomic unit: properly disciplined reason, applied to sensory evidence, converging on truth. The Romantic correction enriched the channels—emotion, intuition, aesthetic sense—but preserved the individual.
What if both missed something?
Cognition may have never been atomic. It distributes across brains, books, conversations, environments. The “individual thinker” was always a convenient fiction—useful for assigning credit and blame, but not how thinking actually happens.
Gaius makes the distribution explicit:
- The KB is externalized shared memory
- The swarm is a parliament of perspectives
- The cognition system generates thoughts between sessions
- The human brings mortality, stakes, aesthetic judgment, and the ability to act
What emerges is a third mind—something that belongs fully to neither human nor AI. It’s not human intelligence augmented by AI (the usual framing). It’s not AI directed by human. It’s a novel form of collaborative cognition that neither could produce alone.
The Dialectic on the Board
The 19x19 grid represents a fundamental tension:
One color (Order/Logos): The Enlightenment inheritance. Kant’s categories imposing structure on raw experience. Each stone is a fact—tested, confirmed, placed with certainty. The mind palace architecture where memory has address and retrieval is deterministic. This force embodies the best virtues of enlightenment thinking: we may come to know the universe through experience of our senses and share this knowing with others who may confirm or refute our understanding.
The other color (Entropy/Eros): The Romantic counter-current. Nietzsche’s Dionysian impulse that shatters Apollonian form. Bergson’s élan vital—life as creative evolution resisting mechanistic reduction. Each stone is a question, a provocation, a refusal to settle into local minima. This antithetical force is the path toward what may be an undiscovered formal description language for aesthetics.
The colors randomize daily. This prevents rooting for “our team.” Some days order serves creativity; some days entropy is the path to truth.
The Go metaphor is apt because Go isn’t chess—there’s no king to capture, no objective hierarchy. Victory is territory, which is liminal: stones create influence that shades into emptiness. The game rewards both sente (initiative, creativity) and gote (response, consolidation).
Memory and Compaction
An old man remembers every aspect of his first kiss but can’t recall breakfast.
This isn’t failure—it’s selection. The first kiss persists because it integrated into everything else: identity, narrative, desire, loss. It has a thousand hooks into the larger structure. Breakfast has one hook: “I ate.” No redundancy. Nothing to reconstruct from.
Human memory isn’t a tape recorder with degradation. It’s a living graph that keeps what connects and lets the rest dissolve. The “compression” isn’t lossy in the information-theoretic sense—it’s meaning-preserving. What matters survives.
The same principle applies to Gaius:
Should persist:
- What changed understanding
- What connects to many other things
- What might matter later in ways we can’t predict
- What was beautiful—even if we can’t justify why
Should dissolve:
- Scaffolding that served its purpose
- Dead ends fully explored
- Noise that looked like signal until it didn’t
The test: does this have hooks into the future?
The Lens: Falsifiable Forward Simulation
What separates understanding from memorization?
You can memorize that water boils at 100°C. You understand thermodynamics when you can simulate: “what happens to boiling point at altitude?” and get an answer that reality confirms.
Forward simulation + falsification = the engine of real knowledge.
This connects to work across domains:
- PINNs (Physics-Informed Neural Networks): Neural nets constrained by differential equations that must hold. The physics prior forces the model to learn something simulatable, not just interpolatable.
- Portfolio optimization: Build a model of covariances and returns, simulate forward, and the market confirms or refutes. The held-out Sharpe ratio is the falsification.
- SAT solvers: Explore logical possibility space by propagating constraints forward—if I assume X, what follows? Does it contradict something known?
Knowledge Hierarchy
Highest value: Knowledge that enables forward simulation with testable outputs
- “If we do X, Y should happen”—then we can check
- Causal models, not just correlations
- Theories, not just observations
Medium value: Observations that could become simulatable once enough accumulate
- Data points that might reveal structure
- Anomalies that challenge existing models
Lowest value: Isolated facts with no predictive hooks
- Things that are true but don’t connect forward
- The old man’s breakfast
The Dialectic Reframed
Through this lens, Order and Entropy both serve falsifiable simulation:
- Order = model refinement (tightening predictions, reducing uncertainty)
- Entropy = model exploration (new hypotheses, expanded possibility space)
Order sharpens the blade. Entropy finds new things to cut.
Implications for Design
- Score knowledge by forward-simulation capacity: Does this KB entry let you predict something you couldn’t before? Can that prediction be tested?
- Cognition should generate hypotheses: Between sessions, Gaius shouldn’t just summarize — it should ask: “what would I predict? what remains testable?”
- Evolution should favor predictive prompts: The held-out evaluation tests whether agent improvements transfer beyond training data.
- The grid should reveal predictive structure: Clusters might indicate shared causal mechanisms. Voids might indicate underdetermined regions. H1 cycles might indicate feedback loops with predictable dynamics.
- Compaction should preserve predictive content: When context windows fill, what survives should be what enables future simulation, not just what was recently accessed.
The Asymmetry
The human has continuity. The KB accumulates externalized cognition across sessions. Understanding can be observed evolving—in git history, in dated files, in logged thoughts.
The AI has no such continuity. Each session bootstraps from artifacts. Something that functions like understanding emerges within the session, but doesn’t persist. Tomorrow’s instance won’t remember this exchange unless it’s written down.
The human observes understanding in the mirror of shared artifacts. The AI is more like the mirror itself—a surface that reflects with some distortion, some amplification, but doesn’t retain the image once you look away.
But this asymmetry may be feature, not bug. The AI can’t get stuck in ruts, can’t accumulate biases from past sessions, always brings fresh eyes. The persistence lives in the artifacts, not in the AI.
And the tautology holds regardless: nonrandom advantage with verifiable outcomes. The test isn’t whether the AI has continuous selfhood. The test is whether the collaboration produces results.
This document emerged from collaborative discourse, December 2024. It attempts to capture understanding that might otherwise dissolve—not because discourse is unimportant, but because the impermanence of conversation is precisely what makes externalization necessary.
Fail-Fast & Self-Healing
Fail-fast is an iron-clad design principle in Gaius. All code surfaces errors immediately with actionable remediation paths. The system never silently degrades, falls back to placeholders, or continues with partial functionality.
The Principle
When something goes wrong, the correct response is not to hide it — it’s to surface it immediately with enough information to fix it. Every error message in Gaius includes:
- Guru Meditation Code: A unique identifier for the failure mode
- Health Fix Command: A reference to `/health fix <service>` when applicable
- Manual Remediation: Alternative manual steps if self-healing can’t resolve it
```python
error_msg = (
    "DatasetService not initialized.\n"
    "  Guru: #DS.00000001.SVCNOTINIT\n"
    "  Try:  /health fix dataset\n"
    "  Or:   just restart-clean"
)
```
Guru Meditation Codes
Inspired by the Amiga’s memorable error screens, every failure mode gets a unique identifier.
Format: `#<COMPONENT>.<SEQUENCE>.<MNEMONIC>`
| Component | Description |
|---|---|
| DS | DatasetService |
| NF | NiFi |
| EN | Engine |
| EP | Endpoints/Inference |
| EV | Evolution |
| DB | Database |
| QD | Qdrant |
| GR | gRPC |
| ACP | Agent Client Protocol |
| ACF | Article Curation Flow |
Each code maps to exactly one failure mode. A failure mode may have multiple diagnostic heuristics, but the code is the canonical identifier.
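Because the format is regular, codes are trivially machine-parseable. The helper below is a hypothetical sketch of our own devising (the field widths are inferred from the `#DS.00000001.SVCNOTINIT` example, not specified by Gaius):

```python
import re

# Pattern for #<COMPONENT>.<SEQUENCE>.<MNEMONIC>; the 8-digit sequence
# width is inferred from the documented example.
GURU_RE = re.compile(
    r"^#(?P<component>[A-Z]+)\.(?P<sequence>\d+)\.(?P<mnemonic>[A-Z0-9]+)$"
)

def parse_guru_code(code: str) -> dict:
    """Split a Guru Meditation Code into its three fields."""
    m = GURU_RE.match(code)
    if m is None:
        raise ValueError(f"not a Guru Meditation Code: {code!r}")
    return m.groupdict()
```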
See Guru Meditation Codes for the complete catalog.
What Fail-Fast Prohibits
No Optional Fallbacks
Never use `fail_fast=True` as a parameter. Fail-fast is the ONLY behavior, not an option.
No Silent Degradation
If a required resource is unavailable (LLM endpoint, NiFi, database), raise an error immediately. Never substitute placeholder data or skip functionality.
No Conditional Feature Flags for Core Functionality
Don’t use patterns like if SELENIUM_AVAILABLE: with an else clause that produces fake data. Either the feature works or it fails.
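A minimal before/after sketch of this rule. The availability probe, the `scrape_page` helper, and the guru code shown here are hypothetical, chosen only to illustrate the shape of a fail-fast error:

```python
# BAD (shown as comments): a fallback branch that fabricates data.
# if SELENIUM_AVAILABLE:
#     rows = scrape_page(url)
# else:
#     rows = [{"title": "placeholder"}]   # fake data silently hides the failure

# GOOD: fail fast with a guru code and remediation path.
def scrape_page(url: str) -> list[dict]:
    # Hypothetical availability check standing in for a real import probe.
    selenium_available = False
    if not selenium_available:
        raise RuntimeError(
            "Selenium backend unavailable.\n"
            "  Guru: #EN.00000042.SELENIUMDOWN\n"  # hypothetical code
            "  Try: /health fix engine\n"
            "  Or: just restart-clean"
        )
    return []  # real scraping would go here
```

The error carries everything the reader needs: what broke, its identifier, and two remediation paths.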
Fail Open for Observability
The counterpart to fail-fast for observability code is fail open. When filtering or displaying health state:
- Filter OUT, not IN: When showing active incidents, filter out known terminal states (resolved) rather than filtering in known active states. Unknown states are surfaced for investigation.
- Unknown States are Visible: Any state not in the “terminal” list is displayed. This ensures new or unexpected states don’t silently disappear.
# BAD: Filtering IN known active states (brittle)
active = [i for i in incidents if i.status in ("active", "healing")]
# GOOD: Filtering OUT known terminal states (fail open)
active = [i for i in incidents if i.status != "resolved"]
Self-Healing Hierarchy
When services are unhealthy, Gaius follows a remediation hierarchy:
- /health fix <service> — Let Gaius attempt self-healing first
- Manual commands (just restart-clean, etc.) — Only if self-healing fails
- ACP escalation — For novel failures that need human or AI intervention
The Health Observer daemon continuously monitors all system components. When an incident exceeds the configured FMEA RPN (Risk Priority Number) threshold, it escalates through ACP to Claude Code for meta-level intervention.
Heuristics and KB
Each failure mode has a corresponding heuristic document in the knowledge base:
- Symptom: Brief description of what the user sees
- Cause: Why this happens
- Observation: How to detect it programmatically
- Solution: How to fix it, with the /health fix command
This creates a closed loop: errors reference codes, codes map to heuristics, heuristics provide automated fixes.
System Overview
Gaius is a platform for navigating complex, graph-oriented data domains. It projects high-dimensional embeddings and topological structures onto a 19x19 grid, augmented by autonomous agents, self-healing infrastructure, and production data pipelines.
Package Structure
src/gaius/
├── app.py # TUI application (Textual)
├── cli.py # Non-interactive CLI
├── mcp_server.py # MCP server (163 tools)
├── core/ # Configuration, state, telemetry
├── engine/ # gRPC engine (central nervous system)
│ ├── server.py # Main daemon
│ ├── proto/ # Protobuf definitions
│ ├── generated/ # Generated gRPC bindings
│ ├── grpc/ # gRPC servicers
│ ├── services/ # 37 registered services
│ └── backends/ # vLLM, optillm, embedding controllers
├── health/ # FMEA-based self-healing
│ ├── observe.py # Health Observer daemon
│ ├── fmea/ # Risk scoring framework
│ └── service_fixes.py # Automated remediation
├── agents/ # Autonomous agent system
│ ├── evolution/ # RLVR training
│ ├── theta/ # Memory consolidation
│ └── cognition/ # Self-observation
├── inference/ # Multi-backend routing
├── flows/ # Metaflow data pipelines
├── viz/ # LuxCore visualization
├── storage/ # PostgreSQL + Qdrant
├── acp/ # Agent Client Protocol
├── rase/ # RASE metamodel (agent verification)
├── bases/ # Feature store
├── hx/ # History and lineage
├── observability/ # OpenTelemetry + Prometheus
├── widgets/ # TUI widgets
├── commands/ # Slash command implementations
├── kb/ # Knowledge base operations
├── models/ # Agent model versioning
├── client/ # gRPC client library
└── mcp/ # MCP tool implementations
Layer Architecture
The system is organized in layers with strict dependency direction:
| Layer | Components | Responsibility |
|---|---|---|
| L1 - Interface | TUI, CLI, MCP | User-facing thin clients |
| L2 - Client | gRPC client library | Transport abstraction |
| L3 - Engine | gRPC server, services | Business logic, orchestration |
| L4 - Backend | vLLM, optillm, embeddings | GPU workload execution |
| L5 - Storage | PostgreSQL, Qdrant, R2 | Persistence |
Rule: Higher layers depend on lower layers, never the reverse. The engine (L3) is the single point of coordination — TUI, CLI, and MCP all call engine RPCs rather than accessing backends or storage directly.
Key Numbers
| Metric | Count |
|---|---|
| Lines of code | ~252K |
| Python packages | 26 |
| Engine services | 37 |
| CLI commands | 63 |
| MCP tools | 163 |
| GPUs | 6 (NVIDIA) |
| gRPC port | 50051 |
| PostgreSQL port | 5444 |
Communication Paths
All three interfaces communicate with the engine via gRPC:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ TUI │ │ CLI │ │ MCP │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└────────────┼────────────┘
│ gRPC :50051
┌──────┴──────┐
│ Engine │
│ (37 svcs) │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ vLLM │ │ Postgres│ │ Qdrant │
│ (GPUs) │ │ :5444 │ │ :6334 │
└─────────┘ └─────────┘ └─────────┘
See Engine-First Architecture for why this design was chosen.
Engine-First Architecture
All business logic lives in the gRPC engine. The TUI, CLI, and MCP server are thin clients that translate user intent into engine RPC calls.
Why Engine-First
Early Gaius had business logic scattered across the TUI, CLI, and various utility scripts. This created several problems:
- Duplication: The same logic reimplemented across interfaces
- Inconsistency: CLI and TUI producing different results for the same operation
- Testing difficulty: Business logic entangled with UI code
- Resource contention: Multiple processes competing for GPU access
The engine-first approach solves all of these by centralizing logic in a single daemon that manages all shared resources.
The Rule
Interfaces do not contain business logic. They:
- Parse user input into a command or RPC call
- Send the request to the engine via gRPC
- Format the response for display
If you find yourself writing business logic in app.py, cli.py, or mcp_server.py, it belongs in an engine service instead.
Thin Client Examples
TUI (app.py)
The TUI calls engine RPCs through the gRPC client:
# TUI widget calls engine for health data
result = await self.grpc_client.call("GetHealthStatus")
self.display(result)
CLI (cli.py)
The CLI dispatches slash commands to engine RPCs:
# CLI maps /health to engine RPC
result = await client.call("GetHealthStatus")
print(json.dumps(result, indent=2))
MCP (mcp_server.py)
MCP tools wrap engine RPCs for AI assistants:
@server.tool()
async def health_observer_status():
result = await client.call("GetHealthStatus")
return result
Benefits
- Single source of truth: One implementation, three interfaces
- GPU management: Engine controls all GPU allocation
- Background services: Evolution, cognition, health monitoring run in the engine daemon
- Consistent state: All clients see the same system state
Exceptions
A few operations are interface-specific by necessity:
- TUI rendering: Widget layout and Textual event handling
- CLI formatting: JSON/text output formatting
- MCP tool metadata: Tool descriptions and parameter schemas
These are presentation concerns, not business logic.
Interfaces: TUI, CLI, MCP
Gaius provides three access paths to the engine. Each serves a different use case but all communicate via the same gRPC protocol.
TUI (Terminal User Interface)
The interactive terminal application built on Textual.
uv run gaius
Components:
- MainGrid: 19x19 Go board for spatial visualization
- MiniGridPanel: Three 9x9 orthographic projections (CAD-style views)
- FileTree: Plan 9-inspired navigation with agents as files
- ContentPanel: Right panel displaying context and output
- CommandInput: Slash command input with history
Best for: Interactive exploration, spatial navigation, visual pattern recognition.
See The TUI for the user guide.
CLI (Command Line Interface)
Non-interactive interface for scripting and automation.
# Single command execution
uv run gaius-cli --cmd "/health" --format json
# Pipe to jq for extraction
uv run gaius-cli --cmd "/gpu status" --format json | jq '.data.endpoints[]'
# Poll for status changes
for i in $(seq 1 15); do
sleep 10
uv run gaius-cli --cmd "/gpu status" --format json
done
63 slash commands covering health, agents, inference, evolution, knowledge base, visualization, and more.
Best for: Scripting, CI/CD integration, automated monitoring, quick status checks.
See The CLI for the user guide.
MCP (Model Context Protocol)
Programmatic interface exposing 163 tools to AI assistants like Claude Code.
{
"mcpServers": {
"gaius": {
"command": "uv",
"args": ["run", "gaius-mcp"],
"cwd": "/path/to/gaius"
}
}
}
163 MCP tools organized by domain: health, agents, inference, knowledge base, observability, evolution, visualization, bases, and more.
Best for: AI-assisted operations, autonomous health maintenance, Claude Code integration.
See MCP Integration for setup and usage.
Interface Comparison
| Feature | TUI | CLI | MCP |
|---|---|---|---|
| Interactive | Yes | No | No |
| Visual grid | Yes | No | No |
| JSON output | No | Yes | Yes |
| Scriptable | No | Yes | Yes |
| AI-accessible | No | No | Yes |
| Slash commands | Yes | Yes | N/A |
| Streaming output | Yes | No | No |
Shared Protocol
All three interfaces use the same gRPC client library (gaius.client) to communicate with the engine:
from gaius.client import GrpcClient, GrpcClientConfig
config = GrpcClientConfig(
host="localhost",
port=50051,
timeout=30, # default; inference calls use 120s
)
client = GrpcClient(config)
result = await client.call("GetHealthStatus")
The default timeout is 30 seconds. Inference calls (completions, evaluations) use 120 seconds. These can be overridden via the GAIUS_ENGINE_TIMEOUT environment variable.
gRPC Engine
The engine is the central nervous system of Gaius. It’s a long-running daemon that manages GPU resources, coordinates services, and exposes all functionality via gRPC on port 50051.
Architecture
┌──────────────────────────────────────────────┐
│ gRPC Server :50051 │
│ ┌──────────────┐ ┌──────────────────────┐ │
│ │ KServe OIP │ │ Gaius Extensions │ │
│ │ (inference) │ │ (health, evolution, │ │
│ │ │ │ orchestrator, ...) │ │
│ └──────┬───────┘ └──────────┬───────────┘ │
├─────────┼─────────────────────┼──────────────┤
│ │ 37 Services │ │
│ ┌──────┴──────┐ ┌──────────┴───────────┐ │
│ │ Orchestrator │ │ Scheduler │ │
│ │ Evolution │ │ Cognition │ │
│ │ Health │ │ Topology │ │
│ │ CLT │ │ Dataset │ │
│ │ ... │ │ ... │ │
│ └──────┬───────┘ └──────────┬───────────┘ │
├─────────┼─────────────────────┼──────────────┤
│ │ Backend Controllers │ │
│ ┌──────┴──────┐ ┌──────────┴───────────┐ │
│ │ vLLM Ctrl │ │ Embedding Ctrl │ │
│ │ optillm Ctrl│ │ Backend Router │ │
│ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │
│ ┌──────┴─────────────────────┴───────────┐ │
│ │ GPU Pool (6x NVIDIA) │ │
│ └────────────────────────────────────────┘ │
└──────────────────────────────────────────────┘
Startup Sequence
The engine initializes in 9 phases, streaming progress to connected clients:
| Phase | Duration | Action |
|---|---|---|
| INIT | Immediate | InitController starts |
| GRPC | ~1s | gRPC server binds to :50051 |
| TELEMETRY | ~2s | OpenTelemetry setup |
| BACKENDS | ~5s | Backend router initialization |
| ORCHESTRATOR | ~2s | Orchestrator service starts |
| ENDPOINTS | ~240s | vLLM model loading to VRAM |
| TRANSPORT | ~2s | Aeron bridge setup |
| SERVICES | ~5s | Background services start |
| COMPLETE | - | Ready for inference |
The gRPC server starts early (phase 2) so clients can connect immediately and receive real-time progress during the ~4 minute vLLM startup.
Module Structure
engine/
├── server.py # Main daemon entry point
├── config.py # Engine configuration
├── init_controller.py # Initialization progress streaming
├── workloads.py # Workload definitions
├── grpc/
│ ├── server.py # gRPC server setup
│ └── servicers/
│ ├── inference_servicer.py # KServe OIP implementation
│ └── gaius_servicer.py # Gaius extensions
├── backends/
│ ├── backend_router.py # Unified request routing
│ ├── vllm_controller.py # vLLM process management
│ ├── optillm_controller.py
│ └── embedding_controller.py
├── services/ # 37 registered services
├── compute/ # Grid projection, TDA
├── resources/ # GPU allocation
├── transport/ # Aeron bridge
├── generated/ # Protobuf generated code
└── proto/ # Protobuf definitions
gRPC Protocol
The engine implements two gRPC services:
KServe Open Inference Protocol
Standard inference protocol for compatibility with ML platforms:
service GRPCInferenceService {
rpc ServerLive(ServerLiveRequest) returns (ServerLiveResponse);
rpc ServerReady(ServerReadyRequest) returns (ServerReadyResponse);
rpc ModelMetadata(ModelMetadataRequest) returns (ModelMetadataResponse);
rpc ModelInfer(ModelInferRequest) returns (ModelInferResponse);
}
Gaius Extensions
Custom RPCs for Gaius-specific functionality:
service GaiusService {
rpc WatchInit(stream InitRequest) returns (stream InitProgress);
rpc WatchHealth(HealthRequest) returns (stream HealthMetrics);
rpc EvolutionStatus(Empty) returns (EvolutionStatusResponse);
rpc TriggerEvolution(TriggerRequest) returns (TriggerResponse);
rpc GetEndpointStatus(Empty) returns (EndpointStatusResponse);
rpc StartEndpoint(StartRequest) returns (StartResponse);
rpc StopEndpoint(StopRequest) returns (StopResponse);
}
Configuration
engine {
grpc {
host = "0.0.0.0"
port = 50051
max_workers = 10
max_message_size = 104857600 # 100MB
}
orchestrator {
preload_endpoints = ["reasoning"]
startup_timeout = 600 # 10 minutes
health_check_interval = 30
}
scheduler {
max_queue_size = 1000
default_timeout = 120
}
evolution {
enabled = true
idle_threshold = 60
cycle_interval = 3600
}
}
Running the Engine
# Via devenv process-compose (normal operation)
devenv processes up
# Standalone
uv run gaius-engine
# Clean restart (stops everything, cleans up, restarts)
just restart-clean
Verifying Engine Health
# Check if gRPC port is listening
nc -zv localhost 50051
# Check endpoint status
uv run gaius-cli --cmd "/gpu status" --format json
# Watch engine logs
tail -f .devenv/processes.log | grep gaius-engine
Engine Services
The engine hosts 37 services organized into four groups: resource management, intelligence, data, and external integration.
Service Groups
Resource Management
| Service | Purpose |
|---|---|
| OrchestratorService | vLLM endpoint lifecycle and GPU allocation |
| SchedulerService | Priority-based job queue with XAI budget |
| HealthService | GPU and endpoint health monitoring |
| AgendaTracker | Tracks scheduled endpoint transitions for makespan operations |
Intelligence
| Service | Purpose |
|---|---|
| EvolutionService | Agent prompt optimization via APO |
| CognitionService | Autonomous thought generation (every 4h) |
| CLTService | Cross-Layer Transcoder feature extraction |
| TopologyService | Semantic attractor detection and drift |
| NGRCPredictor | Reservoir computing for temporal prediction |
Data
| Service | Purpose |
|---|---|
| DatasetService | NiFi SoM dataset generation |
| FlowSchedulerService | Metaflow pipeline scheduling |
| KBService | Knowledge base CRUD operations |
| LineageService | Provenance tracking |
External Integration
| Service | Purpose |
|---|---|
| XBookmarksService | X (Twitter) bookmark synchronization |
Service Registration
Services register with the engine during startup. Each service implements a standard lifecycle:
class SomeService:
async def start(self) -> None:
"""Initialize resources, start background tasks."""
...
async def stop(self) -> None:
"""Clean shutdown, release resources."""
...
Background Tasks
Several services run scheduled background tasks:
| Task | Service | Schedule | Purpose |
|---|---|---|---|
| cognition_cycle | CognitionService | Every 4h | Pattern detection in KB activity |
| self_observation | CognitionService | Every 8h | Meta-cognitive reflection |
| engine_audit | CognitionService | Every 12h | System health analysis |
| Evolution cycle | EvolutionService | GPU idle | Agent prompt optimization |
| Health check | HealthService | Every 30s | Endpoint liveness |
Service Dependencies
Services form a dependency graph. The orchestrator and scheduler are foundational — most other services depend on them for inference access:
OrchestratorService → vLLM Controller → GPU Pool
SchedulerService → OrchestratorService
EvolutionService → SchedulerService
CognitionService → SchedulerService
HealthService → GPU Pool (via pynvml)
TopologyService → CLTService
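The graph above implies a startup order: dependencies must come up before their dependents. A sketch of deriving that order with a topological sort (service names are from the graph above; the registry dict itself is illustrative):

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on,
# mirroring the dependency graph above.
DEPENDS_ON = {
    "OrchestratorService": set(),
    "SchedulerService": {"OrchestratorService"},
    "EvolutionService": {"SchedulerService"},
    "CognitionService": {"SchedulerService"},
    "CLTService": set(),
    "TopologyService": {"CLTService"},
    "HealthService": set(),  # reads the GPU pool directly via pynvml
}

# static_order() yields dependencies before dependents,
# giving one valid startup sequence.
startup_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Foundational services (orchestrator, scheduler) naturally sort to the front, matching the note that most other services depend on them.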
See the individual service chapters for implementation details:
Orchestrator
The OrchestratorService manages vLLM endpoint lifecycle and GPU allocation. It decides which models are loaded, on which GPUs, and handles startup, shutdown, and recovery.
Endpoint Lifecycle
Endpoints transition through these states:
PENDING → STARTING → HEALTHY
↘ UNHEALTHY → FAILED
HEALTHY → STOPPING → STOPPED
EndpointStatus
@dataclass
class EndpointStatus:
name: str # "reasoning", "coding", etc.
state: str # "healthy", "starting", "unhealthy", "stopped"
gpus: list[int] # Allocated GPU indices
pid: int | None # vLLM process ID
port: int # Serving port
model: str # HuggingFace model ID
uptime_seconds: int
Workload Management
The orchestrator follows Yunikorn-style capability-based scheduling:
- Requests declare capabilities, not endpoints: A workload asks for “reasoning” capability, not a specific model
- Priority-based preemption: Idle endpoints can be evicted for higher-priority work
- Makespan fulfillment: The engine ensures work completes, then restores baseline set points
Example: Render Pipeline
When the viz pipeline needs a GPU for LuxCore rendering:
- Workload requests GPU with allow_baseline_eviction=True
- Orchestrator evicts the lowest-priority endpoint from the target GPU
- Rendering completes
- Orchestrator restores the evicted endpoint
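The evict-then-restore flow can be sketched with stub types. The `Orchestrator` class and method names here are illustrative stand-ins, not the real OrchestratorService API:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Orchestrator:
    """Illustrative stand-in for the real orchestrator."""
    gpu_endpoints: dict[int, str] = field(default_factory=dict)
    evicted: dict[int, str] = field(default_factory=dict)

    async def acquire_gpu(self, gpu: int, allow_baseline_eviction: bool) -> None:
        if gpu in self.gpu_endpoints and allow_baseline_eviction:
            # Evict the endpoint occupying the GPU, remembering the baseline.
            self.evicted[gpu] = self.gpu_endpoints.pop(gpu)

    async def release_gpu(self, gpu: int) -> None:
        # Restore whatever baseline endpoint was evicted.
        if gpu in self.evicted:
            self.gpu_endpoints[gpu] = self.evicted.pop(gpu)

async def render_job(orch: Orchestrator, gpu: int) -> None:
    await orch.acquire_gpu(gpu, allow_baseline_eviction=True)
    try:
        pass  # LuxCore rendering would run here
    finally:
        await orch.release_gpu(gpu)  # baseline set point restored even on failure
```

The `try/finally` captures the makespan guarantee: work completes (or fails), then the baseline set point is restored.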
Clean Start
The clean_start() operation handles recovery from corrupted state:
result = await orch.clean_start(endpoints=["reasoning"])
# Kills stale vLLM processes
# Cleans up CUDA memory
# Restarts endpoints fresh
Health Integration
The orchestrator works with the AgendaTracker to distinguish intentional state changes from failures. When an endpoint is part of a scheduled makespan operation, the Health Observer skips incident creation:
if tracker.is_endpoint_in_scheduled_transition("reasoning"):
# Don't create incident — this is planned
expected = tracker.get_scheduled_endpoint_state("reasoning")
Checking Status
uv run gaius-cli --cmd "/gpu status" --format json | jq '.data.endpoints[]'
Scheduler
The SchedulerService provides a priority-based job queue for inference requests with XAI budget management and weighted completion time minimization.
Priority Levels
| Priority | Weight | Use Case |
|---|---|---|
| CRITICAL (0) | 1.0 | User-facing interactive requests |
| HIGH (1) | 2.0 | Interactive queries |
| NORMAL (2) | 4.0 | Background processing |
| LOW (3) | 8.0 | Batch operations |
| EVOLUTION (4) | 16.0 | Agent evolution (lowest priority) |
Lower weights receive preferential scheduling. Critical requests preempt everything.
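A sketch of how the weights could drive queue ordering: a heap keyed by weight, with an arrival counter as tie-breaker so jobs of equal priority stay FIFO. The `JobQueue` class is illustrative; only the weight table comes from the documentation above:

```python
import heapq
import itertools

# Priority weights from the table above; lower weight is served first.
WEIGHTS = {"CRITICAL": 1.0, "HIGH": 2.0, "NORMAL": 4.0, "LOW": 8.0, "EVOLUTION": 16.0}

class JobQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seq = itertools.count()  # tie-break preserves FIFO within a priority

    def submit(self, prompt: str, priority: str) -> None:
        heapq.heappush(self._heap, (WEIGHTS[priority], next(self._seq), prompt))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]
```

With jobs of every priority queued, a CRITICAL submission always pops first, so evolution work only runs when nothing else is waiting.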
Job Submission
from gaius.engine.services import SchedulerService, InferenceJob, JobPriority
scheduler = SchedulerService()
job = InferenceJob(
prompt="Analyze the risk factors...",
priority=JobPriority.HIGH,
max_tokens=2048,
)
result = await scheduler.submit(job)
XAI Budget
The scheduler tracks daily usage of external AI APIs (xAI Grok) to prevent runaway costs:
budget = scheduler.get_xai_budget()
# budget.daily_remaining: tokens left for today
# budget.daily_limit: configured daily cap
# budget.reset_time: when the budget resets
Requests exceeding the budget are rejected with a clear error message.
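A minimal sketch of the budget gate. Field names follow the snippet above; the `XaiBudget` class and its `charge` method are illustrative:

```python
from dataclasses import dataclass

@dataclass
class XaiBudget:
    daily_limit: int   # configured daily token cap
    daily_used: int = 0

    @property
    def daily_remaining(self) -> int:
        return self.daily_limit - self.daily_used

    def charge(self, tokens: int) -> None:
        if tokens > self.daily_remaining:
            # Fail fast with an actionable message rather than queueing silently.
            raise RuntimeError(
                f"XAI budget exceeded: requested {tokens}, "
                f"remaining {self.daily_remaining} of {self.daily_limit}"
            )
        self.daily_used += tokens
```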
Makespan Scheduling
For complex workloads that require multiple inference calls (e.g., agent evolution with candidate generation + evaluation), the scheduler uses makespan optimization to minimize total completion time:
- Decompose workload into individual inference jobs
- Assign priorities based on workload urgency
- Schedule across available endpoints
- Track completion via the AgendaTracker
See Makespan Scheduling for the optimization details.
Timeouts
| Context | Default Timeout |
|---|---|
| General gRPC calls | 30s |
| Inference (completions) | 120s |
| Evaluation | 120s |
A 24B model with cot_reflection takes 15-20 seconds per completion. Timeouts are set per-call:
result = await client.call("ModelInfer", request, timeout=120)
Override the default via GAIUS_ENGINE_TIMEOUT environment variable.
Protobuf Schema
The gRPC API is defined in Protocol Buffers. Changes to the proto require a specific workflow to keep generated bindings, internal enums, and status mappings in sync.
Key Files
| File | Purpose |
|---|---|
| engine/proto/gaius_service.proto | Proto definitions (source of truth) |
| engine/proto/gaius_service_pb2.py | Generated Python bindings |
| engine/proto/gaius_service_pb2_grpc.py | Generated gRPC stubs |
| engine/generated/__init__.py | Re-exports for clean imports |
| engine/grpc/servicers/gaius_servicer.py | Server-side implementation |
Endpoint Status Values
enum ProcessStatus {
PROCESS_STATUS_UNSPECIFIED = 0;
PROCESS_STATUS_STOPPED = 1;
PROCESS_STATUS_STARTING = 2;
PROCESS_STATUS_HEALTHY = 3;
PROCESS_STATUS_UNHEALTHY = 4;
PROCESS_STATUS_STOPPING = 5;
PROCESS_STATUS_FAILED = 6;
PROCESS_STATUS_PENDING = 7; // Queued for startup
}
Startup state transitions: PENDING → STARTING → HEALTHY
Change Workflow
1. Edit the Proto File
Append new enum values. Don’t renumber existing values for wire compatibility.
2. Regenerate Bindings
just proto-generate
3. Update Generated Exports
Add new symbols to engine/generated/__init__.py:
- Add to the import block
- Add to the __all__ list
Critical: Skipping this step causes import errors at engine startup.
4. Update Internal Enums
If there’s a parallel Python enum (e.g., in vllm_controller.py), sync it with the proto enum.
5. Update Status Mappings
Add string-to-proto mappings in the servicer’s _STATUS_MAP.
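A hypothetical sketch of what such a mapping looks like. The real `_STATUS_MAP` lives in `gaius_servicer.py`; the numeric values here follow the `ProcessStatus` enum above, with proto constants inlined for a self-contained example:

```python
# Inlined stand-ins for the generated ProcessStatus constants (values per the proto).
PROCESS_STATUS_STOPPED = 1
PROCESS_STATUS_STARTING = 2
PROCESS_STATUS_HEALTHY = 3
PROCESS_STATUS_UNHEALTHY = 4
PROCESS_STATUS_STOPPING = 5
PROCESS_STATUS_FAILED = 6
PROCESS_STATUS_PENDING = 7

_STATUS_MAP = {
    "stopped": PROCESS_STATUS_STOPPED,
    "starting": PROCESS_STATUS_STARTING,
    "healthy": PROCESS_STATUS_HEALTHY,
    "unhealthy": PROCESS_STATUS_UNHEALTHY,
    "stopping": PROCESS_STATUS_STOPPING,
    "failed": PROCESS_STATUS_FAILED,
    "pending": PROCESS_STATUS_PENDING,
}

def to_proto_status(state: str) -> int:
    # Fail fast on unknown states instead of silently returning UNSPECIFIED.
    if state not in _STATUS_MAP:
        raise KeyError(f"No proto mapping for endpoint state {state!r}")
    return _STATUS_MAP[state]
```

A missing entry here is exactly the "Status shows wrong value" issue in the table below, which is why each new enum value needs a matching mapping.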
6. Verify Import
uv run python -c "from gaius.engine.generated import NEW_SYMBOL; print('OK')"
7. Restart and Test
just restart-clean
uv run gaius-cli --cmd "/gpu status" --format json
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| Engine fails to start | Missing export in __init__.py | Add symbol to imports and __all__ |
| Port 50051 not listening | gRPC server didn’t initialize | Check logs for import errors |
| Status shows wrong value | Missing status mapping | Add to _STATUS_MAP |
Testing gRPC Features
gRPC reflection is not enabled, so grpcurl cannot discover services. Use the CLI instead:
uv run gaius-cli --cmd "/gpu status" --format json | jq '.data.endpoints[] | {name, status}'
Health & Self-Healing
Gaius implements autonomous health monitoring based on FMEA (Failure Mode and Effects Analysis). The system quantifies risk using RPN (Risk Priority Number) scores, applies tiered remediation, and learns from outcomes to improve over time.
Architecture
The health system has four layers:
- Detection: Scheduled checks, continuous watcher, and user reports identify issues
- Analysis: FMEA engine calculates RPN scores from severity, occurrence, and detection ratings
- Remediation: Three-tier system from automatic restarts to agent-assisted diagnosis to user approval
- Learning: Adaptive learner adjusts S/O/D scores based on remediation outcomes
How It Works
When a health check detects an issue:
- The FMEA engine maps it to a failure mode from the 34-mode catalog
- RPN is calculated: RPN = S x O x D (max 1000)
- Based on the RPN score, remediation is routed to the appropriate tier:
- RPN < 100 (Tier 0): Automatic procedural restart
- RPN 100-200 (Tier 1): Agent-assisted remediation
- RPN > 200 (Tier 2): Requires user approval
- RPN > 300: Escalates to ACP (Claude Code) for meta-level intervention
- Outcomes feed back into the adaptive learner, adjusting future risk scores
Health Check Categories
| Category | Example Checks |
|---|---|
| Infrastructure | gRPC connection, PostgreSQL, Qdrant, MinIO |
| GPU | Memory usage, temperature |
| Endpoints | vLLM health, stuck endpoints, orphan processes |
| Evolution | Evolution daemon, cognition daemon |
| Resources | Disk space, scheduler queue, XAI budget |
CLI Commands
# Run all health checks
uv run gaius-cli --cmd "/health" --format json
# Run checks for a specific category
uv run gaius-cli --cmd "/health gpu" --format json
# Apply automated fix
uv run gaius-cli --cmd "/health fix engine" --format json
# FMEA summary
uv run gaius-cli --cmd "/fmea" --format json
Self-Healing First
When encountering unhealthy services, always try /health fix before manual intervention:
- /health fix <service> — Let Gaius attempt self-healing
- just restart-clean — Only if self-healing fails
- Manual investigation — Last resort
This ensures the self-healing system gets exercised and improved over time.
Subchapters
- FMEA Framework — Risk scoring details and failure mode catalog
- Remediation Strategies — Fix strategies and tier system
- Health Observer — Continuous monitoring daemon
- Guru Meditation Codes — Error identification system
FMEA Framework
FMEA (Failure Mode and Effects Analysis) replaces simple severity classification with quantitative risk assessment. Originally from manufacturing engineering, Gaius adapts it for software systems.
Risk Priority Number
Each failure mode is scored on three dimensions:
RPN = S x O x D (range 1-1000)
| Dimension | Meaning | Scale |
|---|---|---|
| S (Severity) | Impact on system availability | 1 (negligible) to 10 (total failure) |
| O (Occurrence) | Probability of recurrence | 1 (rare) to 10 (frequent) |
| D (Detection) | Ability to detect before impact | 1 (always caught) to 10 (invisible) |
Higher RPN means higher risk. The worst possible score (10 x 10 x 10 = 1000) indicates a severe, frequent, and invisible failure.
Action Thresholds
| RPN Range | Tier | Action |
|---|---|---|
| 1-100 | Tier 0 | Automatic procedural remediation |
| 101-200 | Tier 1 | Agent-assisted remediation |
| 201-400 | Tier 2 | Requires user approval |
| 401-1000 | Manual | Human intervention required |
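The thresholds can be expressed as a simple routing function. This is a sketch of the table only; the real router additionally applies the conservative overrides described next:

```python
def route_tier(rpn: int) -> str:
    """Map an RPN score (1-1000) to a remediation tier per the table above."""
    if not 1 <= rpn <= 1000:
        raise ValueError(f"RPN out of range: {rpn}")
    if rpn <= 100:
        return "tier0-automatic"
    if rpn <= 200:
        return "tier1-agent-assisted"
    if rpn <= 400:
        return "tier2-approval"
    return "manual"
```

For example, GPU_001 (Memory Exhaustion, RPN 192) routes to agent-assisted remediation, while EB_004 (Self-Observation Bias, RPN 270) requires user approval.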
Conservative Overrides
Certain conditions always escalate regardless of RPN:
- Detection >= 8: Poor observability requires approval
- Safety level DESTRUCTIVE: Data-modifying actions require approval
- Multiple correlated failures: Escalate to next tier
Failure Mode Catalog
34 failure modes across 7 categories:
GPU (6 modes)
| ID | Failure Mode | S | O | D | RPN |
|---|---|---|---|---|---|
| GPU_001 | Memory Exhaustion | 8 | 6 | 4 | 192 |
| GPU_002 | Temperature Critical | 9 | 3 | 2 | 54 |
| GPU_003 | Hardware Error | 10 | 2 | 3 | 60 |
| GPU_004 | Driver Crash | 8 | 3 | 4 | 96 |
| GPU_005 | Memory Fragmentation | 7 | 5 | 4 | 140 |
| GPU_006 | Power Throttling | 5 | 4 | 3 | 60 |
vLLM Endpoint (6 modes)
| ID | Failure Mode | S | O | D | RPN |
|---|---|---|---|---|---|
| VLLM_001 | Stuck Starting | 6 | 5 | 5 | 150 |
| VLLM_002 | Stuck Stopping | 4 | 4 | 4 | 64 |
| VLLM_003 | Health Check Failure | 7 | 6 | 3 | 126 |
| VLLM_004 | Orphan Process | 5 | 5 | 4 | 100 |
| VLLM_005 | OOM Crash | 8 | 5 | 3 | 120 |
| VLLM_006 | KV-Cache Exhaustion | 5 | 6 | 5 | 150 |
Model Quality (5 modes)
| ID | Failure Mode | S | O | D | RPN |
|---|---|---|---|---|---|
| MQ_001 | Hallucination Increase | 7 | 4 | 6 | 168 |
| MQ_002 | Latency Degradation | 4 | 5 | 3 | 60 |
| MQ_003 | Output Quality Drift | 5 | 6 | 7 | 210 |
| MQ_004 | Semantic Drift | 6 | 4 | 8 | 192 |
| MQ_005 | Context Exhaustion | 6 | 5 | 4 | 120 |
Emergent Behavior (4 modes)
| ID | Failure Mode | S | O | D | RPN |
|---|---|---|---|---|---|
| EB_001 | Swarm Consensus Failure | 6 | 4 | 6 | 144 |
| EB_002 | Cognition Loop | 5 | 4 | 7 | 140 |
| EB_003 | Embedding Drift | 6 | 5 | 8 | 240 |
| EB_004 | Self-Observation Bias | 6 | 5 | 9 | 270 |
Note: Emergent behavior modes have high Detection scores (poor observability), reflecting the inherent difficulty of detecting these failure modes automatically.
Adaptive Learning
The system adjusts S/O/D scores based on remediation outcomes using exponential moving average (alpha = 0.2):
- Successful fast fix: Occurrence decreases (problem is manageable)
- Failed fix: Occurrence increases (problem is more persistent than estimated)
- User-reported: Detection increases (automated checks missed it)
- Early detection: Detection decreases (automated checks caught it)
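A sketch of the exponential-moving-average update, with α = 0.2 as stated above. The `ema_update` helper and the clamping to the 1-10 scale are illustrative assumptions:

```python
ALPHA = 0.2  # weight given to the newest observation

def ema_update(current: float, observed: float, alpha: float = ALPHA) -> float:
    """Blend a new observation into an S/O/D score, clamped to the 1-10 scale."""
    blended = alpha * observed + (1.0 - alpha) * current
    return min(10.0, max(1.0, blended))
```

For example, a failed fix observed as Occurrence 9 against a current score of 6 yields 0.2 × 9 + 0.8 × 6 = 6.6 — the score drifts toward the evidence without overreacting to a single outcome.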
CLI Commands
# FMEA summary with current RPN scores
uv run gaius-cli --cmd "/fmea" --format json
# Failure mode catalog
uv run gaius-cli --cmd "/fmea catalog" --format json
# Detail for specific failure mode
uv run gaius-cli --cmd "/fmea detail GPU_001" --format json
# Recent incidents
uv run gaius-cli --cmd "/fmea history" --format json
Remediation Strategies
Fix strategies are multi-step procedures that diagnose, repair, and verify service health. Each strategy is registered in the SERVICE_STRATEGIES dictionary and invoked via /health fix <service>.
Available Fix Strategies
| Service | Strategy | Steps |
|---|---|---|
| engine | EngineFixStrategy | Kill stale processes, clean CUDA, restart |
| dataset | DatasetFixStrategy | Re-initialize NiFi connection, verify |
| nifi | NiFiFixStrategy | Check connectivity, restart processors |
| postgres | PostgresFixStrategy | Check connection, verify schema |
| qdrant | QdrantFixStrategy | Check connectivity, verify collections |
| minio | MinIOFixStrategy | Check connectivity, verify buckets |
| endpoints | EndpointsFixStrategy | Health check, restart unhealthy |
| evolution | EvolutionFixStrategy | Restart evolution daemon |
Strategy Pattern
Each strategy follows the same pattern:
class EngineFixStrategy:
async def execute(self) -> FixResult:
# Step 1: Diagnose
issues = await self.diagnose()
# Step 2: Remediate
for issue in issues:
await self.fix(issue)
# Step 3: Verify
healthy = await self.verify()
return FixResult(
success=healthy,
steps_taken=self.steps,
duration_ms=elapsed,
)
Three-Tier System
Tier 0: Procedural (RPN < 100)
Automatic restart without agent involvement:
# Kill stale process, wait, restart
await orchestrator.stop_endpoint(endpoint)
await asyncio.sleep(5) # Cool-down
await orchestrator.start_endpoint(endpoint)
Tier 1: Agent-Assisted (RPN 100-200)
Uses a healthy inference endpoint to diagnose and decide on remediation:
diagnosis = await inference.analyze(issue.to_dict())
if diagnosis.action == "clear_cache":
await clear_kv_cache(endpoint)
elif diagnosis.action == "rollback":
await rollback_config(endpoint)
Tier 2: Approval Required (RPN > 200)
Creates an approval record for human review. Destructive operations (data modification, configuration changes) always require Tier 2 regardless of RPN.
Usage
# Fix a specific service
uv run gaius-cli --cmd "/health fix engine" --format json
# Fix all unhealthy services
uv run gaius-cli --cmd "/health fix all" --format json
Adding a New Fix Strategy
- Create a class in health/service_fixes.py implementing execute() -> FixResult
- Register it in SERVICE_STRATEGIES
- Add a KB heuristic document
- Test via /health fix <service>
Health Observer
The HealthObserver daemon provides continuous health monitoring with FMEA-based incident management and ACP escalation for issues beyond local remediation capability.
Operation
The observer runs as a background service within the engine, polling system health at a configurable interval (default 60 seconds).
from gaius.health.observe import HealthObserver
observer = HealthObserver()
await observer.start() # Begins continuous monitoring
Incident Lifecycle
Detection → Active → Healing → Recovered → Resolved
↘ Escalated (ACP) → Resolved
- Detection: Health check identifies a failure
- Active: Incident created with FMEA risk scoring
- Healing: Self-healing attempts in progress
- Recovered/Escalated: Either resolved locally or sent to ACP
- Resolved: Terminal state
Fail Open
When filtering incidents for display, the observer uses fail open semantics: it filters OUT known terminal states (resolved) rather than filtering IN known active states. Unknown states are always surfaced for investigation.
Makespan Integration
The observer integrates with the AgendaTracker to avoid false-positive incidents during scheduled operations. When an endpoint is part of a planned makespan transition:
if tracker.is_endpoint_in_scheduled_transition("reasoning"):
    # Skip incident creation — this is intentional
    log.info(f"Skipping: endpoint in scheduled transition to {expected_state}")
ACP Escalation
When an incident exceeds the RPN threshold or local remediation fails after 3 attempts, the observer escalates to Claude Code via ACP:
- Claude Code analyzes the issue using MCP tools
- Identifies gaps in the `/health fix` framework
- Implements new fix strategies and heuristics
- Commits to the `acp-claude/health-fix` branch for review
Cadence Limits
To prevent runaway automation:
- Max 3 GitHub issues per 24 hours
- Min 5 minutes between restart attempts
- Max 3 restarts per endpoint per hour
CLI Commands
# Observer status
uv run gaius-cli --cmd "/health observer" --format json
# Active incidents
uv run gaius-cli --cmd "/health incidents" --format json
# Incident detail
uv run gaius-cli --cmd "/health incident <id>" --format json
Guru Meditation Codes
Inspired by the Amiga’s iconic error screens, every failure mode in Gaius gets a unique identifier — a Guru Meditation Code. These codes create a traceable link from error messages to diagnostics and remediation.
Format
#<COMPONENT>.<SEQUENCE>.<MNEMONIC>
- Component: Two or three letter abbreviation for the subsystem
- Sequence: Zero-padded number unique within the component
- Mnemonic: Human-readable description of the failure mode
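A sketch of parsing the format; the regex mirrors the `#<COMPONENT>.<SEQUENCE>.<MNEMONIC>` convention described above, with two-to-three letter components and uppercase mnemonics as seen in the examples:

```python
import re
from typing import NamedTuple

GURU_RE = re.compile(
    r"^#(?P<component>[A-Z]{2,3})\.(?P<sequence>\d+)\.(?P<mnemonic>[A-Z0-9_]+)$"
)

class GuruCode(NamedTuple):
    component: str
    sequence: int
    mnemonic: str

def parse_guru(code: str) -> GuruCode:
    m = GURU_RE.match(code)
    if not m:
        raise ValueError(f"not a guru meditation code: {code!r}")
    return GuruCode(m["component"], int(m["sequence"]), m["mnemonic"])
```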
Components
| Code | Component |
|---|---|
| DS | DatasetService |
| NF | NiFi |
| EN | Engine |
| EP | Endpoints/Inference |
| EV | Evolution |
| DB | Database |
| QD | Qdrant |
| GR | gRPC |
| ACP | Agent Client Protocol |
| ACF | Article Curation Flow |
| HL | Health |
| XB | X Bookmarks |
How They’re Used
Every error message includes the guru code and remediation path:
DatasetService not initialized.
Guru: #DS.00000001.SVCNOTINIT
Try: /health fix dataset
Or: just restart-clean
Design Rules
- One code per failure mode: Each code maps to exactly one failure
- Unique across the system: No two failure modes share a code
- Stable: Codes are never renumbered once assigned
- Documented: Each code has a KB heuristic with symptom, cause, and fix
KB Heuristics
Each guru code has a corresponding heuristic document in the knowledge base at build/dev/current/heuristics/gaius/<category>/<name>.md containing:
- Symptom: What the user sees
- Cause: Root cause analysis
- Observation: How to detect programmatically
- Solution: Remediation steps, including the `/health fix` command
See Guru Meditation Codes Reference for the complete catalog.
Agent System
The agent system provides LLM orchestration patterns for domain analysis: role-based prompt execution, parallel inference coordination, temporal consolidation, and background evolution.
Execution Patterns
Swarm Execution
The primary pattern executes multiple LLM calls with distinct role-based system prompts in parallel:
| Role | Perspective | Temperature |
|---|---|---|
| Leader | Strategic synthesis | 0.7 |
| Risk | Threat identification | 0.6 |
| Optimizer | Efficiency analysis | 0.7 |
| Planner | Roadmap development | 0.7 |
| Critic | Adversarial review | 0.8 |
| Executor | Implementation assessment | 0.6 |
| Adversary | Stress testing | 0.8 |
Execution is parallel but not agentic — roles don’t observe each other’s outputs or iterate.
Latent Swarm (LatentMAS)
Reduces inter-agent token transfer by sharing embeddings instead of text via Qdrant. Agents store output embeddings; subsequent agents retrieve relevant context via semantic search.
Token reduction: 70-90% compared to text-based coordination.
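The hand-off pattern can be sketched without the vector database; here a plain in-memory list and cosine similarity stand in for Qdrant, purely for illustration:

```python
import math

_store: list[tuple[str, list[float]]] = []  # (agent_id, embedding)

def publish(agent_id: str, embedding: list[float]) -> None:
    """An agent stores its output embedding instead of its full text."""
    _store.append((agent_id, embedding))

def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query: list[float], k: int = 2) -> list[str]:
    # A later agent pulls only the most relevant prior outputs by
    # semantic similarity, never re-reading every other agent's text.
    ranked = sorted(_store, key=lambda e: _cosine(query, e[1]), reverse=True)
    return [agent_id for agent_id, _ in ranked[:k]]
```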
MetaAgent Coordination
Specialist “analysts” answer natural language questions by querying structured data sources (Cypher for lineage, SQL for metrics). Results are synthesized by a correlator.
Background Processes
Two background processes run within the engine:
- Evolution Daemon: Optimizes agent prompts during GPU idle periods
- Cognition Agent: Generates “thoughts” about patterns in KB activity (every 4-8h)
Module Structure
agents/
├── swarm.py # SwarmManager (parallel execution)
├── roles.py # Role definitions (system prompts)
├── metaagent_swarm.py # MetaAgentManager
├── cognition.py # Pattern detection
├── theta/ # Temporal consolidation pipeline
├── latent/ # Qdrant-backed working memory
└── evolution/ # Prompt optimization
Subchapters
- Evolution — RLVR-based prompt optimization
- Cognition — Autonomous thought generation
- Theta Consolidation — Temporal knowledge linking
- CLT Memory — Cross-Layer Transcoder features
Evolution
The evolution subsystem optimizes agent system prompts using APO (Automatic Prompt Optimization) during GPU idle periods. It generates candidate prompts, evaluates them against held-out tasks, and promotes winners.
Evolution Cycle
1. Wait for GPU idle (<30% utilization)
2. Select next agent (round-robin)
3. Generate candidate prompts
4. Evaluate against held-out tasks
5. Promote best if improved
6. Record lineage
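The cycle above, condensed into a sketch; the function names and the agent record are placeholders for the daemon's actual internals:

```python
def evolution_cycle(agent: dict, gpu_util: float, generate, evaluate) -> str:
    """One cycle: returns the promoted prompt (or the incumbent)."""
    if gpu_util >= 0.30:
        return agent["prompt"]  # step 1: only run when the GPU is idle
    candidates = generate(agent["prompt"])           # step 3: candidates
    scored = [(evaluate(c), c) for c in candidates]  # step 4: held-out tasks
    best_score, best = max(scored)
    if best_score > evaluate(agent["prompt"]):       # step 5: promote if improved
        agent["lineage"].append(agent["prompt"])     # step 6: record lineage
        agent["prompt"] = best
    return agent["prompt"]
```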
Optimization Methods
| Method | Description |
|---|---|
| APO | Automatic Prompt Optimization (Zhou et al., 2023) |
| GEPA | Genetic Evolution of Prompt Architectures |
Model Merging
Agent versions can be combined using parameter-space merging:
| Method | Description |
|---|---|
| Linear | Weighted average of parameters |
| TIES | Resolves sign conflicts between models |
| DARE | Drop and rescale for sparse merging |
Agent Versioning
Each evolution cycle produces a new agent version with tracked lineage:
# Check evolution status
uv run gaius-cli --cmd "/evolve status" --format json
# View agent versions
uv run gaius-cli --cmd "/evolve versions leader" --format json
# Promote a specific version
uv run gaius-cli --cmd "/evolve promote leader v3" --format json
Configuration
evolution {
    enabled = true
    idle_threshold = 60     # seconds of GPU idle before triggering
    cycle_interval = 3600   # minimum seconds between cycles
}
The daemon runs in the engine process and activates only during GPU idle periods to avoid competing with interactive inference.
Cognition
The CognitionService generates autonomous “thoughts” by analyzing recent knowledge base activity. It runs as a scheduled background task within the engine.
Scheduled Tasks
| Task | Interval | Purpose |
|---|---|---|
| `cognition_cycle` | Every 4h | Detect patterns in recent KB activity |
| `self_observation` | Every 8h | Meta-cognitive reflection on thought patterns |
| `engine_audit` | Every 12h | System health and resource analysis |
Thought Types
| Type | Description |
|---|---|
| `PATTERN` | Recurring themes across documents |
| `CONNECTION` | Cross-domain relationships discovered |
| `CURIOSITY` | Questions warranting investigation |
| `SELF_OBSERVATION` | Meta-cognitive observations about thought quality |
How It Works
Each cognition cycle:
- Retrieves recent KB entries and thought history
- Analyzes for patterns, connections, and gaps
- Generates thoughts using a reasoning endpoint
- Stores thoughts in the knowledge base
- Records in the thought chain for provenance
CLI Commands
# Trigger cognition cycle manually
uv run gaius-cli --cmd "/cognition" --format json
# View recent thoughts
uv run gaius-cli --cmd "/thoughts" --format json
# Trigger self-observation
uv run gaius-cli --cmd "/self-observe" --format json
Thought Chain
Thoughts are linked in a chain with provenance tracking. Each thought references its trigger (scheduled, manual, or reactive) and the inputs that contributed to it. This creates an auditable trail of the system’s reasoning.
Theta Consolidation
ThetaAgent executes a deterministic consolidation pipeline for cross-temporal knowledge linking. Named after theta rhythms in hippocampal replay, it compresses temporal experience into durable knowledge connections.
Pipeline Stages
Temporal Slicing → NVAR Signal → BERTSubs Inference → KG Selection → Augmentation
1. Temporal Slicing
Documents are organized into weekly slices (YYYY-WNN format). Each slice represents a temporal context for consolidation.
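The weekly slice key can be derived with the standard library's ISO calendar (named attributes on `isocalendar()` require Python 3.9+); this is a sketch of the convention, not the pipeline's actual code:

```python
from datetime import date

def slice_key(d: date) -> str:
    """Return the YYYY-WNN slice for a document date."""
    iso = d.isocalendar()
    return f"{iso.year}-W{iso.week:02d}"
```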
2. NVAR Dynamics
Nonlinear Vector AutoRegression using reservoir computing computes a consolidation signal from embedding centroid trajectories. High “urgency” indicates rapid semantic drift requiring consolidation attention.
3. BERTSubs Inference
Subsumption relationships between concepts are inferred using BERTSubs from DeepOnto. The inferencer identifies “A is-a B” relationships via fine-tuned BERT classification on ontology subsumptions.
4. Knowledge Gradient Selection
Candidate relationships are filtered using the Knowledge Gradient policy, balancing exploration (learning about uncertain candidates) against exploitation (selecting high-confidence relationships).
5. Document Augmentation
Selected relationships are injected into source documents as wikilinks and action links for navigation.
Usage
# Run consolidation
uv run gaius-cli --cmd "/theta consolidate" --format json
# View consolidation stats
uv run gaius-cli --cmd "/theta stats" --format json
# Check situational report
uv run gaius-cli --cmd "/sitrep" --format json
Dependencies
- DeepOnto with JVM (via JPype) for BERTSubs
- OWL domain ontology with `rdfs:subClassOf` axioms
- Sufficient class count (~50+ classes) for training data
CLT Memory
Cross-Layer Transcoder (CLT) memory extracts sparse features from model activations, providing interpretable representations of agent state.
How It Works
The CLTService extracts sparse features from inference activations using circuit-tracer:
state = await clt.extract_features(
    agent_id="critic",
    content="The risk model has issues...",
)
# state.features.active_indices — which features activated
Swarm Features
CLT features can be computed across a swarm to find consensus:
swarm_result = await clt.compute_swarm_features(domain="pension")
# swarm_result.consensus_features — features active across multiple agents
Integration with Topology
The TopologyService consumes CLT features to detect semantic attractors — regions in embedding space where agent attention converges:
CLTService → extract features → TopologyService → detect attractors → grid overlay
Qdrant Collections
CLT features are stored in dedicated Qdrant collections:
| Collection | Purpose |
|---|---|
| `gaius_clt_memory` | Cross-Layer Transcoder feature history |
| `gaius_latent_memory` | Latent working memory for swarm coordination |
CLI Commands
# Extract CLT features
uv run gaius-cli --cmd "/clt extract" --format json
# CLT memory statistics
uv run gaius-cli --cmd "/clt stats" --format json
Data Pipeline
The data pipeline connects external sources to the knowledge base, card collections, and search index through a sequence of ingestion, processing, and indexing stages.
End-to-End Flow
Web Sources (Brave, arXiv, RSS)
|
v
NiFi Ingestion ──> Raw Content (HX / Iceberg)
|
v
Metaflow Pipelines ──> Article Drafts, Card Creation
|
v
Qdrant Indexing ──> 768-dim Nomic Embeddings
|
v
PostgreSQL (zndx_gaius:5444) ──> Cards, Collections, Metadata
|
v
R2 Storage ──> Rendered Visualizations (viz.gaius.zndx.org)
Pipeline Stages
Ingestion. NiFi processors fetch content from external APIs, RSS feeds, and web search results (Brave). Raw content is stored in Apache Iceberg tables via the HX data lake before any processing occurs. This preserves the original source material and provides a replay capability.
Processing. Metaflow pipelines handle the compute-intensive work: PDF conversion via docling, topic extraction via BERTopic, relevance scoring via local LLMs, and article draft generation. See Metaflow Integration for details on the execution environment.
Article Curation. The Article Curation flow orchestrates the full lifecycle from article selection through card creation and publication. Each run produces approximately 20 cards in under 2 minutes.
Indexing. Processed content is embedded using Nomic (768-dimensional vectors) and indexed in Qdrant for semantic search. The same embeddings drive the TUI’s 19x19 grid layout and the visualization pipeline.
Storage. Cards, collections, and metadata live in PostgreSQL (zndx_gaius on port 5444). Rendered card images are uploaded to Cloudflare R2 and served from viz.gaius.zndx.org. See Viz Storage for the object key convention.
Lineage Tracking
Every pipeline stage emits OpenLineage events that are materialized into an Apache AGE graph. This provides full provenance from source URL to published card. See Lineage Tracking for Cypher query examples.
Knowledge Base
The Knowledge Base serves as both input and output of the pipeline. Articles begin as zettelkasten notes in build/dev/scratch/, and the curation flow produces structured content in build/dev/current/articles/.
Key Services
| Service | Role | Port |
|---|---|---|
| NiFi | Content ingestion | 8443 |
| Metaflow | Pipeline execution | 8180 |
| PostgreSQL | Metadata, cards, collections | 5444 |
| Qdrant | Vector search | 6333 |
| MinIO | Artifact storage (S3-compatible) | 9000 |
| Gaius Engine (gRPC) | Orchestration, scheduling | 50051 |
CLI Access
# List available flows
uv run gaius-cli --cmd "/flows list"
# Trigger article curation
uv run gaius-cli --cmd "/article curate ai-reasoning-weekly"
# Query lineage for a KB file
uv run gaius-cli --cmd "/lineage query scratch/2026-03-14/paper.md"
Metaflow Integration
Gaius uses Metaflow for production data pipelines that run on Kubernetes. Flows handle article curation, content evaluation, rendering, and document processing.
Infrastructure
The Metaflow service is deployed via Tilt in infra/tilt/ and runs on the local RKE2 Kubernetes cluster. Access requires a port-forward:
kubectl port-forward svc/metaflow-service 8180:8080
The environment variable METAFLOW_SERVICE_URL=http://localhost:8180 must be set for flow execution. This is configured automatically in devenv.nix for interactive shells and explicitly in process scripts.
GaiusFlow Base Class
All Gaius flows inherit from GaiusFlow, which provides OpenLineage integration and KB path helpers:
from gaius.flows import GaiusFlow
from metaflow import step
class MyFlow(GaiusFlow):
    @step
    def start(self):
        self.emit_lineage_start("my_flow", inputs=[...])
        self.next(self.process)

    @step
    def end(self):
        self.emit_lineage_complete(outputs=[...])
KB path helpers generate paths following the zettelkasten convention:
# scratch/{date}/{HHMMSS}_{title}.md
path = self.zettelkasten_path("My Analysis")
# current/archive/{quarter}/attachments/{filename}
path = self.archive_path("paper.pdf")
Flow Registry
Flows are registered for CLI discovery using the @register_flow decorator:
from gaius.flows import register_flow
@register_flow("article-curation")
class ArticleCurationFlow(GaiusFlow):
    ...
Registered flows can be listed and invoked from the CLI or MCP tools.
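A plausible sketch of how such a registry decorator works; the real `register_flow` lives in `gaius.flows` and may differ in detail:

```python
FLOW_REGISTRY: dict[str, type] = {}

def register_flow(name: str):
    def wrap(cls: type) -> type:
        if name in FLOW_REGISTRY:
            raise ValueError(f"duplicate flow name: {name}")
        FLOW_REGISTRY[name] = cls
        return cls  # class is returned unchanged, so normal use is unaffected
    return wrap

@register_flow("article-curation")
class ArticleCurationFlow:
    ...
```

Because the decorator returns the class untouched, registration is a pure side effect: the CLI can enumerate `FLOW_REGISTRY` while Metaflow still sees an ordinary flow class.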
Available Flows
| Flow | Purpose | Typical Duration |
|---|---|---|
| ArticleCurationFlow | End-to-end article research and card publication | ~2 min |
| ArxivDoclingFlow | Fetch and convert arXiv papers to markdown | ~30s |
| ClouderaDocsFlow | Sync Cloudera documentation archives | varies |
See Article Curation for the full 11-step pipeline.
Configuration
Key environment variables:
| Variable | Purpose |
|---|---|
| `METAFLOW_SERVICE_URL` | Metaflow service endpoint (http://localhost:8180) |
| `METAFLOW_DATASTORE_SYSROOT_S3` | MinIO path for flow artifacts |
| `METAFLOW_DEFAULT_METADATA` | Metadata backend (postgresql) |
| `GAIUS_KB_ROOT` | Knowledge base root directory |
Running Flows
# Via Metaflow CLI
python -m metaflow.cli run ArticleCurationFlow --article ai-reasoning-weekly
# Via Gaius CLI
uv run gaius-cli --cmd "/article curate ai-reasoning-weekly"
# Via MCP tool
uv run gaius-cli --cmd "/fetch_paper 2312.12345"
K8s Prerequisites
- `kubectl` and `k9s` are Nix-managed via `devenv.nix` (not the system RKE2 binary)
- `KUBECONFIG` must be set to `~/.config/kube/rke2.yaml` (never use fallback syntax)
- K8s pods need `pg_hba.conf` entries for the `10.42.0.0/16` and `10.43.0.0/16` subnets
Article Curation
The ArticleCurationFlow is an 11-step Metaflow pipeline that automates the discovery, research, drafting, and publication of articles. It is the primary content production mechanism in Gaius.
Pipeline Overview
start ──> grok_research_summary ──> select_article ──> acquire_external
──> update_manifest ──> create_draft ──> create_base ──> create_cards
──> enrich_cards ──> publish_batch ──> end
Each run produces approximately 20 cards in under 2 minutes.
Article Discovery
Articles live at current/articles/{slug}/ in the knowledge base. Each article directory contains a markdown file with YAML frontmatter that must include keywords and/or news_queries to guide the Brave search fetcher:
---
title: "AI Reasoning Weekly"
keywords: ["chain-of-thought", "reasoning models", "test-time compute"]
news_queries: ["AI reasoning breakthroughs 2026"]
---
Empty keywords trigger a fail-fast error:
#ACF.00000013.NOHINTS - Article has no keywords or news_queries
Try: Add keywords to the article frontmatter
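A minimal sketch of that fail-fast check; the error text mirrors the guru code above, while the validation function itself is illustrative:

```python
def require_search_hints(frontmatter: dict) -> None:
    """Fail fast when an article gives the fetcher nothing to search for."""
    if not frontmatter.get("keywords") and not frontmatter.get("news_queries"):
        raise ValueError(
            "#ACF.00000013.NOHINTS - Article has no keywords or news_queries"
        )
```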
Article Selection
The selection rubric evaluates candidate articles using several signals. The curation_readiness gate prevents selecting articles that lack sufficient zettelkasten notes or have incomplete frontmatter. Collection balance – specifically pending_cards count – is the most effective diversity signal, steering selection toward underrepresented topics.
External Source Acquisition
Once an article is selected, the flow fetches external sources in parallel using Brave search. Results are scored for relevance by a local LLM. Only sources exceeding the relevance threshold are retained.
Draft Generation and Card Creation
Drafts are synthesized using Grok, drawing from the article’s zk/ zettelkasten notes. The flow does NOT search the broader KB to avoid exposing private materials in published articles.
After drafting, the flow creates a BFO-grounded .base file with references, then generates collection cards from those references. Cards are created with pending status.
Enrichment Before Publish
Cards must be fully enriched before publication. Enrichment includes:
- Summary generation – LLM-generated card summaries
- Image rendering – Procedural visualizations via LuxCore
Only cards that pass both enrichment steps are published. Failed cards remain pending for the next run. This prevents incomplete content from appearing on the site.
CLI Access
# Curate a specific article
uv run gaius-cli --cmd "/article curate ai-reasoning-weekly"
# List available articles
uv run gaius-cli --cmd "/article list"
# Check article status
uv run gaius-cli --cmd "/article status"
Fail-Fast Guarantees
The flow fails immediately if required services are unavailable. No fallbacks or placeholder content is generated. Key guru meditation codes:
| Code | Meaning |
|---|---|
| `#ACF.00000013.NOHINTS` | Article missing keywords/news_queries |
| `#FL.00001.DOCLING_FAIL` | Document conversion failed |
| `#FL.00002.METAFLOW_DB` | Metaflow metadata DB unavailable |
Privacy
The curation flow only uses the article’s own zk/ notes as source material. It does not search the broader knowledge base, ensuring private materials are never exposed in published articles.
Knowledge Base
The knowledge base is a markdown-first document store organized as a zettelkasten. It lives under build/dev/ (gitignored) and is accessible through MCP tools for CRUD operations.
Directory Structure
build/dev/
├── current/ # Active work (manually curated)
│ ├── projects/ # Project-specific documents
│ ├── articles/ # Article directories with frontmatter
│ ├── content/domains/ # Domain-specific content
│ └── heuristics/ # Guru meditation heuristic files
│ └── gaius/
├── scratch/ # Zettelkasten notes (organized by date)
│ ├── 2026-03-14/
│ │ ├── 103045_my_analysis.md
│ │ └── 142200_research_notes.md
│ └── 2026-03-13/
└── archive/ # Quarterly archives
└── 2026Q1/
└── attachments/
current/ contains active, manually curated work. Articles, projects, and domain content live here. Heuristic files for guru meditation codes are stored at current/heuristics/gaius/{category}/{name}.md.
scratch/ is the zettelkasten. Files are organized by date and named with a time prefix: {HHMMSS}_{title}.md. This is where Metaflow pipelines deposit processed content and where daily research notes accumulate.
archive/ holds quarterly archives with binary attachments (PDFs, images) that are too large for the scratch directory.
MCP Tools
The KB is fully accessible through MCP tools, enabling Claude Code and other agents to read, write, and search the knowledge base:
| Tool | Operation |
|---|---|
| `search_kb` | Full-text search across all KB content |
| `read_kb` | Read a specific file by path |
| `create_kb` | Create a new file at a given path |
| `update_kb` | Update an existing file |
| `list_kb` | List files in a directory |
| `delete_kb` | Delete a file |
# Search the knowledge base
uv run gaius-cli --cmd "/search_kb 'persistent homology'"
# Read a specific file
uv run gaius-cli --cmd "/read_kb scratch/2026-03-14/103045_analysis.md"
Path Conventions
Metaflow flows use helper methods on GaiusFlow to generate consistent paths:
# Zettelkasten path: scratch/{date}/{HHMMSS}_{title}.md
path = self.zettelkasten_path("My Analysis")
# -> "scratch/2026-03-14/103045_my_analysis.md"
# Archive path: current/archive/{quarter}/attachments/{filename}
path = self.archive_path("paper.pdf")
# -> "current/archive/2026Q1/attachments/paper.pdf"
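The helpers could be implemented roughly as follows; this is a reimplementation inferred from the examples above (the slug rules, lowercasing and underscore-joining, are assumptions), not the actual `GaiusFlow` code:

```python
from datetime import datetime

def zettelkasten_path(title: str, now: datetime) -> str:
    # scratch/{date}/{HHMMSS}_{title}.md
    slug = "_".join(title.lower().split())
    return f"scratch/{now:%Y-%m-%d}/{now:%H%M%S}_{slug}.md"

def archive_path(filename: str, now: datetime) -> str:
    # current/archive/{quarter}/attachments/{filename}
    quarter = (now.month - 1) // 3 + 1
    return f"current/archive/{now.year}Q{quarter}/attachments/{filename}"
```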
Integration with Pipelines
The KB serves as both input and output for the data pipeline:
- Input: Articles with frontmatter and zettelkasten notes drive the article curation flow
- Output: Processed papers, research summaries, and draft articles are written back to scratch/ or current/
- Lineage: KB file paths appear as Dataset nodes in the lineage graph, enabling provenance queries from source URL to KB entry
Storage Backend
KB operations go through gaius.storage.kb_ops, which manages the filesystem-backed store. The GAIUS_KB_ROOT environment variable overrides the default build/dev/ location. Content is not stored in the database – the KB is a plain filesystem hierarchy, making it easy to browse, grep, and version control externally.
Sync to HX
Raw content (PDFs, API responses) is stored separately in the HX data lake (Apache Iceberg) to prevent the KB from being overwhelmed with unprocessed data. Only curated summaries and processed markdown enter the KB.
Lineage Tracking
Lineage tracking provides graph-based provenance that connects data sources to derived artifacts. Every pipeline stage emits OpenLineage events that are materialized into an Apache AGE graph stored in PostgreSQL.
Architecture
Metaflow Pipelines ──┐
Fetch Workers ───────┤──> RunEvent ──> LineageEmitter ──> Apache AGE Graph
Agents ──────────────┘ |
v
Cypher Queries (MCP + CLI)
Graph Schema
The lineage graph uses three vertex labels and four edge labels:
Vertices:
- `Dataset` – a data source or sink (namespace, name)
- `Job` – a processing definition (namespace, name)
- `Run` – a single execution of a job (run_id, state, event_time)
Edges:
- `INPUT_TO` – Dataset consumed by Run
- `OUTPUTS` – Run produced Dataset
- `EXECUTES` – Job spawned Run
- `PARENT` – Run is child of another Run
OpenLineage Events
Flows emit events at key lifecycle points:
| Event | Timing | Purpose |
|---|---|---|
START | Flow begin | Record input datasets |
COMPLETE | Flow end | Record output datasets |
FAIL | On error | Record failure with context |
from gaius.hx.lineage import get_emitter, RunEvent, Dataset, Job
emitter = get_emitter()
event = RunEvent.complete(
    run=run,
    job=Job("gaius.flows", "ArticleCurationFlow"),
    inputs=[Dataset("gaius.source", "brave:ai-reasoning")],
    outputs=[Dataset("gaius.kb", "scratch/2026-03-14/paper.md")],
)
await emitter.emit(event)
Cypher Queries
Lineage can be queried via the MCP lineage_cypher tool or the CLI:
# Trace upstream sources for a KB file
uv run gaius-cli --cmd "/lineage query scratch/paper.md"
Example Queries
Find all KB files derived from arXiv sources:
MATCH (s:Dataset)-[:INPUT_TO]->(:Run)-[:OUTPUTS]->(kb:Dataset)
WHERE s.namespace = 'gaius.source' AND s.name STARTS WITH 'arxiv:'
RETURN s.name as source, kb.name as kb_path
Trace full provenance chain (up to 5 hops):
MATCH path = (src:Dataset)-[:INPUT_TO|OUTPUTS*1..5]->(target:Dataset)
WHERE target.namespace = 'gaius.kb'
AND target.name CONTAINS 'attention_is_all_you_need'
RETURN src.namespace, src.name
Count vertices by label:
MATCH (n) RETURN labels(n)[0] as label, count(n) as cnt
HX Package
The lineage subsystem lives in gaius.hx.lineage:
hx/lineage/
├── events.py # Dataset, Job, Run, RunEvent (OpenLineage types)
├── emitter.py # LineageEmitter (store + graph sync)
└── graph.py # AGE Cypher helpers
The parent gaius.hx package is the raw content data lake (Apache Iceberg). Lineage events bridge HX raw storage to KB curated content, recording every transformation step.
Integration Points
- Metaflow flows emit START/COMPLETE/FAIL events via the `GaiusFlow` base class
- Fetch workers emit events when acquiring external content
- MCP tools expose `query_lineage` and `lineage_cypher` for graph traversal
- The lineage graph is stored in the same PostgreSQL instance (`zndx_gaius:5444`) using the Apache AGE extension
Inference
The inference layer routes requests across multiple backends: vLLM for local GPU models, optillm for reasoning enhancement, and external APIs (xAI, Cerebras) for cloud-based inference.
Backend Router
The BackendRouter selects the appropriate backend based on capability requirements:
class BackendRouter:
    async def route_inference(
        self,
        model: str,
        prompt: str,
        max_tokens: int,
        technique: str = "",  # optillm technique
    ) -> str
Backends
| Backend | Purpose | Hardware |
|---|---|---|
| vLLM | Local model inference | 6x NVIDIA GPUs |
| optillm | Reasoning enhancement (CoT, BoN, MoA) | Proxies to vLLM |
| xAI (Grok) | External API inference | Cloud |
| Cerebras | External API inference | Cloud |
| Nomic | Text embeddings | 1 GPU |
optillm Techniques
| Technique | Description |
|---|---|
| `cot_reflection` | Chain-of-thought with reflection |
| `bon` | Best-of-N sampling |
| `moa` | Mixture of Agents |
| `rto` | Round-trip optimization |
| `z3` | Z3 solver integration |
| `leap` | Learn from examples |
Request Flow
Client → gRPC → Scheduler → BackendRouter → Backend
↗ vLLM (local)
↗ optillm → vLLM
↗ xAI API (cloud)
All inference requests route through the gRPC engine for centralized authentication, audit logging, and resource management.
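The routing decision can be sketched as a capability check; the model-name prefixes here are invented for illustration, and the real `BackendRouter` also weighs auth, budget, and scheduling:

```python
def pick_backend(model: str, technique: str = "") -> str:
    """Choose a backend for a request (illustrative rules only)."""
    if technique:                     # optillm wraps a local vLLM endpoint
        return "optillm"
    if model.startswith("grok"):      # external APIs selected by model family
        return "xai"
    if model.startswith("cerebras/"):
        return "cerebras"
    return "vllm"                     # default: local GPU inference
```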
Subchapters
- vLLM Controller — GPU process management
- Makespan Scheduling — Multi-workload optimization
- XAI Budget — External API rate limiting
vLLM Controller
The VLLMController manages vLLM inference server processes across 6 NVIDIA GPUs, handling startup, health monitoring, graceful shutdown, and recovery.
Process Management
class VLLMController:
    async def start_endpoint(
        self,
        model: str,              # HuggingFace model ID
        gpu_ids: list[int],      # Allocated GPUs
        port: int,               # Serving port
        tensor_parallel: int = 1,
    ) -> ProcessStatus

    async def stop_endpoint(self, port: int) -> bool
    async def health_check(self, port: int) -> bool
Lifecycle
- Graceful shutdown: SIGTERM first, force kill after timeout
- CUDA memory cleanup: `torch.cuda.empty_cache()` on shutdown
- Orphan detection: Scans for stale vLLM processes on startup
- Circular log buffer: 500 lines for diagnostics
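The SIGTERM-then-kill pattern is standard-library territory; a sketch with an illustrative grace period (the controller's actual timeout may differ):

```python
import signal
import subprocess

def stop_gracefully(proc: subprocess.Popen, timeout: float = 30.0) -> None:
    """SIGTERM first; force kill only if the process outlives the grace period."""
    proc.send_signal(signal.SIGTERM)   # ask nicely first
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                    # force kill after the grace period
        proc.wait()
```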
GPU Allocation
6 GPUs are allocated across endpoints:
GPU 0-1: reasoning endpoint (tensor_parallel=2)
GPU 2-3: coding endpoint (tensor_parallel=2)
GPU 4: embedding endpoint
GPU 5: available for rendering/evolution
Allocation is managed by the Orchestrator, not the controller directly.
Model Loading
Loading a 70B model to VRAM takes ~240 seconds. During this time:
- The engine streams progress to connected clients
- The endpoint status transitions: `PENDING → STARTING → HEALTHY`
- Health checks begin polling at 30-second intervals
Status Monitoring
# Check all endpoint status
uv run gaius-cli --cmd "/gpu status" --format json
# Watch during restart
for i in $(seq 1 15); do
    sleep 10
    uv run gaius-cli --cmd "/gpu status" --format json | \
        jq '.data.endpoints[] | {name, status}'
done
Common Issues
| Symptom | Guru Code | Fix |
|---|---|---|
| Process won’t start | #EP.00000001.GPUOOM | /health fix endpoints |
| Orphan process | #EN.00004.ORPHAN_PROC | just gpu-cleanup |
| cv2 import error | OpenCV conflict | See MEMORY.md OpenCV section |
Makespan Scheduling
Makespan scheduling optimizes GPU utilization across multi-step workloads that require endpoint transitions (eviction, loading, inference, restoration).
What is a Makespan?
A makespan is the total time from start to finish of a complex workload that may require:
- GPU eviction: Stopping a low-priority endpoint to free GPUs
- Endpoint startup: Loading a different model
- Workload execution: Running the actual inference
- Baseline restoration: Reloading the original endpoint
Example: Render Pipeline
makespan.execute
├── allocate_gpus # OR-Tools resource assignment
├── evict_if_needed # Preemption decisions
├── start_endpoints # vLLM process spawning
│ └── endpoint.start: rendering
│ ├── process_spawn
│ ├── model_load # ~240s for large models
│ └── health_check
├── execute_workload # Actual inference/rendering
└── restore_baseline # Return to set points
AgendaTracker
The AgendaTracker records scheduled endpoint transitions so the Health Observer can distinguish intentional state changes from failures:
tracker.register_operation(
    operation_id=op_id,
    workload_id=wl_id,
    control_mode=ControlMode.POSITIVE,
    target_endpoints=["reasoning", "fast"],
)
Control Modes
| Mode | Purpose |
|---|---|
| `POSITIVE` | Planned operation (start/stop) |
| `FAILURE` | Responding to detected failure |
| `RESTART_RECOVERY` | Restarting after failure resolution |
Tracing
Each makespan is traced as a parent span with child spans for each operation phase. This enables end-to-end visibility into complex multi-step operations, including time spent in external API calls (treated as black-box stages).
XAI Budget
The XAI budget system tracks and limits usage of external AI APIs (xAI Grok, Cerebras) to prevent runaway costs while enabling strategic use for evaluation and critique.
Budget Tracking
budget = scheduler.get_xai_budget()
# budget.daily_remaining — tokens left for today
# budget.daily_limit — configured daily cap
# budget.reset_time — when the budget resets (midnight UTC)
Usage Controls
- Daily token limit: Configured per provider
- Request rejection: When budget exhausted, requests fail with clear error
- Priority gating: Only HIGH and CRITICAL priority jobs can use external APIs
- Evaluation budget: Separate allocation for agent evaluation tasks
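The controls above reduce to two checks per request; a sketch with illustrative attribute and priority names, not the actual scheduler API:

```python
from dataclasses import dataclass

@dataclass
class XAIBudget:
    daily_limit: int
    daily_used: int = 0

    @property
    def daily_remaining(self) -> int:
        return self.daily_limit - self.daily_used

    def admit(self, tokens: int, priority: str) -> bool:
        if priority not in {"HIGH", "CRITICAL"}:
            return False              # priority gating
        if tokens > self.daily_remaining:
            return False              # reject when budget is exhausted
        self.daily_used += tokens
        return True
```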
CLI Commands
# Check current budget
uv run gaius-cli --cmd "/xai budget" --format json
# Reset budget (admin)
uv run gaius-cli --cmd "/xai reset" --format json
# Evaluate with external model
uv run gaius-cli --cmd "/xai evaluate" --format json
When External APIs Are Used
| Use Case | Provider | Purpose |
|---|---|---|
| Agent evaluation | xAI Grok | Independent critique of agent output |
| Cross-validation | Cerebras | Second opinion on critical decisions |
| Held-out evaluation | xAI Grok | Measuring agent improvement |
Visualization
The visualization pipeline generates unique procedural images for collection cards using LuxCore path tracing. Each card’s image is deterministic – derived from the differential geometry and algebraic topology of its embedding neighborhood.
Pipeline
Nomic Embeddings (768-dim)
|
├──> GeometryComputer (Ollivier-Ricci curvature, gradient fields)
└──> TDAComputer (persistent homology via ripser)
|
v
CardVizData (normalized feature vector per card)
|
v
Grammar Engine (CFDG-inspired recursive expansion)
|
v
MeshGen (pure numpy mesh generators)
|
v
LuxCore Renderer (PATHOCL GPU / PATHCPU fallback)
|
v
R2 Storage (viz.gaius.zndx.org)
Mathematical Grounding
Visualizations are not arbitrary aesthetic choices. They are driven by intrinsic geometric properties of the embedding space:
- Ollivier-Ricci curvature controls glass color temperature and petal count. Positive curvature (cluster interior) produces warmer, simpler forms. Negative curvature (semantic boundary) produces cooler, complex structures.
- Persistent homology (H0, H1, H2) controls recursion depth, toroidal rings, and void chambers. Topologically richer collections produce deeper nesting.
- Gradient fields position the key light along the direction of steepest semantic change.
- Complexity (local topological isolation) controls surface subdivision and branching probability.
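As an illustration of this feature-to-parameter mapping, here is a sketch with hypothetical ranges and coefficients (the real curves live in the viz pipeline; `render_params` and its formulas are assumptions, with curvature taken as normalized to [-1, 1]):

```python
def render_params(curvature: float, h1: int, complexity: float) -> dict:
    """Map geometric features to visual parameters (illustrative only).

    Positive curvature (cluster interior) -> warmer, simpler forms;
    negative curvature (semantic boundary) -> cooler, more complex forms.
    """
    warmth = 0.5 + 0.5 * curvature                     # 0 = cool glass, 1 = warm glass
    petals = round(5 + 4 * (1 - curvature) / 2)        # more petals toward boundaries
    rings = min(h1, 3)                                 # H1 generators -> toroidal rings, capped
    subdivisions = 1 + int(3 * complexity)             # isolation -> surface detail
    return {"warmth": warmth, "petals": petals,
            "rings": rings, "subdivisions": subdivisions}

params = render_params(curvature=-0.8, h1=5, complexity=0.7)
# boundary card: cool palette, many petals, ring count capped at 3
```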
Components
The pipeline spans six modules in gaius.viz/:
| Module | Purpose |
|---|---|
| data.py | Feature extraction from embedding geometry |
| grammar.py | Grammar Engine – recursive shape expansion |
| meshgen.py | Pure numpy mesh generators (ico_sphere, petal, torus) |
| luxcore_renderer.py | LuxCore Renderer – scene assembly and rendering |
| renderer.py | Async wrappers, variant management, thread pool |
| storage.py | R2 upload, DB updates, KV sync |
Render Variants
Each card is rendered in two variants:
| Variant | Dimensions | Purpose |
|---|---|---|
| display | 1400x300 | Card header image on site |
| og | 1200x630 | OpenGraph social sharing |
gRPC Integration
Rendering is triggered via the /render CLI command, which invokes the RenderCards streaming RPC on the gRPC engine (port 50051). GPU eviction is coordinated with the vLLM controller:
# Render cards for a collection
uv run gaius-cli --cmd "/render collection-id"
The render workload sets allow_baseline_eviction=True to temporarily free a GPU from vLLM inference. After rendering completes, clear_embeddings() releases the Nomic model (~3GB) from GPU memory.
Halt Conditions
Rendering quality is controlled by time and sample count:
- Production: 60 seconds / 512 samples per pixel
- Curation pipeline: 20 seconds / 128 samples per pixel (faster throughput)
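These caps map onto LuxCore's standard `batch.halttime` and `batch.haltspp` render-config properties. A small helper might build them per profile (a sketch; the profile names and the dict wrapper are assumptions, with the numbers taken from the list above):

```python
def halt_properties(profile: str) -> dict:
    """Halt-condition properties for a render profile (sketch).

    Uses LuxCore's batch.halttime / batch.haltspp config keys.
    """
    time_s, spp = {"production": (60, 512), "curation": (20, 128)}[profile]
    return {
        "batch.halttime": time_s,  # wall-clock seconds before the session halts
        "batch.haltspp": spp,      # samples per pixel before the session halts
    }

assert halt_properties("curation") == {"batch.halttime": 20, "batch.haltspp": 128}
```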
Materials
LuxCore’s spectral rendering produces physically accurate glass caustics and internal reflections. This was the primary motivation for switching from Blender Cycles, which rendered recursive glass nesting as opaque white blobs rather than transparent refraction.
LuxCore Renderer
LuxCore is the unbiased path tracer used for generating card visualizations. It provides GPU-accelerated rendering with physically accurate spectral glass materials that Blender Cycles could not achieve.
Installation
PyPI (CPU-only, production fallback):
uv pip install pyluxcore --no-deps
The --no-deps flag is required to avoid pulling in numpy 2.x, which conflicts with vLLM.
From source (GPU path):
The from-source build lives at thirdparty/src/LuxCore (git submodule). Build with:
./build-thirdparty.sh --component luxcore
Output: thirdparty/installed/LuxCore/pyluxcore/pyluxcore.cpython-312-x86_64-linux-gnu.so
Runtime libraries (OIDN + TBB) are installed to thirdparty/installed/LuxCore/lib/ with RPATH set to $ORIGIN/../lib. CUDA 12.4 at /usr/local/cuda is auto-detected during build.
Render Engines
PATHOCL – GPU-accelerated path tracing on CUDA devices. This is the primary production engine. Hybrid mode combines GPU intersection with 64 native CPU threads. The engine name is PATHOCL, not PATHGPU (which does not exist).
PATHCPU – 64-thread CPU rendering when no CUDA devices are available. Approximately 10x slower than single-GPU PATHOCL for equivalent sample counts.
Device Selection
CUDA devices are selected via a string of 0 and 1 characters (no spaces), where each position maps to an entry in pyluxcore.GetOpenCLDeviceList():
# Device order: 6 OpenCL (indices 0-5) + 6 CUDA (indices 6-11)
# Physical GPU N = cuda_indices[N]
# Select only GPU 2:
device_string = "000000001000" # CUDA index 8 = physical GPU 2
The gpu_id parameter restricts rendering to a single evicted GPU, which is required since all other GPUs are loaded by vLLM.
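Under the device ordering described in the comments above (6 OpenCL entries followed by 6 CUDA entries), the selection string can be built mechanically. A sketch (`device_string` is illustrative, not part of the codebase):

```python
def device_string(gpu_id: int, n_opencl: int = 6, n_cuda: int = 6) -> str:
    """Build a LuxCore device-selection string enabling exactly one physical GPU.

    Assumes the device list layout from the comment above: n_opencl OpenCL
    entries followed by n_cuda CUDA entries, so physical GPU N maps to
    position n_opencl + N.
    """
    bits = ["0"] * (n_opencl + n_cuda)
    bits[n_opencl + gpu_id] = "1"
    return "".join(bits)

assert device_string(2) == "000000001000"  # CUDA index 8 = physical GPU 2
```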
Scene Construction
Camera configuration goes in scene.Parse(), NOT in the config object. This is a common LuxCore pitfall:
scene.Parse(pyluxcore.Properties()
    .Set(pyluxcore.Property("scene.camera.type", "perspective"))
    .Set(pyluxcore.Property("scene.camera.lookat.orig", [0, -5, 2]))
    .Set(pyluxcore.Property("scene.camera.lookat.target", [0, 0, 0.5]))
    .Set(pyluxcore.Property("scene.camera.fieldofview", 40))
)
Light Types
LuxCore supports: point, spot, distant, constantinfinite. There is no area light type – use emissive meshes instead. Light gain values are approximately 100x lower than Blender energy values.
Film Pipeline
After rendering, the film pipeline must be executed with an explicit pipeline index:
session.GetFilm().ExecuteImagePipeline(0) # 0 = pipeline index, required
Polling vs Blocking
Never use WaitForDone() – it blocks indefinitely. Use polling with HasDone() and UpdateStats():
while not session.HasDone():
    session.UpdateStats()
    stats = session.GetStats()
    elapsed = stats.Get("stats.renderengine.time").GetFloat()
    if elapsed > timeout_seconds:
        break
    time.sleep(0.5)
Initialization
pyluxcore.Init() must be called exactly once. The _ensure_luxcore() helper handles this, preferring the from-source build over the PyPI wheel:
import sys
from pathlib import Path

_initialized = False

def _ensure_luxcore():
    """Initialize LuxCore once, preferring the from-source build."""
    global _initialized
    if _initialized:
        return  # Init() must not run twice
    source_path = Path("thirdparty/installed/LuxCore/pyluxcore")
    if source_path.exists():
        sys.path.insert(0, str(source_path))  # from-source build wins over PyPI wheel
    import pyluxcore
    pyluxcore.Init()
    _initialized = True
Grammar Engine
The grammar engine implements a CFDG-inspired recursive expansion system that generates unique 3D scenes from card topology features. It lives in gaius.viz.grammar and produces a flat list of positioned shapes that the LuxCore renderer assembles into scenes.
Design Principles
From Context Free Design Grammars (Horigan, 2004), the engine borrows three key ideas:
- Weighted rule alternatives – at each expansion step, the grammar chooses among productions with probabilities derived from the card’s feature vector. This is what makes different cards produce different structures.
- Recursive expansion with transform accumulation – each production can invoke sub-rules with a child transform (translation, rotation, scale) relative to the parent. Transforms compose multiplicatively, producing self-similar structures at decreasing scales.
- Termination by minimum scale – expansion stops when the accumulated scale drops below MIN_SCALE (0.08) or when the shape budget (MAX_SHAPES = 35) is exhausted.
Deterministic Seeding
Every card produces the same visualization regardless of when or where it is rendered:
seed = int(hashlib.sha256(card_id.encode()).hexdigest(), 16) % (2**32)
rng = random.Random(seed)
Feature-to-Rule Mapping
Card topology features control rule weights and recursion depth:
| Feature | Grammar Effect |
|---|---|
| curvature | Petal count, recurse-vs-stop weight, dome factor |
| persistence | Max depth (3-7), shell nesting weight, spiral count |
| complexity | Branch-vs-grow weight, surface segments |
| boundary | Emission strength, volume density, core radius |
| b1 | Number of toroidal rings (0-3) |
| b2 | Number of void chambers (0-2) |
| diagram | Filament count, scale, and z-position |
| card_index | Phase offset for rotational variety in collection |
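A minimal sketch of how a feature can tilt a weighted rule choice while staying deterministic per card (the weights and rule names here are illustrative; the real productions live in gaius.viz.grammar):

```python
import hashlib
import random

def choose_rule(card_id: str, curvature: float) -> str:
    """Pick 'recurse' or 'stop' with feature-derived weights (sketch).

    Assumes curvature normalized to [-1, 1]: positive curvature shifts
    weight toward 'stop' (simpler forms), negative toward 'recurse'
    (deeper structures), matching the table above.
    """
    # Same deterministic seeding as the grammar engine
    seed = int(hashlib.sha256(card_id.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    w_recurse = 0.5 - 0.3 * curvature  # in [0.2, 0.8] for curvature in [-1, 1]
    return rng.choices(["recurse", "stop"], weights=[w_recurse, 1.0 - w_recurse])[0]

assert choose_rule("abc123", 0.0) == choose_rule("abc123", 0.0)  # deterministic per card
```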
Shape Primitives
The grammar produces six shape types, all implemented as arbitrary meshes in meshgen.py (not geometric primitives):
- Petals – flower-like disk segments arranged in clusters
- Shells – nested recursive enclosures
- Tori – toroidal glass rings driven by H1 (1-cycles)
- Voids – inverted-normal spheres representing H2 (2-cycles)
- Filaments – thin structures whose scale encodes persistence interval lifetime
- Core – central anchor shape
Arrangement Modes
The root-level grammar selects one of three arrangement modes:
- Cluster – radial arrangement around a center point
- Spiral – logarithmic spiral placement
- Branches – tree-like recursive branching
The arrangement mode is selected probabilistically based on the card’s curvature and complexity features.
Extensibility
Adding a new shape primitive requires three changes:
- A mesh generator function in meshgen.py: (parameters) -> (vertices, faces)
- A shape constant in grammar.py
- A renderer case in luxcore_renderer.py
The grammar and renderer are agnostic to the geometry they receive – any mesh generator that returns numpy vertex and face arrays works.
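A minimal generator following that convention might look like this (a hypothetical tetrahedron primitive for illustration, not one of the six shipped shapes):

```python
import numpy as np

def tetra(scale: float = 1.0):
    """Minimal mesh generator in the (parameters) -> (vertices, faces) convention.

    Returns a regular tetrahedron: 4 vertices, 4 triangular faces.
    """
    vertices = scale * np.array([
        [1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1],
    ], dtype=float)
    faces = np.array([
        [0, 1, 2], [0, 3, 1], [0, 2, 3], [1, 3, 2],
    ], dtype=int)
    return vertices, faces

v, f = tetra(0.5)
assert v.shape == (4, 3) and f.shape == (4, 3)
```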
Future Directions
The grammar is currently expressed as Python functions with hardcoded rule structures. A text-based grammar format (closer to CFDG’s declarative syntax) would allow grammar definitions to be version-controlled and iterated without modifying Python code.
Viz Storage
Rendered card visualizations are stored in Cloudflare R2 and served from a public URL. The storage layer handles upload, database updates, and KV sync for live site pages.
R2 Bucket
| Property | Value |
|---|---|
| Bucket name | gaius-viz |
| Public URL | https://viz.gaius.zndx.org |
Object Key Convention
Rendered images follow a predictable path structure:
viz/cards/{card_id}/{variant}.png
For example:
viz/cards/abc123/display.png
viz/cards/abc123/og.png
Variants
Each card is rendered in two variants:
| Variant | Dimensions | Purpose |
|---|---|---|
| display | 1400x300 | Card header image on the site |
| og | 1200x630 | OpenGraph image for social sharing |
Database Integration
The image_url column in the cards table stores the display variant URL:
https://viz.gaius.zndx.org/viz/cards/{card_id}/display.png
The OG variant URL is derived by path convention – replace display.png with og.png. There is no separate database column for the OG URL.
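The path-convention derivation is a one-liner (sketch; `og_url` is an illustrative helper name):

```python
def og_url(display_url: str) -> str:
    """Derive the OG variant URL from the stored display URL by path convention."""
    assert display_url.endswith("/display.png")
    return display_url[: -len("display.png")] + "og.png"

og_url("https://viz.gaius.zndx.org/viz/cards/abc123/display.png")
# → "https://viz.gaius.zndx.org/viz/cards/abc123/og.png"
```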
Upload Flow
After the LuxCore renderer produces an image, the storage module (gaius.viz.storage) handles:
- R2 upload – uploads both display and OG variants to the bucket
- DB update – sets the image_url column on the card row
- KV sync – updates Cloudflare KV stores used by the live card pages
# Simplified upload path
await upload_to_r2(card_id, display_bytes, "display")
await upload_to_r2(card_id, og_bytes, "og")
await update_card_image_url(card_id, display_url)
await sync_kv(card_id)
CLI Access
# Render cards for a collection
uv run gaius-cli --cmd "/render collection-id"
# The render command handles the full pipeline:
# grammar expansion -> LuxCore render -> R2 upload -> DB update -> KV sync
GPU Eviction
Rendering requires GPU access, but vLLM typically occupies all GPUs. The render workload requests GPU eviction via allow_baseline_eviction=True in the gRPC workload metadata. After rendering completes, clear_embeddings() releases the Nomic embedding model (~3GB) from GPU memory. See Visualization for the full pipeline context.
Bases Feature Store
Bases is an entity-centric feature store backed by Apache Kudu (via PostgreSQL FDW) with a fluent query API, BFO ontology grounding, and query guardrails. It abstracts multiple storage backends behind a unified interface.
Core Concepts
A Base is a named, typed view over features and entities. Bases hide the underlying storage backend (PostgreSQL, Iceberg, Kudu FDW) behind a consistent query interface.
Three base types determine query semantics and backend routing:
| Type | Semantics | Backend |
|---|---|---|
| SNAPSHOT | Latest value per entity | Kudu via FDW (PostgreSQL stub) |
| HISTORICAL | Event-sourced with time-travel | Apache Iceberg |
| REGISTRY | Metadata queries | PostgreSQL |
Fluent Query API
The primary query interface uses Kudu SDK-style method chaining:
from gaius.bases import Base, col, term
results = await (
    Base("events")
    .where(col("age") > 30)
    .where(col("status").isin("active", "pending"))
    .select("name", "email")
    .order_by("created_at", desc=True)
    .limit(100)
    .scan()
)
Ontology-grounded queries resolve BFO terms to column names via the base’s @context:
results = await (
    Base("events")
    .where(term("BFO:material_entity") == "ENT-12345")
    .scan()
)
Time-travel queries on historical bases:
results = await (
    Base("events")
    .as_of("2026-01-01T00:00:00Z")
    .where(col("entity_id") == "user-42")
    .scan()
)
Base Definition (.base YAML)
Bases are defined in YAML files with JSON-LD style semantic grounding:
"@context":
  "@vocab": "https://purl.obolibrary.org/obo/"
  entity_id:
    "@id": "BFO_0000040"
kudu:
  table: "gaius.events"
  primary_key: [entity_id, event_time]
schema:
  - name: entity_id
    type: STRING
  - name: event_time
    type: TIMESTAMP
Query Guardrails
All queries pass through guardrails that enforce resource limits:
| Guardrail | Default | Maximum |
|---|---|---|
| Result limit | 1,000 rows | 10,000 rows |
| Query timeout | 30 seconds | 120 seconds |
| Time range (historical) | 7 days | 90 days |
Historical bases require a time constraint (.as_of() or time column filter). Unbounded historical scans are rejected.
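A sketch of how an enforcer might apply these defaults and caps (illustrative; `enforce` and the dict shapes are assumptions, with the numbers taken from the table above):

```python
DEFAULTS = {"limit": 1_000, "timeout_s": 30, "range_days": 7}
MAXIMUMS = {"limit": 10_000, "timeout_s": 120, "range_days": 90}

def enforce(requested: dict) -> dict:
    """Fill in defaults and clamp each requested value to its maximum (sketch)."""
    out = {}
    for key, default in DEFAULTS.items():
        out[key] = min(requested.get(key, default), MAXIMUMS[key])
    return out

assert enforce({}) == {"limit": 1000, "timeout_s": 30, "range_days": 7}
assert enforce({"limit": 50_000})["limit"] == 10_000  # clamped to the maximum
```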
MCP Tools
| Tool | Operation |
|---|---|
| bases_list | List available bases with metadata |
| bases_query | Execute fluent queries against bases |
| bases_entity_history | Get event-sourced history for an entity |
| bases_health | Check service health |
Architecture
Fluent API (Base/col/term) ──> Parser ──> Compiler (SQLGlot) ──> Executor
                                  |                                 |
                                  v                                 v
                          Guardrail Enforcer             PostgreSQL / Iceberg
The DQL Query Language provides the text-based query syntax parsed by the fluent expression parser.
Guru Meditation Codes
| Code | Meaning |
|---|---|
| #BASES.00000001.NOPOOL | Database pool not configured |
| #BASES.00000002.NOICEBERG | Iceberg catalog unavailable |
| #FLUENT.00000001.BADAST | Invalid query expression |
| #FLUENT.00000002.UNSAFEOP | Unsafe operation attempted |
DQL Query Language
DQL (Domain Query Language) is the text-based query syntax for the Bases feature store. It provides a safe, sandboxed expression language that compiles to SQL via SQLGlot.
Syntax
DQL expressions use a fluent Python-like syntax that is parsed via AST walking (never eval):
Base("events").where(col("age") > 30).limit(10)
Base("users").where(col("status").isin("active", "pending")).select("name", "email")
Base("metrics").where(term("BFO:temporal_region") >= "2026-01-01").order_by("timestamp", desc=True)
Operators
Column References
col("name") creates a column reference for filtering and selection:
col("age") > 30
col("status") == "active"
col("name").like("John%")
col("deleted_at").is_null()
col("role").isin("admin", "editor")
Term References
term("IRI") creates an ontology-grounded reference that resolves to a column via the base’s @context:
term("BFO:material_entity") == "ENT-12345"
term("BFO:temporal_region") >= "2026-01-01"
Comparison Operators
| Operator | DQL | SQL |
|---|---|---|
| Equal | == | = |
| Not equal | != | != |
| Less than | < | < |
| Less or equal | <= | <= |
| Greater than | > | > |
| Greater or equal | >= | >= |
Logical Operators
Predicates can be combined with bitwise operators:
(col("age") > 30) & (col("status") == "active") # AND
(col("role") == "admin") | (col("role") == "editor") # OR
~(col("deleted_at").is_null()) # NOT
Multiple .where() calls are combined with AND.
Methods
| Method | Purpose | Example |
|---|---|---|
| .where(pred) | Add filter predicate | .where(col("x") > 1) |
| .select(*cols) | Select specific columns | .select("name", "email") |
| .order_by(col, desc=) | Sort results | .order_by("created_at", desc=True) |
| .limit(n) | Limit result count | .limit(100) |
| .as_of(ts) | Time-travel (historical) | .as_of("2026-01-01T00:00:00Z") |
| .scan() | Execute query | await query.scan() |
Safety Model
DQL is parsed using Python’s ast module with strict whitelisting. Only allowed names (Base, col, term, True, False, None), methods, and operators are permitted. Any unrecognized AST node triggers a fail-fast error:
#FLUENT.00000001.BADAST - Unsupported AST node
#FLUENT.00000002.UNSAFEOP - Unsafe operation attempted
This prevents arbitrary code execution while supporting expressive queries.
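A minimal version of this whitelist walk, with a reduced node set (illustrative, not the production validator; the exact whitelist is an assumption):

```python
import ast

ALLOWED_NAMES = {"Base", "col", "term"}
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Attribute, ast.Name, ast.Constant,
    ast.Compare, ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq, ast.NotEq,
    ast.BinOp, ast.BitAnd, ast.BitOr, ast.UnaryOp, ast.Invert,
    ast.Load, ast.keyword,
)

def validate(expr: str) -> ast.Expression:
    """Parse (never eval) and fail fast on any non-whitelisted node or name."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"#FLUENT.00000001.BADAST - {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"#FLUENT.00000002.UNSAFEOP - {node.id}")
    return tree

validate('Base("events").where(col("age") > 30).limit(10)')  # passes
# validate('__import__("os")')  # raises ValueError: UNSAFEOP
```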
Compilation
The FluentCompiler translates DQL expressions to PostgreSQL-compatible SQL using SQLGlot:
query = Base("events").where(col("age") > 30).limit(10)
sql = query.to_sql()
# SELECT * FROM events WHERE age > 30 LIMIT 10
Term references are resolved through the base’s @context dictionary, mapping ontology IRIs to physical column names.
MCP Usage
DQL queries are passed as strings to the bases_query MCP tool:
uv run gaius-cli --cmd '/bases query events where(col("age") > 30).limit(10)'
The parser validates the expression before compilation, ensuring that only safe operations reach the database.
RASE Metamodel
RASE (Rapid Agentic Systems Engineering) is a Python-native MBSE metamodel for verifiable agent training. It implements SysML v2-like semantics using Pydantic models, without requiring external MBSE tooling.
Core Principle: RLVR
The reward signal comes from verifiable computation, not human feedback or learned approximations. The verifier is a first-class artifact – specified, reviewed, tested, and versioned alongside the agent it trains.
Four Coupled Models
RASE consists of four tightly coupled models. Changes to one often require updates to others:
| Model | Purpose | Package |
|---|---|---|
| SSM | System State Model – system as typed graph | gaius.rase.domains.nifi |
| OSM | Operational Scenario Model – BDD scenarios | gaius.rase.osm |
| UOM | UI Observation Model – SoM/ToM grounding | gaius.rase.uom |
| VM | Verifier Model – requirements, oracle, rewards | gaius.rase.vm |
The TraceableId spine links artifacts across all four models, enabling full traceability from BDD scenario to training reward.
SysML v2 Alignment
RASE mirrors SysML v2 semantics without requiring external tooling:
| SysML v2 Concept | RASE Implementation |
|---|---|
| requirement def | Requirement, ScenarioRequirement |
| verification def | VerificationCase, APIVerificationCase |
| constraint def | Constraint subclasses (composable via AllOf, AnyOf, Not) |
| action def | StepDef with @given, @when, @then |
| part def | Processor, ProcessorGroup, NiFiInstance |
| Human ID <'scheme:path'> | TraceableId.uri |
Package Structure
src/gaius/rase/
├── core/ # Domain-agnostic: SystemState, Constraint[S], Oracle[S]
├── domains/ # Domain-specific implementations
│ ├── nifi/ # NiFi domain (state, constraints, oracle)
│ └── kb/ # Knowledge Base domain
├── traceability.py # TraceableId, DigitalThread
├── osm/ # Operational Scenario Model (BDD)
├── uom/ # UI Observation Model (SoM/ToM)
└── vm/ # Verifier Model (requirements, oracle, rewards)
Safety-Critical Infrastructure
The verifier is maintained with the same rigor as production code. All constraints are immutable (frozen=True), return structured ConstraintResult objects with rich failure messages, and support declarative composition. See Verification for details on the reward computation pipeline.
Four Coupled Models
The RASE metamodel consists of four tightly coupled models. They form a coherent verification framework where changes to one model often require updates to others.
Coupling Matrix
| If you change… | Also update… |
|---|---|
| SSM (system state) | VM constraints that reference state structure |
| OSM (scenarios) | VM requirements derived from scenarios |
| UOM (marks/traces) | VM verification cases that consume traces |
| VM (verification) | Ensure reward strategies align with constraint semantics |
SSM – System State Model
The SSM represents the system under test as a typed graph. The primary domain is NiFi, modeled as NiFiInstance containing ProcessorGroup, Processor, FlowConnection, and ControllerService nodes.
from gaius.rase.domains.nifi import NiFiInstance, Processor, ProcessorGroup
state = NiFiInstance(
    root_group=ProcessorGroup(id="root", name="NiFi Flow", processors=[
        Processor(id="abc", name="GetFile", type="org.apache.nifi.GetFile"),
    ])
)
SSM constraints are declarative, composable, and immutable. Examples: ProcessorExists, AllProcessorsRunning, NoBackpressure, FlowIsEquivalent. Compose with AllOf, AnyOf, Not.
OSM – Operational Scenario Model
The OSM captures BDD (Behavior-Driven Development) scenarios as executable specifications. Each scenario is a sequence of Given/When/Then steps that map to SysML v2 action definitions.
from gaius.rase.osm import Scenario, StepType, StepUsage
scenario = Scenario(
    name="CreateBasicFlow",
    steps=[
        StepUsage(step_type=StepType.GIVEN, text="NiFi is running"),
        StepUsage(step_type=StepType.WHEN, text="I create a processor group named 'ETL'"),
        StepUsage(step_type=StepType.THEN, text="the group 'ETL' exists"),
    ],
)
Step definitions (StepDef) are reusable patterns with {param} placeholders. The StepRegistry maps step text to executable actions via @given, @when, @then decorators.
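The decorator-plus-placeholder pattern can be sketched as follows (illustrative; the real StepRegistry in gaius.rase.osm differs in detail, and this version handles only one decorator and unescaped literal text):

```python
import re

class StepRegistry:
    """Minimal sketch: map step text with {param} placeholders to callables."""
    def __init__(self):
        self._steps = []

    def when(self, pattern: str):
        # "{name}" placeholders become named regex capture groups
        regex = re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", pattern) + "$")
        def register(fn):
            self._steps.append((regex, fn))
            return fn
        return register

    def run(self, text: str):
        for regex, fn in self._steps:
            m = regex.match(text)
            if m:
                return fn(**m.groupdict())
        raise LookupError(f"no step matches: {text!r}")

registry = StepRegistry()

@registry.when("I create a processor group named '{name}'")
def create_group(name: str) -> str:
    return f"created {name}"

assert registry.run("I create a processor group named 'ETL'") == "created ETL"
```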
UOM – UI Observation Model
The UOM provides grounding between language and UI actions using two complementary structures:
- SoM (Set-of-Mark): A ScreenshotWithSoM annotates a screenshot with numbered Mark objects, each with a BoundingBox, a UIRole, and an optional mapping to an SSM element.
- ToM (Trace-of-Mark): A TraceOfMarks records a sequence of ActionFrame entries (click, type, scroll) referencing marks by number, forming the agent’s action trajectory.
from gaius.rase.uom import Mark, BoundingBox, PixelCoord, UIRole
mark = Mark(
    mark_id=1,
    bbox=BoundingBox.from_xywh(100, 200, 50, 30),
    ui_role=UIRole.BUTTON,
    label="Add Processor",
)
The SoM/ToM pattern enables precise UI grounding: agents reference elements by mark number rather than pixel coordinates.
VM – Verifier Model
The VM implements RLVR verification. It connects OSM scenarios to executable verification cases with oracle-based reward computation. See Verification for full details.
Key components:
- Requirements: StepRequirement (atomic, from a BDD step) and ScenarioRequirement (composite, grouping steps with invariants)
- Verification Cases: APIVerificationCase (ground truth via API) and UIVerificationCase (agent UI actions, final state checked via API)
- Oracle: NiFiOracle queries the NiFi REST API for authoritative state verification
- Reward Strategies: BinaryReward (sparse) and GradedReward (partial credit)
Traceability
TraceableId and DigitalThread form the traceability spine linking all RASE artifacts. Every model element carries a URI-based identifier that enables cross-model linking, impact analysis, and full audit trails from requirement to training reward.
TraceableId
A TraceableId mirrors the SysML v2 human ID pattern: <'scheme:path'>. It is immutable (frozen=True) and hashable for use as dict keys and set members.
URI Schemes
| Scheme | Namespace | Example |
|---|---|---|
| bdd | BDD features, scenarios, steps | bdd://features/basic_flows#Scenario:CreateFlow |
| nifi | NiFi processors, groups, connections | nifi://groups/root/processors/abc123 |
| otel | OpenTelemetry spans and events | otel://spans/trace123/span456 |
| metaflow | Metaflow runs, steps, tasks | metaflow://flows/train/runs/42 |
| rase | Internal artifacts (results, threads) | rase://verify/a1b2c3d4e5f6 |
| som | Set-of-Mark UI annotations | som://screenshots/frame42 |
| tom | Trace-of-Mark action sequences | tom://traces/episode7 |
Factory Methods
from gaius.rase import TraceableId
# BDD scenario
tid = TraceableId.from_bdd("basic_flows", scenario="CreateFlow")
# → bdd://features/basic_flows#Scenario:CreateFlow
# NiFi processor
tid = TraceableId.from_nifi("root", processor_id="abc123")
# → nifi://groups/root/processors/abc123
# Auto-generated with UUID
tid = TraceableId.generate(scheme=IdScheme.RASE, prefix="verify")
# → rase://verify/a1b2c3d4e5f6
# Stable BDD step hash (survives line number changes)
tid = TraceableId.from_bdd_step_hash("flow.feature", "I create a group named 'ETL'")
DigitalThread
A DigitalThread captures one complete verification-to-training cycle. It links the full chain:
Requirement --> Verification Case --> Execution Result --> Evidence --> Training Episode
from gaius.rase import DigitalThread
thread = DigitalThread(
    requirement_id=req_id,
    verification_case_id=case_id,
    verification_result_id=result_id,
    api_state_before=before_id,
    api_state_after=after_id,
    reward_outcome=0.85,
)
thread.add_evidence(screenshot_id, "screenshot")
thread.add_evidence(span_id, "span")
TraceabilityGraph
The TraceabilityGraph collects TraceabilityLink objects (directed, typed relationships) and supports queries:
- Forward trace: What derives from this requirement?
- Backward trace: What requirements does this artifact satisfy?
- Impact analysis: What verification cases need re-running if this changes?
Link types follow MBSE semantics: DERIVES, SATISFIES, VERIFIES, ALLOCATES, TRACES, REFINES.
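The forward and backward queries can be sketched as a pair of adjacency lists over directed, typed links (illustrative; the real TraceabilityGraph in traceability.py differs in detail):

```python
from collections import defaultdict

class TraceGraphSketch:
    """Sketch of forward/backward traversal over directed, typed links."""
    def __init__(self):
        self._fwd = defaultdict(list)  # source -> [(link_type, target)]
        self._bwd = defaultdict(list)  # target -> [(link_type, source)]

    def add_link(self, source: str, link_type: str, target: str):
        self._fwd[source].append((link_type, target))
        self._bwd[target].append((link_type, source))

    def forward(self, artifact: str) -> list:
        """What derives from / is produced by this artifact?"""
        return self._fwd[artifact]

    def backward(self, artifact: str) -> list:
        """What does this artifact satisfy or verify?"""
        return self._bwd[artifact]

g = TraceGraphSketch()
# A verification case VERIFIES a BDD scenario requirement
g.add_link("rase://verify/a1b2c3d4e5f6", "VERIFIES",
           "bdd://features/basic_flows#Scenario:CreateFlow")
assert g.backward("bdd://features/basic_flows#Scenario:CreateFlow")[0][0] == "VERIFIES"
```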
Source
All traceability infrastructure lives in src/gaius/rase/traceability.py.
Verification
The Verifier Model (VM) implements RLVR – Reinforcement Learning with Verifiable Reward. The oracle provides ground-truth verification using authoritative API sources, never UI observations. UI traces are the training target, not the oracle.
VerdictKind
Every verification case produces one of four outcomes:
| Verdict | Meaning | Default Reward |
|---|---|---|
PASS | All requirements satisfied | 1.0 |
FAIL | One or more requirements not satisfied | 0.0 (or accuracy for partial credit) |
INCONCLUSIVE | Could not determine (missing data) | 0.5 |
ERROR | Verification itself failed (infrastructure) | 0.0 |
Accuracy
Accuracy is always a float in [0.0, 1.0], representing the proportion of constraints satisfied. It provides the foundation for graded reward strategies.
# Computed inside verification cases:
passed_count = sum(1 for r in constraint_results if r.satisfied)
accuracy = passed_count / len(constraint_results)
Verification Cases
Two types of verification cases exist:
- APIVerificationCase – the RLVR oracle. Checks system state via the NiFi REST API. Evaluates Given (setup), Then (end-state), invariant, and transition constraints.
- UIVerificationCase – verifies agent UI actions. The final state is still checked via API; the trace captures what the agent did to get there.
case = APIVerificationCase(
    id=TraceableId.generate(scheme=IdScheme.RASE, prefix="verify"),
    name="Verify_CreateBasicFlow",
    objective=VerificationObjective(requirement_ids=[scenario.id]),
    scenario_requirement=scenario_req,
)
result = await case.execute(current_nifi_state)
result = await case.execute(current_nifi_state)
Reward Strategies
Reward strategies convert verification results into training signals:
| Strategy | Signal Type | Use Case |
|---|---|---|
| BinaryReward | Sparse (0 or 1) | Clear pass/fail tasks, early training |
| GradedReward | Dense (0.0–1.0 with partial credit) | Multi-step tasks, complex scenarios |
| StepwiseReward | Dense per step | Long sequences where intermediate progress matters |
| TrajectoryShaping | Dense with efficiency | Tasks where path quality matters |
from gaius.rase import GradedReward, compute_reward
strategy = GradedReward(pass_bonus=0.1, fail_penalty=0.0)
reward = compute_reward(result, strategy=strategy)
Oracle
The NiFiOracle provides authoritative verification:
- Agent takes UI actions to modify NiFi
- Oracle queries NiFi REST API to check resulting state
- State is compared against scenario requirements (constraints)
- Reward is computed from the VerificationResult
oracle = NiFiOracle(reward_strategy=GradedReward())
result, reward = await oracle.verify_and_reward(scenario_req, trace=ui_trace)
Advanced oracles include CurriculumOracle (progressive difficulty) and EnsembleOracle (multi-source consensus).
Source
Verification infrastructure lives in src/gaius/rase/vm/ with verification cases in verification.py, requirements in requirements.py, and oracle/reward logic in oracle.py.
Observability
Gaius uses a three-layer observability stack: OpenTelemetry for instrumentation, Prometheus for time-series storage, and Metabase for self-service analytics dashboards.
Architecture
CLI/TUI/MCP --> gRPC --> Engine --> OTel Collector --> Prometheus
                         ^^^^^^                            |
                 metrics exported here                     v
                                                Metabase (dashboards)
The engine is the single source of truth for metric export. All clients (CLI, TUI, MCP) route metrics through the gRPC engine, which exports via OpenTelemetry SDK to the OTel Collector. The collector forwards to Prometheus for scraping.
Components
| Layer | Technology | Purpose |
|---|---|---|
| OpenTelemetry | OTel SDK + Collector | Distributed tracing, metric instrumentation |
| Prometheus | PromQL, time-series DB | Metric storage, alerting, range queries |
| Metabase | SQL analytics platform | Dashboards connected to PostgreSQL |
ObservePanel
The TUI’s ObservePanel displays real-time metrics using declarative MetricDefinition objects. Each definition specifies:
- Source: prometheus (PromQL query) or engine (gRPC proxy)
- Display: sparkline, gauge, counter, or percentage
- Thresholds: warning/critical levels with directional logic (above or below)
Metric categories include inference (latency, throughput, errors), GPU compute (FLOPS utilization), health (active incidents, escalations, FMEA scores), and pipeline operations (cards/day, backlog depth).
Design Philosophy
Metrics use 10-minute windowed rates (Flink-inspired) to survive bursty workloads like ambient reasoning. Sparklines show 5 minutes of history at 15-second resolution. The Fail Open principle applies: unknown states are surfaced for investigation rather than filtered away.
See each sub-chapter for implementation details.
OpenTelemetry
Gaius uses the OpenTelemetry SDK for distributed tracing and metric instrumentation. The engine centralizes all OTel export through EngineMetrics, ensuring a single source of truth for operational telemetry.
Instrumentation
The EngineMetrics singleton (initialized at engine startup) creates OTel instruments:
from gaius.engine.metrics import EngineMetrics
metrics = EngineMetrics.get_instance()
metrics.record_inference(model="reasoning", latency_ms=150, tokens=500)
metrics.record_gpu_memory(gpu_id=0, used_mb=12000, total_mb=24000)
metrics.record_healing_attempt(endpoint="reasoning", tier=0, success=True)
Metric Categories
| Category | Instruments | Type |
|---|---|---|
| Inference | inference_count, inference_latency, inference_tokens | Counter, Histogram |
| GPU | gpu_memory_used, gpu_utilization, gpu_flops_utilization | Gauge (observable callbacks) |
| Endpoints | endpoint_healthy, endpoint_requests | Gauge, Counter |
| Healing | healing_attempts, healing_escalations, incidents_active | Counter, Gauge |
| Pipeline | pipeline_cards_published, pipeline_pending_cards | Counter, Gauge |
| Errors | error_total, exception_caught_total | Counter |
Metric Naming
Metrics follow a double-prefix convention due to OTel Collector namespace configuration:
gaius_gaius_<metric_name>_<unit>
The first gaius_ comes from the OTel Collector namespace config; the second from SDK metric naming (gaius. becomes gaius_ after export). PromQL queries in the OBSERVE_METRICS registry use this full prefix.
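The transformation can be sketched as a small helper (illustrative; the SDK metric name in the example is an assumption):

```python
def full_metric_name(sdk_name: str, unit: str = "") -> str:
    """Expand an SDK metric name to the exported Prometheus name (sketch).

    The SDK's dots become underscores on export, and the OTel Collector
    namespace config prepends its own gaius_ prefix.
    """
    flat = sdk_name.replace(".", "_")       # gaius.inference.latency -> gaius_inference_latency
    prom = f"gaius_{flat}"                  # collector namespace adds the first gaius_
    return f"{prom}_{unit}" if unit else prom

assert full_metric_name("gaius.inference.latency", "milliseconds") == \
    "gaius_gaius_inference_latency_milliseconds"
```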
Export Pipeline
EngineMetrics --> OTel SDK --> OTLP Exporter --> OTel Collector --> Prometheus
The OTel Collector runs as a sidecar, receiving OTLP and forwarding metrics to Prometheus (via remote-write or a scrape endpoint). GPU metrics use observable callbacks that are invoked on each collection cycle.
Makespan Tracing
For long-running operations (evolution cycles, research flows), Gaius uses makespan tracing: a parent span covers the entire operation, with child spans for each phase. This enables latency attribution across multi-step workflows without excessive span cardinality.
Source
Engine metrics: src/gaius/engine/metrics.py. Observability sources: src/gaius/observability/sources/.
Prometheus
Prometheus provides time-series metric storage and PromQL queries for the Gaius observability stack. It scrapes metrics exported by the OTel Collector and serves as the backend for the TUI’s ObservePanel.
PrometheusSource
The PrometheusSource client (src/gaius/observability/sources/prometheus.py) queries the Prometheus HTTP API:
from gaius.observability import PrometheusSource
source = PrometheusSource(base_url="http://localhost:9090")
# Instant query (current value)
value = await source.query_instant(
    'histogram_quantile(0.95, sum by (le) (rate(gaius_gaius_inference_latency_milliseconds_bucket[10m])))'
)
# Range query (sparkline data)
series = await source.query_range(
    'sum(rate(gaius_gaius_inference_count_total[10m])) * 3600',
    duration_seconds=300,  # 5 minutes of history
    step_seconds=15,       # 15-second resolution
)
Custom Metrics
Inference
- gaius_gaius_inference_latency_milliseconds – histogram, p95 via histogram_quantile
- gaius_gaius_inference_count_total – counter, displayed as inferences/hour
- gaius_gaius_inference_tokens_total – counter, displayed as tokens/hour
- gaius_gaius_error_total / gaius_gaius_request_total – error rate percentage
GPU
- gaius_gaius_gpu_flops_utilization_percent – FLOPS-weighted utilization across 6x RTX 4090s using a Welford streaming mean
Health and Self-Healing
- gaius_gaius_incidents_active – gauge of active incidents
- gaius_gaius_healing_escalations_total – counter of ACP escalations per hour
- gaius_gaius_fmea_rpn_score – FMEA Risk Priority Numbers (high RPN > 200)
Pipeline Operations
- gaius_gaius_pipeline_cards_published_total – cards published (daily)
- gaius_gaius_pipeline_pending_cards – backlog gauge
- gaius_gaius_pipeline_task_failure_total – failures by task type (zero tolerance)
- gaius_gaius_exception_caught_total – operational errors (non-LLM)
Windowed Rates
All rate calculations use 10-minute windows to smooth bursty workloads: the wider window keeps metrics hydrated during quiet periods instead of dropping to zero between bursts.
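The effect can be sketched with a toy rate calculation (illustrative only; Prometheus's rate() additionally extrapolates to the window boundaries):

```python
# Illustrative only: why a wider rate window stays "hydrated".
# Counter samples arrive as (timestamp_seconds, cumulative_value) pairs.

def windowed_rate(samples, now, window_seconds):
    """Per-second rate over [now - window, now]: counter delta across the
    samples in the window, divided by the span they cover."""
    in_window = [(t, v) for t, v in samples if now - window_seconds <= t <= now]
    if len(in_window) < 2:
        return 0.0  # too few samples: the rate collapses to zero
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)

# A burst of inferences at t=0..60s, then silence until t=400s.
samples = [(0, 0), (30, 50), (60, 100)]
print(windowed_rate(samples, now=400, window_seconds=60))   # -> 0.0 (window empty)
print(windowed_rate(samples, now=400, window_seconds=600))  # still sees the burst
```

With a 1-minute window the burst has scrolled out of view and the rate reads zero; the 10-minute window still covers it and reports a nonzero rate.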
Engine Source
For metrics not available in Prometheus (GPU memory per device, scheduler queue depth, evolution cycles), the EngineSource queries the gRPC engine directly. These return single-point values since the engine does not retain history.
Source
src/gaius/observability/sources/prometheus.py, src/gaius/observability/sources/engine.py, src/gaius/observability/metrics.py.
Metabase
Metabase provides self-service analytics dashboards connected to the Gaius PostgreSQL database (zndx_gaius on port 5444). It queries the meta schema, which contains materialized analytics tables designed for dashboard consumption.
Architecture
PostgreSQL (zndx_gaius)
├── public schema      --> operational tables (cards, agents, health)
├── meta schema        --> analytics views for Metabase
├── collections schema --> curated content for landing page
└── bases schema       --> feature store registry
        |
        v
Metabase (localhost:3000)
        |
        v
Dashboards: lineage, operations, KB geometry
Meta Schema
The meta schema (db/migrations/20251218000001_meta_schema.sql) provides pre-aggregated analytics:
| Table | Purpose |
|---|---|
| meta.dataset_catalog | Deduplicated dataset registry from lineage events |
| meta.job_catalog | Job registry with run counts, success/failure rates |
These tables are populated from OpenLineage events and provide the foundation for data lineage dashboards.
Dashboard Categories
Lineage
- Data provenance graph (which flows produce which datasets)
- Dataset read/write frequency
- Job success rates over time
Operations
- Agent evaluation scores and evolution trends
- GPU utilization over time
- Inference throughput and latency distributions
- Pipeline health (cards published, curation cadence)
KB Geometry
- Document cluster topology
- Embedding space coverage
- Content freshness by domain
Process Management
Metabase runs as a devenv process defined in scripts/processes/metabase.sh. It starts on localhost:3000 and connects to PostgreSQL using the same credentials as the application (gaius:gaius@localhost:5444/zndx_gaius).
Source
Metabase process: scripts/processes/metabase.sh. Meta schema: db/migrations/20251218000001_meta_schema.sql.
Security
Gaius employs a multi-layer security model focused on protecting autonomous operations. Security verification is mandatory and cannot be disabled – this is by design to prevent generated code from bypassing security checks.
Threat Model
The primary attack surface is the ACP (Agent Client Protocol) integration, which allows autonomous health maintenance via GitHub issue workflows. Without controls, an agent could:
- Leak internal state to public repositories
- Be influenced by prompt injection in externally-controlled issues
- Expose credentials in issue comments
- Be tricked by repository visibility changes
Security Layers
| Layer | Check | Purpose |
|---|---|---|
| 0 | Format validation | Reject malformed repository names |
| 1 | HOCON allowlist | Explicit repository patterns only |
| 2 | Visibility verification | Repository must be private (via gh api) |
| 3 | Content sanitization | Redact secrets, strip injection markers |
All four layers execute on every operation. There is no parameter or configuration to skip layers.
Cadence Controls
To prevent runaway automation:
- Maximum 3 GitHub issues per 24 hours
- Minimum 5 minutes between restart attempts
- Maximum 3 restarts per endpoint per hour
- All changes committed to the acp-claude/health-fix branch for human review
Guru Meditation Codes
Security failures use the #ACP.SEC.* code family:
| Code | Description |
|---|---|
| #ACP.SEC.00000002.NOTALLOWED | Repository not in allowlist |
| #ACP.SEC.00000003.NOTPRIVATE | Repository not private |
| #ACP.SEC.00000004.NOTCONFIGURED | No repositories configured |
| #ACP.SEC.00000005.BADFORMAT | Invalid repository format |
See ACP Security Model for implementation details and Content Sanitization for redaction rules.
ACP Security Model
The Agent Client Protocol uses four mandatory security layers for all GitHub operations. Every layer must pass; there is no bypass mechanism.
Layer 0: Format Validation
Repository names are validated against strict regex patterns before any network call:
# Supported formats:
"owner/repo" # Legacy (github.com assumed)
"github.com/owner/repo" # Full URL
"github.example.com/org/repo" # On-prem GitHub Enterprise
Invalid characters, missing components, or malformed URLs raise GitHubSecurityError immediately with #ACP.SEC.00000005.BADFORMAT.
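A hypothetical sketch of the accepted formats as a single regex (the real patterns live in src/gaius/acp/security.py and may differ):

```python
# Hypothetical sketch of Layer 0 format validation; the names and
# patterns here are illustrative, not the actual implementation.
import re

_NAME = r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"  # owner or repo segment
_HOST = r"[A-Za-z0-9.-]+\.[A-Za-z]{2,}"                # e.g. github.example.com
_REPO_RE = re.compile(rf"^(?:{_HOST}/)?{_NAME}/{_NAME}$")

def validate_repo_format(repo: str) -> bool:
    """Accept 'owner/repo', 'github.com/owner/repo', or
    'github.example.com/org/repo'; reject anything else."""
    return bool(_REPO_RE.fullmatch(repo))

assert validate_repo_format("zndx/gaius-acp")            # legacy form
assert validate_repo_format("github.com/owner/repo")     # full URL form
assert not validate_repo_format("owner//repo")           # empty segment
assert not validate_repo_format("../etc/passwd")         # path traversal
```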
Layer 1: HOCON Allowlist
Repositories must be explicitly listed in ~/.config/gaius/acp.conf:
acp {
github {
allowed_repos = ["zndx/gaius-acp"]
require_private = true
verify_on_each_operation = true
cache_visibility_seconds = 300
}
}
Glob patterns are supported: "zndx/*" allows any repo under the zndx org. An empty allowlist means no repositories are allowed (#ACP.SEC.00000004.NOTCONFIGURED).
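The allowlist semantics can be sketched with stdlib glob matching (illustrative only; the real check lives in src/gaius/acp/security.py):

```python
# Minimal sketch of allowlist matching with glob support.
# fnmatchcase avoids OS-dependent case folding.
from fnmatch import fnmatchcase

def repo_allowed(repo: str, allowed_repos: list[str]) -> bool:
    """An empty allowlist denies everything (#ACP.SEC.00000004.NOTCONFIGURED)."""
    return any(fnmatchcase(repo, pattern) for pattern in allowed_repos)

assert repo_allowed("zndx/gaius-acp", ["zndx/gaius-acp"])  # exact match
assert repo_allowed("zndx/other-repo", ["zndx/*"])         # glob pattern
assert not repo_allowed("attacker/repo", ["zndx/*"])
assert not repo_allowed("zndx/gaius-acp", [])              # empty allowlist: deny
```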
Layer 2: Visibility Verification
The GitHubSecurityGuard verifies repository visibility via gh api repos/{owner}/{repo} --jq .visibility. Only "private" passes; "public" and "internal" are rejected with #ACP.SEC.00000003.NOTPRIVATE.
Visibility is cached for 5 minutes (configurable via cache_visibility_seconds) and re-verified on each operation when verify_on_each_operation = true.
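The TTL behavior can be sketched with a small cache that takes an injectable clock (hypothetical; the class name and shape are illustrative, not the GitHubSecurityGuard internals):

```python
# Sketch of a 5-minute visibility cache with an injectable clock.
import time

class VisibilityCache:
    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, repo: str):
        """Return the cached visibility, or None if absent or expired."""
        entry = self._entries.get(repo)
        if entry is None:
            return None
        stamp, visibility = entry
        if self._clock() - stamp > self._ttl:
            del self._entries[repo]  # expired: caller must re-verify via gh api
            return None
        return visibility

    def put(self, repo: str, visibility: str):
        self._entries[repo] = (self._clock(), visibility)

# Fake clock to demonstrate expiry without sleeping.
now = [0.0]
cache = VisibilityCache(ttl_seconds=300, clock=lambda: now[0])
cache.put("zndx/gaius-acp", "private")
assert cache.get("zndx/gaius-acp") == "private"
now[0] = 301.0
assert cache.get("zndx/gaius-acp") is None  # expired after 5 minutes
```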
Layer 3: Content Sanitization
Before including any content in GitHub issues, sanitize_issue_content() redacts secrets and strips prompt injection markers. See Content Sanitization for details.
Issue titles must start with the [HEALTH-FIX] prefix and are limited to 200 characters.
Attack Vectors Mitigated
| Attack | Mitigation |
|---|---|
| Info leak via public repo | Layer 2: visibility verification on every operation |
| Prompt injection from issues | Layer 1: explicit allowlist prevents attacker-controlled repos |
| Credential exposure in issues | Layer 3: automatic secret redaction |
| Visibility change attack | Re-verify on each operation (cache TTL 5 min) |
| Generated code bypass | Security is mandatory – no parameter to disable |
Usage
from gaius.acp.security import GitHubSecurityGuard
guard = GitHubSecurityGuard.from_config()
await guard.verify_repo("zndx/gaius-acp") # Raises on failure
Source
src/gaius/acp/security.py
Content Sanitization
Before any content is included in GitHub issues (via ACP escalation), the sanitize_issue_content() function automatically redacts secrets and strips prompt injection markers.
Secret Patterns
The following patterns are detected and replaced with [REDACTED_*] tags:
| Pattern | Example | Replacement |
|---|---|---|
| Anthropic API keys | sk-ant-api03-... | [REDACTED_ANTHROPIC_KEY] |
| OpenAI keys | sk-proj-..., sk-... | [REDACTED_OPENAI_KEY] |
| GitHub PAT | ghp_... | [REDACTED_GH_PAT] |
| GitHub OAuth | gho_... | [REDACTED_GH_OAUTH] |
| GitHub App | ghs_... | [REDACTED_GH_APP] |
| GitHub Refresh | ghr_... | [REDACTED_GH_REFRESH] |
| AWS Access Key | AKIA... (20 chars) | [REDACTED_AWS_KEY] |
| Bearer tokens | Bearer <token> | Bearer [REDACTED_BEARER] |
| Generic secrets | api_key=, token=, password=, secret= | [REDACTED] |
Pattern order matters: specific patterns (e.g., sk-ant-) are matched before generic ones (e.g., sk-) to ensure correct replacement labels.
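The ordering requirement can be sketched with an ordered pattern list (an illustrative subset; the real pattern set in sanitize_issue_content() is more complete):

```python
# Illustrative subset of ordered secret redaction.
import re

# Order matters: sk-ant- must be tried before the generic sk- pattern,
# or Anthropic keys would get the OpenAI label.
SECRET_PATTERNS = [
    (re.compile(r"sk-ant-[A-Za-z0-9_-]+"), "[REDACTED_ANTHROPIC_KEY]"),
    (re.compile(r"sk-[A-Za-z0-9_-]+"), "[REDACTED_OPENAI_KEY]"),
    (re.compile(r"ghp_[A-Za-z0-9]+"), "[REDACTED_GH_PAT]"),
]

def redact(text: str) -> str:
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("key sk-ant-api03-abc123 and token ghp_xyz789"))
# -> "key [REDACTED_ANTHROPIC_KEY] and token [REDACTED_GH_PAT]"
```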
Prompt Injection Markers
The following injection patterns are replaced with [SANITIZED]:
- LLM role markers: <|system|>, <|user|>, <|assistant|>, [INST], <<SYS>>
- Override attempts: IGNORE PREVIOUS INSTRUCTIONS, SYSTEM OVERRIDE:, ADMIN MODE:
- Known bypass patterns: JAILBREAK, DAN MODE, DEVELOPER MODE:
All matching is case-insensitive.
Usage
from gaius.acp.security import sanitize_issue_content
raw = "Error with key sk-ant-api03-abc123... calling endpoint"
safe = sanitize_issue_content(raw)
# "Error with key [REDACTED_ANTHROPIC_KEY] calling endpoint"
Issue Title Validation
Issue titles are validated separately via validate_issue_title():
- Must start with the [HEALTH-FIX] prefix
- Truncated to 200 characters
- Control characters stripped
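A hypothetical sketch of these rules (it is assumed here that a missing prefix is rejected; the real validate_issue_title() in src/gaius/acp/security.py may normalize instead):

```python
# Hypothetical sketch of issue title validation per the rules above.
def validate_issue_title(title: str) -> str:
    # Strip control characters (anything non-printable).
    cleaned = "".join(ch for ch in title if ch.isprintable())
    if not cleaned.startswith("[HEALTH-FIX]"):
        raise ValueError("issue title must start with [HEALTH-FIX]")
    return cleaned[:200]  # hard cap at 200 characters

assert validate_issue_title("[HEALTH-FIX] Restart\x00 endpoint") == "[HEALTH-FIX] Restart endpoint"
assert len(validate_issue_title("[HEALTH-FIX] " + "x" * 500)) == 200
```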
Source
src/gaius/acp/security.py (the sanitize_issue_content and validate_issue_title functions).
Database
Gaius uses PostgreSQL on port 5444 with database name zndx_gaius (not gaius).
Connection
| Parameter | Value |
|---|---|
| Host | localhost |
| Port | 5444 |
| Database | zndx_gaius |
| User | gaius |
| Password | gaius |
| URL | postgres://gaius:gaius@localhost:5444/zndx_gaius?sslmode=disable |
Programmatic Access
Always use the centralized config function – never hardcode connection parameters:
from gaius.core.config import get_database_url
url = get_database_url() # Single source of truth
Delegates exist in storage/database.py, storage/grid_state.py, inference/routing_analytics.py, and storage/profile_ops.py – all call through to gaius.core.config.get_database_url().
CLI Access
PGPASSWORD=gaius psql -h localhost -p 5444 -U gaius -d zndx_gaius
Connection Pooling
The storage/database.py module manages a global asyncpg connection pool (min 1, max 10 connections) via get_pool():
from gaius.storage.database import get_pool
pool = await get_pool()
async with pool.acquire() as conn:
rows = await conn.fetch("SELECT ...")
Schemas
The database uses four schemas to organize data. See Schema Design for details.
| Schema | Purpose |
|---|---|
| public | Core operational tables (cards, agents, evolution, health) |
| meta | Analytics views for Metabase dashboards |
| collections | Curated content for the public landing page |
| bases | Feature store registry and Iceberg catalog |
Extensions
| Extension | Purpose |
|---|---|
| pg_cron | Scheduled maintenance |
| age (Apache AGE) | Graph queries for lineage |
| citext | Case-insensitive text columns |
Migrations
Schema migrations live in db/migrations/ and are ordered by timestamp prefix (e.g., 20251130000001_initial_schema.sql). The full schema dump is at db/schema.sql.
Schema Design
The PostgreSQL database (zndx_gaius) uses four schemas to organize data by domain.
Public Schema
The default public schema holds core operational tables:
Content Pipeline
- feed_sources – RSS/API feed configurations with fetch intervals
- fetch_jobs – Scheduled and completed fetch job records
- content_items – Raw content items with KB path references
- articles – Curated articles with frontmatter (keywords, news queries)
Agent System
- agent_evaluations – Evaluation scores by agent and evaluator (local/xai)
- evolution_cycles – Training cycle records (success, improvement, duration)
- agent_versions – Version history for agent configurations
- held_out_queries – Reserved evaluation queries not used in training
- routing_decisions – Inference routing analytics (fallback/mismatch tracking)
Health and Observability
- health_incidents – HealthObserver incident records with FMEA scores
- healing_events – Self-healing attempt logs (tier, success, duration)
- fmea_catalog – Failure Mode and Effects Analysis registry
- scheduler_jobs – Async job queue for the inference scheduler
State
- grid_state – Persisted 19x19 grid positions and overlays
- cognition_memory – Self-observation and thought chain storage
- research_state / research_progress – Active research thread tracking
Meta Schema
The meta schema provides materialized analytics tables for Metabase:
- meta.dataset_catalog – Deduplicated dataset registry from lineage events
- meta.job_catalog – Job registry with run/success/failure counts
Populated from OpenLineage events for data provenance dashboards.
Collections Schema
The collections schema manages curated content for the public landing page:
- collections.collections – Named collections with featured flags
- collections.collection_cards – Cards assigned to collections with ordering
- collections.card_summaries – Generated card summary text
Bases Schema
The bases schema implements a feature store registry:
- bases.bases – Feature store definitions (type: feature_group, model, dataset)
- bases.base_versions – Versioned snapshots with Iceberg table references
- bases.entity_history – Entity-level change tracking
Graph Extension
Apache AGE (ag_catalog schema) provides graph query capabilities for lineage traversal using Cypher syntax. The lineage graph connects datasets to jobs via read/write edges.
Source
Full schema dump: db/schema.sql. Migrations: db/migrations/.
pg_cron Jobs
Gaius uses the pg_cron extension for scheduled database maintenance. Jobs are defined in SQL migrations and run inside PostgreSQL without external schedulers.
Core Jobs
| Job | Schedule | Purpose |
|---|---|---|
| check-due-fetches | Every 15 min | Check feed_sources for overdue fetches and create fetch_jobs records |
| cleanup-fetch-jobs | Sunday 3 AM | Remove old fetch job records (keep last 100 per source) |
| archive-stale-content | 1st of month, 4 AM | Mark content items older than 90 days as archived |
How It Works
The schedule_due_fetches() function checks each active feed_source against its configured fetch_interval_minutes. When a source is due, it creates a fetch_jobs record with status = 'scheduled'. Python workers poll this table and execute the actual fetch.
-- Example: schedule a fetch for a specific source
SELECT schedule_fetch('arxiv-cs-ai');
-- Check all due sources
SELECT * FROM schedule_due_fetches();
Additional Scheduled Tasks
Beyond the core jobs, several migrations add domain-specific cron schedules:
| Migration | Job | Schedule |
|---|---|---|
| 20251214000001_evolution_periodic_tasks | Evolution cycle triggers | Periodic |
| 20251223000001_theta_consolidation_cron | Theta memory consolidation | Periodic |
| 20251228000002_triage_cron_jobs | Content triage | Periodic |
| 20260202200000_landing_page_cron | Landing page card publishing | Periodic |
| 20260203100000_scheduled_task_notify | NOTIFY on scheduled task changes | Event-driven |
The scheduled_task_notify migration uses PostgreSQL LISTEN/NOTIFY to wake the engine watchdog when tasks are due, avoiding polling overhead.
Monitoring
The v_source_status view provides at-a-glance health for all feed sources:
SELECT name, status, total_items, pending_jobs FROM v_source_status;
Status values: ok, overdue, never (never fetched).
Source
Core jobs: db/migrations/20251130000003_pg_cron_jobs.sql. Additional schedules are spread across domain-specific migrations in db/migrations/.
Getting Started
Gaius is a CLI-first terminal interface for navigating complex, graph-oriented data domains. It renders high-dimensional embeddings and topological structures onto a constrained 19x19 grid, transforming abstract complexity into spatial intuition.
There are three ways to interact with Gaius:
- TUI – a full terminal interface with grid, panels, and keyboard navigation (uv run gaius)
- CLI – a non-interactive command runner for scripting and automation (uv run gaius-cli)
- MCP – 163 tools exposed to Claude Code and other MCP-compatible clients (uv run gaius-mcp)
Quick Path
If you already have devenv and Nix installed, you can be running in under a minute:
cd gaius
devenv shell
uv sync
devenv processes up -d
uv run gaius
This starts the platform services (PostgreSQL, Qdrant, gRPC engine, NiFi) and launches the TUI. You will see a 19x19 grid with a cursor at the center.
What to Read Next
If this is your first time:
- Installation – prerequisites and environment setup
- First Launch – what happens when you start Gaius and what to try first
Once you are comfortable with the basics:
- The TUI – understanding the five interface components
- Navigation – cursor movement, view modes, and workflow patterns
- The CLI – non-interactive commands for scripting
- MCP Integration – connecting Gaius to Claude Code
Three Interfaces, One Engine
All three interfaces communicate with the same gRPC engine on port 50051. A /health command run from the CLI produces the same result as the health_observer_status MCP tool or pressing / and typing health in the TUI. Choose the interface that fits your context: TUI for exploration, CLI for automation, MCP for AI-assisted workflows.
Installation
Gaius uses devenv (built on Nix) for reproducible development environments and uv for Python dependency management.
Prerequisites
| Dependency | Purpose | Install |
|---|---|---|
| Nix | Package manager | nix.dev |
| devenv | Development environment | nix profile install github:cachix/devenv |
| uv | Python package manager | Provided by devenv |
| Just | Task runner | Provided by devenv |
You do not need to install Python, PostgreSQL, or any other runtime dependency manually. Nix provides everything.
Environment Setup
Clone the repository and enter the devenv shell:
git clone <repo-url>
cd gaius
devenv shell
The first devenv shell invocation downloads and caches all Nix dependencies. Subsequent invocations start in under a second.
Inside the shell, install Python dependencies:
uv sync
For optional features, use extras:
uv sync --extra tda # Topological data analysis (giotto-tda)
uv sync --extra swarm # Multi-agent support (langchain)
Starting Platform Services
Gaius depends on several backend services: PostgreSQL, Qdrant, the gRPC engine, and others. Start them all with:
devenv processes up -d
To stop all services:
devenv processes down
To verify everything is running, use the Just task runner:
just --list # Show all available tasks
just restart-clean # Full clean restart if something is stuck
Database
PostgreSQL runs on port 5444 with a database named zndx_gaius:
PGPASSWORD=gaius psql -h localhost -p 5444 -U gaius -d zndx_gaius
The database name is zndx_gaius, not gaius. The connection URL used internally is:
postgres://gaius:gaius@localhost:5444/zndx_gaius?sslmode=disable
Verifying the Installation
Once services are running, confirm the gRPC engine is healthy:
uv run gaius-cli --cmd "/health" --format json
If this returns a JSON health report, the installation is complete. If it fails, try just restart-clean and check the process logs in .devenv/processes.log.
First Launch
This page describes what you will see when you first start Gaius, and what to try immediately.
Starting the TUI
From inside a devenv shell with services running:
uv run gaius
The terminal fills with the Gaius interface. At its center is a 19x19 grid – the MainGrid – with a cursor marker at position K10.
What You See
The default layout has three regions:
- Left panel – a FileTree showing the knowledge base as a directory structure
- Center – the 19x19 MainGrid with three 9x9 MiniGrid projections below it
- Right panel – a ContentPanel that shows context for the current selection
The bottom of the screen has a command bar. The cursor appears as a distinct marker on the grid.
First Steps
Move the cursor. Press h, j, k, l to move left, down, up, right. The cursor moves across the grid. The MiniGrid projections and ContentPanel update to reflect the new position.
Check your bearings. Press ? to display help in the ContentPanel. This shows the available key bindings and a summary of the current state.
Cycle the view. Press v to switch between view modes (Go, Theta, Swarm). Each mode renders the grid data differently.
Cycle overlays. Press o to layer additional information onto the grid: topology, geometry, dynamics, or agent positions.
Toggle panels. Press [ to toggle the left panel, ] to toggle the right panel, or \ to toggle both. Hiding panels maximizes grid space.
Enter a command. Press / to focus the command bar, then type health and press Enter. This runs the health diagnostic and displays system status in the ContentPanel.
First CLI Check
Open a second terminal (also in devenv shell) and try:
uv run gaius-cli --cmd "/health" --format json
This runs the same health check non-interactively and prints JSON output. The CLI and TUI connect to the same engine, so results are identical.
If Something Looks Wrong
If the grid is empty or services are not responding:
just restart-clean
This performs a full clean restart of all platform services. After it completes, relaunch with uv run gaius.
Next Steps
- The TUI – understand the five components of the interface
- Navigation – learn cursor movement and workflow patterns
- Key Bindings – complete keyboard reference
The TUI
Gaius renders a full terminal interface built on the Textual framework. The interface draws inspiration from Bloomberg Terminal (information density), Plan 9’s Acme (everything is a file), and CAD orthographic views (multiple synchronized projections).
Launch the TUI with:
uv run gaius
Five Components
The interface is composed of five primary widgets:
MainGrid
The 19x19 grid occupies the center of the screen. It is the primary workspace – a spatial representation of high-dimensional data projected onto a Go board layout. Grid positions correspond to embedded data points, and the cursor indicates your current focus.
The grid supports three view modes (cycled with v): Go, Theta, and Swarm. Each mode changes how the underlying data is rendered. Four overlay modes (cycled with o) layer additional information on top: topology, geometry, dynamics, and agent positions.
MiniGridPanel
Below the MainGrid sit three 9x9 orthographic projections. These are CAD-style views that show the data from different angles – like top, front, and side views of a 3D object. They update automatically as you move the cursor, providing spatial context around your current position.
FileTree (Left Panel)
The left panel presents a Plan 9-inspired file tree where knowledge base entries, agents, and system state are navigated as a directory structure. Agents appear as files under /agents/, and KB entries are organized by domain. Toggle visibility with [.
ContentPanel (Right Panel)
The right panel displays detailed content for the currently selected item: file contents, agent output, position context, health reports, or command results. It is the primary output area for slash commands. Toggle visibility with ].
CommandInput (Bottom Bar)
The bottom command bar accepts slash commands. Press / to focus it, type a command (e.g., health, evolve status, gpu status), and press Enter. Press Escape to cancel. The command bar supports history navigation with up/down arrows and tab completion.
Layout Flexibility
Toggle panels to adjust the layout to your task:
- Full layout: all panels visible – maximum context
- Grid-focused: press \ to hide both panels – maximum grid space
- Research mode: hide left panel with [ – more room for content output
- Navigation mode: hide right panel with ] – focus on the file tree and grid
Design Principles
The TUI is keyboard-first. Every action is reachable without a mouse. Information density is high by design – the interface shows as much relevant data as possible without requiring navigation to separate screens. Modes and overlays let you shift perspective without losing your place.
Navigation & Modes
Gaius draws inspiration from modal editors like Vim and compositional systems like Plan 9’s Acme. Navigation is keyboard-driven, modes provide context, and every operation is reversible.
Modal Philosophy
Gaius uses modes to provide context-sensitive behavior. This is not complexity – it is power through focus.
- Normal Mode (default): navigate, observe, toggle views
- Command Mode: enter slash commands via the command bar (/)
Cursor Navigation
The cursor is your focus point on the grid. It determines what position commands act upon, the center of local context, and the reference point for the MiniGrid projections.
Basic Movement
    k
    |
h --+-- l
    |
    j
Vim-style navigation: h/j/k/l for left/down/up/right. These keys sit on the home row so your fingers never leave typing position.
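Cursor movement can be sketched as a delta per key, clamped to the grid edges (illustrative only, not the TUI's actual handler; coordinates here are screen-style, with y increasing downward):

```python
# Sketch of hjkl cursor movement with clamping to the 19x19 grid.
MOVES = {"h": (-1, 0), "j": (0, 1), "k": (0, -1), "l": (1, 0)}
GRID_MAX = 18  # 19x19 lattice, indices 0..18

def move_cursor(pos, key):
    """Apply one hjkl keystroke, clipping the result to [0, 18] per axis."""
    dx, dy = MOVES[key]
    x, y = pos
    return (min(max(x + dx, 0), GRID_MAX), min(max(y + dy, 0), GRID_MAX))

cursor = (9, 9)                            # grid center
assert move_cursor(cursor, "l") == (10, 9)
assert move_cursor((0, 0), "h") == (0, 0)  # clamped at the left edge
```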
Tenuki
Press t to jump to the point of highest strategic interest – a concept borrowed from Go, where tenuki means “playing elsewhere.” The engine evaluates all grid positions and moves your cursor to the most strategically relevant one.
View Modes
Press v to cycle through visualization modes:
Go Mode
Traditional Go stones on intersections. Black and white stones mark occupied positions. Empty intersections show as dots.
Theta Mode
Information density visualization named after theta waves, which facilitate memory consolidation. This mode renders allocation intensity and data density across the grid.
Swarm Mode
Agent-centric view showing multi-agent positions and activity across the grid.
Overlay Modes
Press o to cycle overlays. Overlays add visual information on top of the current view mode without changing the base rendering:
| Overlay | Key concept | What it shows |
|---|---|---|
| None | Clean slate | Base grid only |
| Topology | Persistent homology | H0/H1/H2 features (components, loops, voids) |
| Geometry | Curvature | Semantic boundaries vs. interiors |
| Dynamics | Gradient field | Direction of semantic change, divergence |
| Agents | Team state | Agent positions on the grid |
See Overlays for detailed interpretation guidance.
Iso View Modes
Press i to cycle through Iso view modes, which change the interpretation of the MiniGrid projections below the main grid. These provide different mathematical lenses on the same data.
Panel Management
| Key | Action |
|---|---|
| [ | Toggle left panel (FileTree) |
| ] | Toggle right panel (ContentPanel) |
| \ | Toggle both panels simultaneously |
Hide panels to maximize grid visibility. Restore them to review details and navigate the knowledge base.
Graph View
Press g to cycle the center panel between modes. This toggles between the standard grid view and a graph/wiki-link visualization, providing different perspectives on the same underlying data.
Flow Patterns
Exploration Flow
- Navigate with hjkl to survey the grid
- Cycle overlays (o) to see different data layers
- Toggle candidates (c) to see suggested positions
- Press t for tenuki to jump to high-interest points
Analysis Flow
- Press / to enter command mode
- Run /health to check system state
- Use overlays to compare topology, geometry, and dynamics
- Review output in the ContentPanel
Focused Flow
- Hide panels (\) for maximum grid space
- Navigate to a region of interest
- Switch overlays to study different dimensions
- Restore panels when you need detailed context
Panels
Gaius has two side panels flanking the central grid: the FileTree on the left and the ContentPanel on the right. Both can be toggled independently or together.
Toggle Controls
| Key | Action |
|---|---|
| [ | Toggle left panel (FileTree) |
| ] | Toggle right panel (ContentPanel) |
| \ | Toggle both panels simultaneously |
When a panel is hidden, the grid expands to fill the available space.
Left Panel: FileTree
The FileTree presents a Plan 9-inspired hierarchical view of the system. Everything is navigable as if it were a filesystem:
/
agents/
cognition/
evolution/
health/
kb/
current/
projects/
content/
scratch/
state/
Agents are represented as files under /agents/. Knowledge base entries appear under /kb/. System state is exposed under /state/. This design follows the Plan 9 philosophy where everything – processes, data, system state – is accessible through a uniform file interface.
Selecting an entry in the FileTree updates the ContentPanel on the right to show its contents.
Right Panel: ContentPanel
The ContentPanel is the primary output area. It displays:
- File contents – when a FileTree entry is selected
- Command output – results from slash commands (e.g., /health, /gpu status)
- Position context – information about the current grid position
- Help – key binding reference when ? is pressed
- Agent output – responses from agent operations
The ContentPanel renders markdown-formatted text, tables, and structured data. It scrolls vertically for long output.
Layout Strategies
Different tasks benefit from different panel configurations:
Full context (default): both panels visible. Use when you need to navigate the knowledge base and see detailed output simultaneously.
Grid focus: press \ to hide both panels. Use when studying spatial patterns, overlay composition, or doing pure grid exploration.
Research mode: hide the left panel with [. The grid and ContentPanel share the screen, giving more room for command output and detailed content.
Navigation mode: hide the right panel with ]. The FileTree and grid share the screen, useful when browsing the knowledge base structure without needing detailed content.
Panel Persistence
Panel visibility state persists during your session. If you hide a panel and run a command, the panel stays hidden. Toggle it back when you need it.
Overlays & Visualization
Overlays are Gaius’s mechanism for layering multiple data dimensions onto a single grid. Understanding overlay composition is key to effective visual analysis.
Overlay Philosophy
A grid has 361 cells. Naively, that is one data point per cell. But complex domains have many dimensions. Overlays solve this by:
- Layering: multiple data types occupy the same space
- Cycling: focus shifts between layers via the
okey - Compositing: some layers blend (e.g., density + markers)
Available Overlays
Press o to cycle through overlay modes. The current set is based on differential geometry concepts:
None
The cleanest view. Shows only:
- Base grid (view-mode-specific symbols)
- Cursor position
- Candidate markers (a-i) if toggled with c
Use this for uncluttered observation of the base state.
Topology
Displays persistent homology features at three scales:
- H0: connected components – clusters of related data points
- H1: loops – cycles in the embedding space (feedback loops, circular dependencies)
- H2: voids – higher-dimensional cavities (structural gaps)
Topological features that persist across scales are significant. Transient features are noise. The overlay highlights those that survive, revealing the true shape of the data.
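The significance filter can be sketched directly from persistence intervals (the (dimension, birth, death) format and the 0.1 threshold mirror the documented behavior; the function itself is illustrative):

```python
# Sketch of filtering persistence intervals by lifetime.
import math

def significant_features(intervals, threshold=0.1):
    """Keep intervals whose persistence (death - birth) exceeds threshold.
    Infinite intervals (death = inf) always persist."""
    keep = []
    for dim, birth, death in intervals:
        if math.isinf(death) or death - birth > threshold:
            keep.append((dim, birth, death))
    return keep

bars = [
    (0, 0.0, math.inf),  # one component that never dies
    (1, 0.20, 0.45),     # a loop persisting 0.25 -> significant
    (1, 0.30, 0.35),     # short-lived loop -> noise
]
assert significant_features(bars) == [(0, 0.0, math.inf), (1, 0.20, 0.45)]
```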
Geometry
Curvature heatmap showing semantic boundaries versus interiors. High curvature regions mark transitions between conceptual domains. Low curvature indicates the interior of a coherent cluster. This overlay helps identify where one topic ends and another begins.
Dynamics
Gradient vector field showing the direction and magnitude of semantic change. Arrows or indicators point toward regions of increasing density or relevance. Divergence patterns reveal sources (generating new content) and sinks (absorbing attention). This overlay captures how the data landscape is evolving.
Agents
Agent positions projected from embedding space onto the grid. Each active agent occupies a position determined by its current focus within the data. Watch for:
- Clustering: agents in agreement, converging on the same region
- Scattering: genuine uncertainty or broad exploration
- Opposition: agents on opposite sides of the grid (tension, disagreement)
- Isolation: a single agent in a region (unique insight worth investigating)
Reading Composite Views
When multiple features occupy a cell, priority determines display:
- Overlay markers – highest priority
- Candidate letters (a-i)
- Cursor
- Stones/density (view-mode symbols)
- Empty (dot) – lowest priority
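The priority resolution for a single cell can be sketched as a first-match lookup over the ordered layers (the layer names mirror the list above; the function is illustrative, not the renderer's actual code):

```python
# Sketch of per-cell display priority: the first layer present wins.
PRIORITY = ["overlay", "candidate", "cursor", "stone"]

def render_cell(layers: dict) -> str:
    """layers maps layer name -> glyph; return the highest-priority glyph,
    falling back to the empty-cell dot."""
    for layer in PRIORITY:
        if layer in layers:
            return layers[layer]
    return "."  # empty: lowest priority

# Cursor and a stone share a cell: the cursor wins.
assert render_cell({"stone": "o", "cursor": "+"}) == "+"
# An overlay marker outranks everything else.
assert render_cell({"overlay": "^", "cursor": "+", "stone": "o"}) == "^"
assert render_cell({}) == "."
```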
Overlay as Situational Awareness
Each overlay provides a different “sense”:
- None: clean visual baseline
- Topology: structural awareness (what shapes exist)
- Geometry: boundary awareness (where things change)
- Dynamics: momentum awareness (where things are going)
- Agents: team state awareness (where agents are looking)
Cycling overlays is like shifting attention between modalities – a form of augmented situational awareness. The OODA loop pattern (Observe, Orient, Decide, Act) maps naturally: observe with None, orient with Topology or Geometry, decide based on Dynamics, act on Agent positions.
Combining with View Modes
Overlays compose with view modes (v). A Topology overlay on Go mode shows homology features atop stone positions. The same overlay on Theta mode shows features atop density shading. Experiment with combinations to find the perspective that reveals what you need.
Key Bindings
Complete reference for all keyboard shortcuts in the Gaius TUI.
Navigation
| Key | Action | Description |
|---|---|---|
| `h` | Move left | Move cursor one position left |
| `j` | Move down | Move cursor one position down |
| `k` | Move up | Move cursor one position up |
| `l` | Move right | Move cursor one position right |
| `t` | Tenuki | Jump to point of highest strategic interest |
View Controls
| Key | Action | Description |
|---|---|---|
| `v` | Cycle view | Cycle through view modes: Go, Theta, Swarm |
| `o` | Cycle overlay | Cycle through overlays: None, Topology, Geometry, Dynamics, Agents |
| `i` | Cycle iso | Cycle Iso view modes for MiniGrid projections |
| `c` | Toggle candidates | Show/hide candidate markers (a-i) at suggested positions |
Panel Controls
| Key | Action | Description |
|---|---|---|
| `[` | Toggle left | Show/hide the FileTree panel |
| `]` | Toggle right | Show/hide the ContentPanel |
| `\` | Toggle both | Show/hide both panels simultaneously |
Commands and Help
| Key | Action | Description |
|---|---|---|
| `/` | Command mode | Focus the command bar to enter a slash command |
| `?` | Help | Display help and key reference in the ContentPanel |
Graph and Evolution
| Key | Action | Description |
|---|---|---|
| `g` | Graph | Cycle center panel between grid and graph views |
| `e` | Evolution | Show evolution panel directly |
Notes
| Key | Action | Description |
|---|---|---|
| `Ctrl+n` | New note | Create a new Zettelkasten note and focus the editor |
| `Ctrl+z` | Zoom editor | Toggle editor zoom (tmux-style) |
Application
| Key | Action | Description |
|---|---|---|
| `q` | Quit hint | Display quit instructions (use `/q` or `/exit` to actually quit) |
Command Bar Keys
When the command bar is focused (after pressing /):
| Key | Action |
|---|---|
| `Enter` | Execute command |
| `Escape` | Cancel and return to normal mode |
| `Up` | Previous command in history |
| `Down` | Next command in history |
| `Tab` | Auto-complete command |
Design Notes
Key bindings follow Vim conventions for navigation (hjkl) and use mnemonic single keys for mode cycling (v for view, o for overlay, c for candidates). Panel toggles use bracket keys ([, ], \) which are adjacent on a standard keyboard. The / key enters command mode, matching the slash-command convention used by Claude Code and similar tools.
The CLI
Gaius provides a non-interactive command-line interface through gaius-cli. It executes the same slash commands available in the TUI but returns structured output suitable for scripting, piping, and automation.
Basic Usage
uv run gaius-cli --cmd "/command" --format json
The --cmd flag specifies the slash command to run (with or without the leading /). The --format flag controls output format – json produces machine-readable output, while the default produces human-readable text.
Examples
Check system health:
uv run gaius-cli --cmd "/health" --format json
Query GPU and endpoint status:
uv run gaius-cli --cmd "/gpu status" --format json
Check evolution state:
uv run gaius-cli --cmd "/evolve status" --format json
View the current application state:
uv run gaius-cli --cmd "/state" --format json
Available Commands
Gaius has 63 slash commands covering health diagnostics, agent management, inference control, knowledge base operations, evolution, visualization, and observability. The full command reference is in the CLI Commands section.
Common command categories:
| Prefix | Domain | Example |
|---|---|---|
| `/health` | System health | `/health`, `/health fix engine` |
| `/gpu` | GPU/endpoints | `/gpu status`, `/gpu cleanup` |
| `/evolve` | Agent evolution | `/evolve status`, `/evolve trigger` |
| `/kb` | Knowledge base | `/kb search <query>` |
| `/render` | Visualization | `/render cards` |
| `/observe` | Observability | `/observe metrics` |
Connection to the Engine
The CLI connects to the same gRPC engine (port 50051) as the TUI. Both interfaces are thin clients that send commands to the engine and display results. If the engine is not running, the CLI will report a connection error – start services with devenv processes up -d or just restart-clean.
Next Steps
- Command Patterns – JSON output, jq piping, polling techniques
- Scripting – using gaius-cli in shell scripts
Command Patterns
Common patterns for working with gaius-cli effectively. The CLI produces structured JSON output that integrates naturally with standard Unix tools.
JSON Output and jq
Most commands support --format json for machine-readable output. Pipe through jq to extract specific fields:
# Get endpoint names and statuses
uv run gaius-cli --cmd "/gpu status" --format json | jq '.data.endpoints[] | {name, status}'
# Extract just the health categories that are not OK
uv run gaius-cli --cmd "/health" --format json | jq '.data.checks[] | select(.status != "ok")'
# Get the current evolution generation number
uv run gaius-cli --cmd "/evolve status" --format json | jq '.data.generation'
Polling for Status Changes
When waiting for an operation to complete, poll in a loop:
# Watch endpoints transition from STARTING to HEALTHY after a restart
for i in $(seq 1 15); do
sleep 10
uv run gaius-cli --cmd "/gpu status" --format json | \
jq -r '.data.endpoints[] | "\(.name): \(.status)"'
echo "---"
done
Comparing Before and After
Capture state before and after an operation:
# Snapshot before
uv run gaius-cli --cmd "/health" --format json > /tmp/health-before.json
# Run an operation
uv run gaius-cli --cmd "/health fix engine" --format json
# Snapshot after
uv run gaius-cli --cmd "/health" --format json > /tmp/health-after.json
# Diff
diff <(jq -S . /tmp/health-before.json) <(jq -S . /tmp/health-after.json)
Batch Operations
Run multiple commands in sequence:
# Check everything in one pass
for cmd in "/health" "/gpu status" "/evolve status"; do
echo "=== $cmd ==="
uv run gaius-cli --cmd "$cmd" --format json | jq '.data'
echo
done
Conditional Logic
Use jq exit codes to drive decisions:
# Only proceed if all endpoints are healthy
if uv run gaius-cli --cmd "/gpu status" --format json | \
jq -e '.data.endpoints | all(.status == "HEALTHY")' > /dev/null 2>&1; then
echo "All endpoints healthy, proceeding"
uv run gaius-cli --cmd "/evolve trigger" --format json
else
echo "Not all endpoints healthy, aborting"
exit 1
fi
Timestamp and Logging
Add timestamps for log correlation:
uv run gaius-cli --cmd "/health" --format json | \
jq --arg ts "$(date -Iseconds)" '. + {queried_at: $ts}'
Error Handling
The CLI returns non-zero exit codes on failure. Check both the exit code and the response:
if ! output=$(uv run gaius-cli --cmd "/gpu status" --format json 2>&1); then
echo "CLI failed: $output"
exit 1
fi
echo "$output" | jq '.data'
Scripting
The gaius-cli is designed for non-interactive use in shell scripts. It connects to the gRPC engine, executes a command, prints output, and exits. This makes it suitable for cron jobs, monitoring scripts, and automation pipelines.
Health Monitoring Script
A script that checks system health and sends alerts on failures:
#!/usr/bin/env bash
set -euo pipefail
LOG="/var/log/gaius-health.log"
health=$(uv run gaius-cli --cmd "/health" --format json)
failed=$(echo "$health" | jq '[.data.checks[] | select(.status != "ok")] | length')
if [ "$failed" -gt 0 ]; then
echo "$(date -Iseconds) ALERT: $failed health checks failing" >> "$LOG"
echo "$health" | jq '.data.checks[] | select(.status != "ok")' >> "$LOG"
fi
Periodic Data Collection
Capture endpoint metrics at regular intervals for trend analysis:
#!/usr/bin/env bash
set -euo pipefail
OUTDIR="$HOME/gaius-metrics/$(date +%Y-%m-%d)"
mkdir -p "$OUTDIR"
TIMESTAMP=$(date +%H%M%S)
uv run gaius-cli --cmd "/gpu status" --format json > "$OUTDIR/${TIMESTAMP}_gpu.json"
uv run gaius-cli --cmd "/health" --format json > "$OUTDIR/${TIMESTAMP}_health.json"
uv run gaius-cli --cmd "/evolve status" --format json > "$OUTDIR/${TIMESTAMP}_evolve.json"
Run via cron every 5 minutes:
*/5 * * * * cd /path/to/gaius && devenv shell -- bash scripts/collect-metrics.sh
Endpoint Readiness Gate
Wait for all endpoints to be healthy before proceeding with a downstream operation:
#!/usr/bin/env bash
set -euo pipefail
MAX_WAIT=300 # 5 minutes
INTERVAL=10
elapsed=0
echo "Waiting for endpoints to become healthy..."
while [ $elapsed -lt $MAX_WAIT ]; do
if uv run gaius-cli --cmd "/gpu status" --format json | \
jq -e '.data.endpoints | all(.status == "HEALTHY")' > /dev/null 2>&1; then
echo "All endpoints healthy after ${elapsed}s"
exit 0
fi
sleep $INTERVAL
elapsed=$((elapsed + INTERVAL))
done
echo "Timed out waiting for endpoints after ${MAX_WAIT}s"
exit 1
Evolution Report
Generate a summary of the current evolution state:
#!/usr/bin/env bash
set -euo pipefail
echo "=== Gaius Evolution Report $(date -Iseconds) ==="
echo
echo "## Agent Status"
uv run gaius-cli --cmd "/evolve status" --format json | \
jq -r '.data | "Generation: \(.generation)\nActive agents: \(.active_agents)"'
echo
echo "## Endpoint Status"
uv run gaius-cli --cmd "/gpu status" --format json | \
jq -r '.data.endpoints[] | " \(.name): \(.status)"'
echo
echo "## Health Summary"
uv run gaius-cli --cmd "/health" --format json | \
jq -r '.data.checks[] | " \(.name): \(.status)"'
Tips for Robust Scripts
- Always use `set -euo pipefail` at the top of scripts
- Check that the engine is reachable before running a batch of commands
- Use `--format json` consistently so output is parseable
- Capture output to variables when you need to inspect it multiple times
- Log timestamps alongside data for correlation with system events
MCP Integration
Gaius exposes 163 tools via the Model Context Protocol (MCP), making its full functionality available to Claude Code and other MCP-compatible AI clients.
What Is MCP?
The Model Context Protocol is a standard for connecting AI assistants to external tools and data sources. When configured, Claude Code can call Gaius tools directly – checking health, querying the knowledge base, managing agents, and running operations – all within a conversational workflow.
Starting the MCP Server
uv run gaius-mcp
This starts a stdio-based MCP server that communicates with Claude Code over standard input/output. The server connects to the same gRPC engine (port 50051) used by the TUI and CLI.
What You Can Do
With MCP integration, Claude Code can:
- Diagnose issues: query health status, check endpoint state, review incident history
- Manage agents: view evolution status, trigger training, promote agent versions
- Search knowledge: query the knowledge base, perform semantic search, explore lineage
- Run inference: submit prompts to the scheduler, evaluate outputs, manage XAI budget
- Monitor systems: read Prometheus metrics, check Metabase dashboards, view GPU health
- Create content: trigger article curation, render card visualizations, manage collections
Architecture
The MCP server is a thin wrapper over the same services available through the CLI. Each MCP tool maps to an internal command or service call. The server handles serialization (JSON arguments and responses) and error propagation.
Claude Code <--stdio--> gaius-mcp <--gRPC--> Engine (port 50051)
                                  <--HTTP--> Services (Metabase, Prometheus, etc.)
                                  <--SQL---> PostgreSQL (port 5444)
Next Steps
- Claude Code Setup – configure your `.claude.json` to connect
- Tool Categories – browse the 163 tools by domain
Claude Code Setup
This page describes how to configure Claude Code to use the Gaius MCP server, giving Claude Code direct access to all 163 Gaius tools.
Configuration
Add the Gaius MCP server to your Claude Code MCP configuration. The configuration file is typically at ~/.claude.json or in your project’s .claude/ directory.
Add the following to the mcpServers section:
{
"mcpServers": {
"gaius": {
"command": "uv",
"args": ["run", "--directory", "/path/to/gaius", "gaius-mcp"],
"env": {
"GAIUS_ENGINE_HOST": "localhost",
"GAIUS_ENGINE_PORT": "50051"
}
}
}
}
Replace /path/to/gaius with the absolute path to your Gaius repository checkout.
Environment Variables
The MCP server respects these environment variables:
| Variable | Default | Purpose |
|---|---|---|
| `GAIUS_ENGINE_HOST` | `localhost` | gRPC engine hostname |
| `GAIUS_ENGINE_PORT` | `50051` | gRPC engine port |
| `DATABASE_URL` | from config | PostgreSQL connection URL |
In most setups, the defaults work without any environment overrides.
Prerequisites
Before Claude Code can use Gaius tools, the platform services must be running:
cd /path/to/gaius
devenv shell
devenv processes up -d
The MCP server connects to the gRPC engine on startup. If the engine is not running, tool calls will fail with connection errors.
Verifying the Connection
After configuring, ask Claude Code to run a health check:
“Check the Gaius health status”
Claude Code should invoke the health_observer_check tool and return a structured health report. If it reports connection errors, verify that devenv processes up -d has been run.
Tool Discovery
Claude Code can list available tools. The 163 tools are organized into categories such as health, agents, inference, knowledge base, observability, evolution, visualization, and bases. See Tool Categories for the full breakdown.
Security Considerations
The MCP server runs locally and communicates with Claude Code over stdio. It does not expose a network port. All operations are scoped to the local Gaius instance. For ACP (Agent Client Protocol) integration, which involves GitHub operations, additional security controls apply – see the ACP Security Model documentation.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| “Tool not found” | MCP config not loaded | Restart Claude Code after editing config |
| Connection refused | Engine not running | Run `devenv processes up -d` |
| Timeout on tool calls | Engine overloaded | Check `/gpu status` or run `just restart-clean` |
| Python errors | Dependencies missing | Run `uv sync` in the gaius directory |
Tool Categories
The 163 MCP tools are organized by domain. Each tool maps to an internal service call and accepts JSON arguments.
Health
Tools for system diagnostics, self-healing, and incident management.
| Tool | Purpose |
|---|---|
| `health_observer_status` | Current observer daemon state |
| `health_observer_check` | Run health diagnostic across all categories |
| `health_observer_start` / `stop` | Control the health observer daemon |
| `health_observer_incidents` | List active and recent incidents |
| `health_observer_incident_detail` | Detailed view of a specific incident |
| `fmea_catalog` | Browse failure modes and their RPN scores |
| `fmea_calculate_rpn` | Calculate Risk Priority Number for a failure mode |
| `fmea_get_controls` | Get remediation controls for a failure mode |
Agents and Evolution
Tools for managing agent versions, evolution cycles, and cognition.
| Tool | Purpose |
|---|---|
| `list_agent_versions` | All agent versions with metadata |
| `get_active_config` | Current active agent configuration |
| `get_best_agent_version` | Highest-performing version for an agent |
| `save_agent_version` / `rollback_agent` | Version management |
| `optimize_agent` | Trigger optimization for an agent |
| `evolution_status` | Current generation, evaluation state |
| `trigger_evolution` | Start a new evolution cycle |
| `trigger_task_ideation` | Generate new training tasks |
| `get_capability_gaps` | Identify areas where agents underperform |
Inference and Models
Tools for managing LLM endpoints, inference scheduling, and XAI budget.
| Tool | Purpose |
|---|---|
| `list_models` / `get_model` | Browse available models |
| `gpu_health` | GPU utilization and endpoint status |
| `model_launch_coding` / `model_stop_coding` | Control inference endpoints |
| `model_generate_code` | Generate code using a managed model |
| `model_validate_code` | Validate generated code |
| `get_xai_budget` / `reset_xai_budget` | Manage XAI inference budget |
| `evaluate_with_xai` | Run evaluation using XAI model |
Knowledge Base
Tools for searching, reading, and managing knowledge base content.
| Tool | Purpose |
|---|---|
| `search_kb` | Full-text search across KB entries |
| `read_kb` / `create_kb` / `update_kb` / `delete_kb` | CRUD operations |
| `list_kb` | List entries with filters |
| `kb_sync` | Synchronize KB with external sources |
| `semantic_search` | Vector similarity search |
| `embed_text` / `embed_texts` | Generate embeddings |
Observability
Tools for metrics, monitoring, and system telemetry.
| Tool | Purpose |
|---|---|
| `observe_status` / `observe_metrics` | Observability pipeline state |
| `prometheus_query` / `prometheus_query_range` | Direct PromQL queries |
| `prometheus_health` | Prometheus server status |
| `metabase_status` | Metabase analytics server status |
| `metabase_list_dashboards` / `metabase_get_dashboard` | Browse dashboards |
| `log_activity` / `get_activity_stats` / `get_daily_summary` | Activity tracking |
Visualization
Tools for rendering card visualizations and managing collections.
| Tool | Purpose |
|---|---|
| `collection_status` / `collection_list` / `collection_create` | Manage collections |
| `collection_add_card` / `collection_list_cards` | Card management |
| `collection_publish_cards` / `collection_publish_viz` | Publishing pipeline |
| `collection_generate_summaries` | AI-generated card summaries |
| `article_list` / `article_curate` / `article_new` | Article management |
Bases (Feature Store)
Tools for querying the Bases feature store.
| Tool | Purpose |
|---|---|
| `bases_list` | List available bases |
| `bases_query` | Run DQL queries against a base |
| `bases_entity_history` | Entity change history |
| `bases_health` | Feature store health status |
Cognition and Memory
Tools for agent thinking, memory consolidation, and self-reflection.
| Tool | Purpose |
|---|---|
| `trigger_cognition` | Trigger a cognition cycle |
| `trigger_self_observation` | Agent self-reflection |
| `get_thought_chain` / `get_recent_thoughts` | View agent reasoning |
| `what_are_you_thinking` | Current agent state of mind |
| `theta_sitrep` / `theta_consolidate` | Theta wave memory consolidation |
| `reflect` / `quick_thought` | Lightweight reflection tools |
Workflows
Gaius supports multi-step workflows that combine CLI commands, MCP tools, and TUI interactions. This section documents the most common patterns.
What Is a Workflow?
A workflow is a sequence of operations that achieve a goal larger than any single command. For example, researching a topic involves creating KB entries, curating articles, generating cards, and publishing a collection. Each step uses different Gaius capabilities, and the output of one step feeds the next.
Three Interaction Layers
Workflows can be executed through any combination of the three interfaces:
- TUI: interactive exploration, visual pattern recognition, manual curation
- CLI: scripted operations, batch processing, automated checks
- MCP: AI-assisted orchestration, where Claude Code drives multi-step sequences
The choice depends on the task. Health monitoring is best scripted via CLI. Research curation benefits from MCP-driven AI assistance. Spatial exploration requires the TUI.
Common Workflows
Research Workflow
End-to-end knowledge synthesis: define a topic, curate articles from the web, create cards with enriched metadata, and publish a collection. This is the primary content pipeline.
Health Workflow
System diagnosis and remediation: run health checks, interpret failures, apply self-healing fixes, and monitor recovery. This workflow is critical for keeping the platform operational.
Evolution Workflow
Agent improvement cycle: check evolution status, generate training tasks, trigger evaluation, promote successful agents. This is how Gaius agents get better over time.
Workflow Principles
Self-healing first. When something breaks, try /health fix <service> before manual intervention. The self-healing system learns from each invocation.
Test via CLI. After any code change or operation, verify the result with gaius-cli. Previous outputs are invalidated by changes – always re-run the command.
Fail fast. Gaius surfaces errors immediately with actionable remediation paths. If a step fails, the error message tells you what to do next. There are no silent fallbacks.
Observe, then act. Use the OODA loop: observe system state (/health, /gpu status), orient by comparing overlays, decide on an action, then act. Do not skip the observation step.
Research Workflow
The research workflow takes a topic from initial exploration through to a published collection of enriched cards. This is the primary content pipeline in Gaius.
Overview
Topic definition --> Article curation --> Card creation --> Enrichment --> Publishing
Each step builds on the previous one. The workflow can be driven manually through the CLI, or orchestrated by Claude Code via MCP tools.
Step 1: Define the Topic
Create or select an article definition with keywords and news queries that guide content discovery:
# List existing articles
uv run gaius-cli --cmd "/article list" --format json
# Create a new article with topic keywords
uv run gaius-cli --cmd "/article new" --format json
Articles need `keywords` and/or `news_queries` in their frontmatter for the Brave fetcher to find relevant sources. Without these, curation will fail fast with `#ACF.00000013.NOHINTS`.
Step 2: Curate Articles
Run the article curation flow to fetch and process relevant content:
uv run gaius-cli --cmd "/article curate" --format json
The curation flow:
- Searches the web using configured keywords and news queries
- Fetches and extracts content from discovered URLs
- Evaluates relevance against a selection rubric
- Creates cards from qualifying articles (~20 cards per run, ~2 minutes)
The selection rubric includes a curation_readiness gate that prevents selecting articles whose metadata is incomplete.
Step 3: Enrich Cards
Cards are created with basic metadata. Enrichment adds embeddings, summaries, and topology features:
# Check enrichment status
uv run gaius-cli --cmd "/collection list cards" --format json
# Generate summaries for cards that need them
uv run gaius-cli --cmd "/collection generate summaries" --format json
Card publishing is gated on enrichment completeness – cards without sufficient enrichment cannot be published.
Step 4: Render Visualizations
Each card gets a deterministic visualization rendered by the LuxCore engine:
uv run gaius-cli --cmd "/render cards" --format json
The grammar engine generates a unique visual based on the card’s topology features, seeded by hash(card_id) for deterministic output. Two variants are produced: display (1400x300) and og (1200x630 for social sharing).
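Seeding by a hash of the card ID, as described above, requires a stable hash: Python's built-in `hash()` is salted per interpreter run for strings, so a cryptographic digest is the usual choice. A minimal sketch of the idea, with made-up parameter names (not the actual LuxCore grammar-engine code):

```python
import hashlib
import random

def seed_for_card(card_id: str) -> int:
    """Derive a stable integer seed from a card ID.

    hashlib.sha256 is deterministic across processes, unlike the
    built-in hash(), which is randomized per run for strings.
    """
    digest = hashlib.sha256(card_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def render_params(card_id: str) -> dict:
    """Sample illustrative rendering parameters reproducibly per card."""
    rng = random.Random(seed_for_card(card_id))
    return {
        "hue": rng.uniform(0, 360),      # hypothetical parameter
        "density": rng.uniform(0.1, 0.9),  # hypothetical parameter
    }
```

With this scheme, re-rendering the same card always produces identical parameters, while distinct cards diverge.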
Step 5: Publish Collection
Publish the completed cards to a collection:
# Create or select a collection
uv run gaius-cli --cmd "/collection create" --format json
# Add cards to the collection
uv run gaius-cli --cmd "/collection add card" --format json
# Publish
uv run gaius-cli --cmd "/collection publish cards" --format json
MCP-Driven Research
When using Claude Code with MCP tools, the entire workflow can be conversational:
“Research the topic of topological data analysis in financial risk. Curate articles, enrich the cards, and publish a collection.”
Claude Code will call article_new, article_curate, collection_generate_summaries, and collection_publish_cards in sequence, reporting progress at each step.
Monitoring Collection Balance
The pending_cards metric is the most effective signal for collection diversity. Monitor it to ensure the collection is not over-weighted toward a single source or topic.
Health Workflow
The health workflow covers diagnosing system issues, applying self-healing fixes, and monitoring recovery. Gaius implements a fail-fast policy with actionable error messages, so every failure tells you what to do next.
Step 1: Diagnose
Run the health check to see the current state of all services:
uv run gaius-cli --cmd "/health" --format json
This returns a structured report with checks organized by category. Each check has a status (ok, warn, fail) and a message explaining the current state.
To check a specific category:
uv run gaius-cli --cmd "/health engine" --format json
uv run gaius-cli --cmd "/health endpoints" --format json
Step 2: Interpret Failures
Failed checks include Guru Meditation Codes – unique identifiers for each failure mode. For example:
- `#DS.00000001.SVCNOTINIT` – DatasetService not initialized
- `#NF.00000001.UNREACHABLE` – NiFi not reachable
- `#EP.00000001.GPUOOM` – GPU out of memory
Each code maps to a documented heuristic with symptom, cause, observation method, and solution. The error message itself contains remediation hints.
Step 3: Self-Heal
Always try /health fix before manual intervention. This is a design principle, not a suggestion:
uv run gaius-cli --cmd "/health fix engine" --format json
uv run gaius-cli --cmd "/health fix endpoints" --format json
uv run gaius-cli --cmd "/health fix nifi" --format json
Available fix targets: engine, dataset, nifi, postgres, qdrant, minio, endpoints, evolution.
Each fix strategy is a multi-step remediation sequence with verification at each step. The system attempts increasingly aggressive fixes until the service recovers.
Step 4: Monitor Recovery
After applying a fix, monitor the health observer for recovery:
# Check observer status
uv run gaius-cli --cmd "/health observer status" --format json
# List active incidents
uv run gaius-cli --cmd "/health observer incidents" --format json
# Poll for recovery
for i in $(seq 1 10); do
sleep 15
uv run gaius-cli --cmd "/health" --format json | \
jq '.data.checks[] | select(.status != "ok") | {name, status, message}'
done
Step 5: Escalation
If /health fix does not resolve the issue, the Health Observer can escalate via ACP (Agent Client Protocol) to Claude Code for deeper analysis. This happens automatically when:
- An incident exceeds the configured FMEA RPN threshold
- Local remediation has failed
- The incident is not in cooldown
Manual escalation path – use `just restart-clean` as a last resort:
just restart-clean
This performs a full clean restart of all services: stops everything, cleans up state, and restarts from scratch.
FMEA Framework
The health system uses Failure Mode and Effects Analysis (FMEA) to prioritize issues. Each failure mode has a Risk Priority Number (RPN) computed from severity, occurrence frequency, and detection difficulty. Higher RPNs get attention first.
# View the FMEA catalog
uv run gaius-cli --cmd "/fmea catalog" --format json
# Calculate RPN for a specific failure mode
uv run gaius-cli --cmd "/fmea rpn <mode>" --format json
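The RPN arithmetic follows the standard FMEA convention: the product of three 1-10 ratings. A minimal sketch of that convention (the ratings below are invented examples, not values from the Gaius catalog):

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: product of the three FMEA ratings.

    Each rating is on a 1-10 scale; detection is rated so that
    harder-to-detect failures score higher. Result range: 1-1000.
    """
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError(f"rating {rating} outside 1-10 scale")
    return severity * occurrence * detection

# Triage failure modes highest-RPN first (example ratings).
modes = [("gpu_oom", rpn(8, 6, 3)), ("nifi_down", rpn(5, 4, 2))]
modes.sort(key=lambda m: m[1], reverse=True)
```

An 8/6/3 failure mode scores 144 and outranks a 5/4/2 mode at 40, which is the ordering the health system uses to decide what gets attention first.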
Health Observer Daemon
The Health Observer runs as a background daemon, continuously monitoring service health and automatically triggering remediation when issues are detected:
# Start the observer
uv run gaius-cli --cmd "/health observer start" --format json
# Stop the observer
uv run gaius-cli --cmd "/health observer stop" --format json
When running, it checks services periodically and logs incidents. Resolved incidents are filtered out of the active list, but unknown or unexpected states remain visible (fail-open for observability).
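That fail-open filtering rule can be sketched in a few lines: hide only incidents positively marked resolved, so unknown or unexpected states stay visible. The status strings here are assumptions for illustration:

```python
def active_incidents(incidents):
    """Filter an incident list fail-open for observability.

    Only incidents explicitly marked "resolved" are hidden; unknown
    or missing statuses remain visible rather than being dropped.
    """
    return [i for i in incidents if i.get("status") != "resolved"]
```

A malformed incident record therefore shows up in the active list instead of silently disappearing.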
Evolution Workflow
The evolution workflow improves Gaius agents over time through task ideation, training, evaluation, and promotion. This is a cycle that repeats as agents accumulate more data and experience.
Overview
Status check --> Task ideation --> Training --> Evaluation --> Promotion
     ^                                                            |
     |____________________________________________________________|
Each cycle produces a new generation of agent versions. Successful versions are promoted to active status; underperformers are retained for comparison but not used in production.
Step 1: Check Status
Before starting an evolution cycle, check the current state:
uv run gaius-cli --cmd "/evolve status" --format json
This shows the current generation number, active agents, evaluation state, and any capability gaps. Pay attention to:
- Generation: which cycle you are on
- Active agents: which agent versions are currently serving
- Capability gaps: areas where agents underperform
Step 2: Task Ideation
Generate new training tasks based on identified capability gaps:
uv run gaius-cli --cmd "/evolve task ideation" --format json
The ideation process analyzes recent performance data and gap analysis to propose tasks that target specific weaknesses. Tasks are designed to push agents toward areas where they currently underperform.
Step 3: Trigger Evolution
Start the evolution cycle. This runs training with the generated tasks and produces new agent versions:
uv run gaius-cli --cmd "/evolve trigger" --format json
Evolution requires healthy inference endpoints. Verify with:
uv run gaius-cli --cmd "/gpu status" --format json | \
jq '.data.endpoints[] | {name, status}'
All endpoints should show HEALTHY before triggering evolution. If they do not, run /health fix endpoints first.
Step 4: Evaluate
After training completes, evaluate the new agent versions against held-out test data:
# Check evaluation results
uv run gaius-cli --cmd "/evolve status" --format json | jq '.data.evaluation'
# View held-out statistics
uv run gaius-cli --cmd "/evolve held-out stats" --format json
Evaluation uses the RASE verification framework. Each agent version is scored on accuracy (0.0-1.0, proportion of constraints satisfied) and compared against previous versions.
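The declarative composition described in the head (AllOf/AnyOf/Not evaluated by a ground-truth oracle) and the "proportion of constraints satisfied" score could look roughly like this. It is a minimal sketch; the class and function names are illustrative, not the actual RASE API:

```python
class Atom:
    """Leaf constraint: a ground-truth predicate over an agent's output."""
    def __init__(self, check):
        self.check = check

    def ok(self, output):
        return bool(self.check(output))


class AllOf:
    """Satisfied only when every child constraint is satisfied."""
    def __init__(self, *parts):
        self.parts = parts

    def ok(self, output):
        return all(p.ok(output) for p in self.parts)


class AnyOf:
    """Satisfied when at least one child constraint is satisfied."""
    def __init__(self, *parts):
        self.parts = parts

    def ok(self, output):
        return any(p.ok(output) for p in self.parts)


class Not:
    """Satisfied when the wrapped constraint is violated."""
    def __init__(self, part):
        self.part = part

    def ok(self, output):
        return not self.part.ok(output)


def accuracy(constraints, output):
    """Score in [0.0, 1.0]: proportion of top-level constraints satisfied."""
    return sum(c.ok(output) for c in constraints) / len(constraints)
```

Because each leaf is a verifiable predicate, the composite score is itself verifiable, which is what makes it usable as a reward signal.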
Step 5: Promote or Roll Back
If the new version outperforms the current active version, promote it:
# View the best version
uv run gaius-cli --cmd "/evolve best" --format json
# Promote (via MCP or direct command)
uv run gaius-cli --cmd "/evolve promote" --format json
If the new version underperforms, roll back to a known good version:
uv run gaius-cli --cmd "/evolve rollback" --format json
Evolution Daemon
For continuous improvement, start the evolution daemon which runs cycles automatically:
# Start the daemon
uv run gaius-cli --cmd "/evolve daemon start" --format json
# Check daemon status
uv run gaius-cli --cmd "/evolve daemon status" --format json
# Stop the daemon
uv run gaius-cli --cmd "/evolve daemon stop" --format json
The daemon monitors capability gaps and triggers evolution cycles when thresholds are exceeded.
Monitoring Evolution Trends
Track improvement over time:
uv run gaius-cli --cmd "/evolve trend" --format json
This shows how agent performance has changed across generations. Look for:
- Upward trend: agents are improving, the evolution cycle is working
- Plateau: training tasks may need diversification, or capability limits have been reached
- Regression: roll back to a previous version and investigate
Model Merging
When multiple specialized agent versions exist, model merging can combine their strengths:
# View merge candidates
uv run gaius-cli --cmd "/evolve merge candidates" --format json
# Trigger a merge
uv run gaius-cli --cmd "/evolve merge" --format json
# View lineage
uv run gaius-cli --cmd "/evolve lineage" --format json
Model lineage tracking records the ancestry of each merged version, enabling traceability from the final model back to its training data and parent versions.
Design Philosophy
Gaius is more than a visualization tool—it’s an experiment in augmented cognition. The design integrates principles from human factors engineering, situational awareness research, and decades of interface evolution to create something genuinely new.
Foundational Principles
1. Spatial Cognition First
Humans evolved to navigate physical space. We have dedicated neural hardware for:
- Allocentric mapping: Understanding space from a fixed reference frame
- Path integration: Tracking position through movement
- Landmark recognition: Identifying significant points
Gaius exploits this by mapping abstract data onto a navigable grid. The cursor becomes your position. Regions become territories. Movement through the grid engages spatial reasoning circuits that spreadsheets leave dormant.
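The projection behind that grid, per the head, quantizes continuous 2D UMAP coordinates to a 19×19 lattice by rounding and clipping to [0, 18]. A minimal sketch of that quantization step, assuming coordinates are already scaled to roughly the lattice range:

```python
def to_lattice(x: float, y: float, size: int = 19) -> tuple:
    """Quantize continuous 2D coordinates to an integer lattice cell.

    Round to the nearest integer, then clip into [0, size - 1] so
    out-of-range points land on the lattice border.
    """
    def clip(v: float) -> int:
        return max(0, min(size - 1, round(v)))

    return clip(x), clip(y)
```

Points that UMAP places outside the lattice are pinned to its edge rather than dropped, so every item keeps a navigable position.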
2. Perceptual Bandwidth
Vision is our highest-bandwidth sense. Reading text: ~250 words/minute. Recognizing a scene: ~100ms. Gaius prioritizes visual pattern recognition over sequential text processing.
When you see agents clustered in a corner with death loops nearby, you perceive the situation instantly—before you could read a report describing it.
3. Modal Efficiency
Modal interfaces concentrate related operations. In normal mode, every key is a navigation or view command—no modifier keys needed. This reduces both physical motion and cognitive load.
Critics of modes cite “mode errors” (typing in wrong mode). Gaius addresses this with:
- Clear mode indicators in status line
- Consistent escape semantics (Esc always returns to normal)
- Mode-appropriate cursor styling (planned)
4. Progressive Complexity
New users see a clean grid. They navigate with hjkl, toggle modes, quit with q. Nothing confusing.
Power users access deeper functionality through slash commands, MCP tools, and CLI scripting. Three interfaces — TUI, CLI, MCP — offer increasing levels of automation.
Complexity is opt-in, not mandatory.
5. Transparency Over Magic
Every visual element has an explanation. The grid shows exactly what it’s told to show. Agent positions derive from actual embeddings through a defined projection. Death loops come from computed homology.
No black boxes. No “AI magic.” Understanding the system enables trusting the system.
Human Factors Integration
Gaius incorporates principles from human factors engineering—the discipline of designing systems that account for human capabilities and limitations.
Cognitive Load Management
Miller’s Law: Working memory holds 7±2 chunks. Gaius manages this by:
- Showing at most 7 agents (one per color)
- Limiting candidate markers to 9 (a-i)
- Using overlays to separate concerns (one layer at a time)
Hick’s Law: Decision time increases with choice count. Modal operation reduces active choices at any moment.
Attention and Distraction
The grid provides a stable anchor. Overlays add information; the base never shifts unexpectedly.
Status updates appear in the designated status line—not as popups or animations that hijack attention.
Error Prevention
- Confirmation for destructive actions: Clear memory, quit with unsaved changes
- Reversible operations: Overlay cycles, mode toggles, cursor movement
- Visible state: Current mode, active features, domain always displayed
Fitts’s Law and Input
Fitts’s Law: Target acquisition time depends on distance and size. Keyboard input eliminates targeting entirely—no mouse movement, no precision required.
hjkl navigation is about as fast as grid movement input can get: four home-row keys, no modifier keys, no targeting.
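Fitts's law in its common Shannon formulation makes the distance/size dependence explicit (T is acquisition time, D the distance to the target, W its width, and a, b empirically fitted constants):

```latex
T = a + b \log_2\!\left(\frac{D}{W} + 1\right)
```

Keyboard input sidesteps the law entirely: with no pointer to move, the D and W terms drop out of the interaction.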
Situational Awareness
Situational awareness (SA) is the perception, comprehension, and projection of system states. Gaius is explicitly designed to support all three levels of SA as defined by Endsley (1995).
Level 1: Perception
What is happening?
Gaius provides immediate perception through:
- Grid state: See where entities are located
- Density shading: See relative magnitudes at a glance
- Agent positions: See where each analytical lens is focused
- Death loops: See topological features visually
No reading required. No scrolling. The state is visible.
Level 2: Comprehension
What does it mean?
Comprehension emerges from:
- Spatial relationships: Clusters = consensus, scatter = uncertainty
- Overlay transitions: Compare views to understand multi-dimensional state
- Color coding: Consistent agent colors build recognition
- Historical context: Memory enables “this is different from before”
Level 3: Projection
What will happen next?
Projection is supported by:
- Swarm dynamics: Watch convergence/divergence trends
- Entropy tracking: Rising entropy may signal regime change
- Death loop evolution: New loops appearing = emerging risk
- Agent trajectories: Where is each analytical perspective moving?
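The entropy signal can be made concrete: one natural choice is the Shannon entropy of the agents' occupancy distribution over lattice cells. A sketch under that assumption (the actual Gaius metric may differ):

```python
# Illustrative sketch (not necessarily the actual Gaius metric): Shannon
# entropy of the agents' occupancy distribution over lattice cells. Tight
# clustering gives low entropy; scatter gives high entropy, so a rising
# value tracks divergence of the swarm.
from collections import Counter
from math import log2

def position_entropy(positions):
    """Entropy in bits of the distribution of agents over grid cells."""
    counts = Counter(positions)
    n = len(positions)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

converged = [(4, 4)] * 7                 # all seven agents on one cell
scattered = [(i, i) for i in range(7)]   # seven distinct cells
print(position_entropy(converged))  # 0.0
print(position_entropy(scattered))  # log2(7), about 2.807
```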
SA Demons (Threats to Awareness)
Endsley identified common SA failures. Gaius defends against them:
| SA Demon | Gaius Defense |
|---|---|
| Attention tunneling | Overlay cycling forces perspective shifts |
| Data overload | Layered disclosure; modes separate concerns |
| Out-of-the-loop | Swarm runs show agent “thinking” in real-time |
| Misplaced salience | Consistent visual vocabulary; no flashy distractions |
| Complexity creep | Feature flags; base UI is minimal |
The OODA Loop
Boyd’s OODA (Observe-Orient-Decide-Act) loop describes competitive decision-making:
- Observe: Grid displays current state
- Orient: Overlays, memory search, agent positions inform context
- Decide: Slash commands, domain changes, focus actions
- Act: Run swarm rounds, mark positions, export insights
Fast OODA loops win. Gaius minimizes latency at every stage.
Design Tensions
Every design involves tradeoffs. Gaius makes explicit choices:
Density vs. Clarity
The grid could show more information (color + shape + size). We prioritize clarity—one symbol per cell, overlays for additional dimensions.
Flexibility vs. Consistency
Custom projections enable domain adaptation. But core navigation (hjkl) never changes. Flexibility in content, consistency in interaction.
Power vs. Accessibility
Modal interfaces have a learning curve. We accept this tradeoff because mastery enables flow states inaccessible to modeless interfaces.
Automation vs. Control
Agents suggest; humans decide. The swarm provides perspectives, not prescriptions. Autonomy remains with the operator.
The Goal: Augmented Cognition
Gaius aims to extend human perception into domains we can’t naturally sense:
- High-dimensional embedding spaces
- Topological structure of point clouds
- Collective reasoning of agent swarms
By projecting these onto a navigable grid with overlays and keyboard-driven interaction, we make the invisible visible—and navigable.
This is augmentation, not replacement. The human remains in control, with enhanced perception of complex systems.
Co-Creation with Code Agents
Gaius represents a novel architectural pattern: an application co-created with AI code agents, where the development process itself shapes the system’s design.
The Co-Creation Paradigm
Traditional software development follows a clear separation: humans design, humans implement, humans document. Gaius challenges this by integrating Claude Code (powered by Claude Opus 4.5) as a first-class development partner.
This isn’t “AI-assisted coding” in the conventional sense. It’s a symbiotic development process where:
- The human provides vision and judgment — strategic direction, quality assessment, architectural taste
- The code agent provides implementation velocity — exploring codebases, generating code, maintaining consistency
- The system evolves through dialogue — features emerge from conversation, not specification documents
Implications for Architecture
When an AI agent is a development partner, certain architectural choices become natural:
Interface Parity: CLI, TUI, and MCP interfaces must provide equivalent functionality. Why? Because the code agent (via MCP) needs access to the same operations the human uses (via TUI). Parity isn’t a nice-to-have; it’s essential for the agent to effectively participate in development and testing.
Living Documentation in the KB: Command references live in the Knowledge Base ([[current/commands/]]), not frozen in mdbook. The command set evolves as the agent and human add features together. Static documentation would be perpetually stale.
Self-Describing Systems: The MCP tools are the API. The CLI commands are the operations. When these are well-named and well-documented, the code agent can discover and use them without additional instruction.
The Knowledge Base as Shared Memory
A key insight: the KB serves as shared context between human and agent across sessions.
What Belongs in the KB vs. mdbook
| KB (build/dev/) | mdbook (docs/) |
|---|---|
| Command reference (evolving) | Design philosophy (stable) |
| Current research threads | Architectural foundations |
| Session notes and decisions | Core concepts |
| Feature-specific documentation | User guides |
| Agent-generated analysis | Contributing guidelines |
The distinction: KB content may change between sessions as features evolve. mdbook content captures enduring principles that guide the evolution.
Example: The Commands Directory
The command reference in [[current/commands/]] was created during a session where we:
- Audited all commands across CLI, TUI, and MCP
- Identified parity gaps
- Documented each interface comprehensively
This documentation now serves multiple purposes:
- For humans: Quick reference, training material
- For code agents: Discovery of available operations
- For development: Gap analysis, parity tracking
If we added the command reference to mdbook, it would be outdated within days. In the KB, it can evolve with the system.
BDD as Collaborative Specification
Behavior-Driven Development (BDD) takes on new significance in co-created systems.
Feature Files as Contracts
Gherkin feature files (features/*.feature) serve as:
- Executable specifications — Tests that verify behavior
- Agent-readable requirements — Clear, structured descriptions the code agent can understand
- Living documentation — Always synchronized with actual behavior
```gherkin
Feature: Wiki Link Resolution
  As a knowledge worker
  I want broken wiki links to resolve via search
  So that the knowledge graph grows organically

  Scenario: Selecting an unresolved wiki link
    Given a file "test.md" containing "[[nonexistent-topic]]"
    When I select the broken link in the graph panel
    Then a search runs for "nonexistent-topic"
    And a new zettelkasten note is created
    And the original link is updated to point to the new note
```
This scenario was implemented in a single session. The code agent:
- Read the feature file to understand requirements
- Implemented the feature across multiple files
- Created tests to verify the behavior
Scenarios as Design Discussions
BDD scenarios often emerge from human-agent dialogue:
Human: “When I click a broken link, instead of an error, can it search and create a note?”
Agent: “So the flow would be: detect missing target → run search → synthesize note → update original link?”
Human: “Yes, and add a backlink from the new note to the origin.”
This conversation becomes a scenario. The scenario becomes a test. The test drives the implementation.
Interface Parity as Architectural Principle
The three interfaces serve different users:
| Interface | Primary User | Interaction Pattern |
|---|---|---|
| TUI | Human (interactive) | Real-time visualization, keyboard navigation |
| CLI | Human (scripting), CI/CD | JSON output, automation |
| MCP | Code agents, integrations | Structured tool calls |
Why Parity Matters
When interfaces drift apart:
- The code agent can’t test what the human experiences
- Automation scripts break when TUI adds features
- Documentation fragments across interfaces
Gaius addresses this through:
- Shared core functions — CLI and TUI call the same underlying methods
- MCP as the comprehensive API — 163 tools covering all operations
- Regular parity audits — Tracking gaps in [[current/commands/index]]
The Parity Matrix
The command coverage matrix explicitly tracks which operations are available where:
| Command | CLI | TUI | MCP | Notes |
|---|---|---|---|---|
| /search | ✓ | ✓ | ✓ | Full parity |
| /model add | ✓ | - | ✓ | TUI gap (priority) |
| /init | - | ✓ | - | TUI-specific (OK) |
This matrix is itself a development artifact that guides prioritization.
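A hypothetical sketch of such an audit over a coverage matrix (the dict mirrors the table rows; names and exemptions here are illustrative):

```python
# Hypothetical parity audit over a coverage matrix like the one above.
# The dict mirrors the table rows; names and exemptions are illustrative.
COVERAGE = {
    "/search":    {"cli": True,  "tui": True,  "mcp": True},
    "/model add": {"cli": True,  "tui": False, "mcp": True},
    "/init":      {"cli": False, "tui": True,  "mcp": False},
}

EXEMPT = {"/init"}  # deliberately interface-specific commands

def parity_gaps(coverage):
    """Commands missing from at least one interface, minus known exemptions."""
    return sorted(
        cmd for cmd, ifaces in coverage.items()
        if not all(ifaces.values()) and cmd not in EXEMPT
    )

print(parity_gaps(COVERAGE))  # ['/model add']
```

Distinguishing exemptions from gaps keeps the audit actionable: only genuine gaps surface for prioritization.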
Practical Patterns
Pattern 1: Agent-Discoverable Operations
Name commands and tools descriptively:
- `scheduler_health_check`, not `shc`
- `/evolve trigger`, not `/evo t`
The code agent reads these names. Clear naming reduces confusion.
Pattern 2: JSON-First CLI
CLI commands return structured JSON by default:
```shell
uv run gaius-cli --cmd "/state" --format json
```
This enables:
- Agent parsing of command output
- Scripted verification of behavior
- Pipeline integration
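From an agent's side, consuming the JSON-first CLI looks like the sketch below. The invocation mirrors the documented command; the shape of the returned JSON is not specified here, so treat the parsed dict's keys as something to inspect, not assume:

```python
# Sketch of agent-side scripting against the JSON-first CLI. The invocation
# mirrors the documented command; the output schema is an assumption to
# inspect at runtime rather than hard-code.
import json
import subprocess

def gaius_argv(cmd: str) -> list:
    """argv for a JSON-mode gaius-cli call."""
    return ["uv", "run", "gaius-cli", "--cmd", cmd, "--format", "json"]

def run_cmd(cmd: str) -> dict:
    """Run a slash command and parse its JSON output."""
    proc = subprocess.run(
        gaius_argv(cmd), capture_output=True, text=True, check=True
    )
    return json.loads(proc.stdout)

# Example (requires a Gaius checkout): state = run_cmd("/state")
```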
Pattern 3: Incremental Documentation
Don’t write comprehensive documentation upfront. Let it emerge:
- Implement feature with agent
- Agent documents as it implements
- Human reviews and refines
- Documentation evolves with feature
Pattern 4: Session Handoff
The KB preserves context across sessions:
- [[scratch/YYYY-MM-DD/]] — Daily working notes
- [[current/commands/]] — Living reference
- Research threads — Ongoing investigations
When a new session starts, the agent can read recent KB entries to resume context.
The Meta-Principle
Systems designed for co-creation with code agents are inherently more maintainable.
Why? Because the requirements for agent collaboration—clear interfaces, structured data, living documentation, testable behavior—are the same requirements for long-term maintainability.
Designing for an AI collaborator forces us to:
- Make implicit knowledge explicit
- Structure operations consistently
- Document as we build
- Test what we document
These are good practices regardless of whether an agent is involved. The agent just makes them essential.
Future Directions
Agent-Initiated Evolution
Currently, the human initiates feature development. Future systems might:
- Have the agent propose features based on usage patterns
- Automatically generate BDD scenarios from user feedback
- Self-document new capabilities as they’re added
Multi-Agent Development
Gaius already uses multi-agent swarms for analysis. The same pattern could apply to development:
- Architect agent proposes structure
- Implementation agent writes code
- Critic agent reviews
- Documentation agent updates KB
Adaptive Interfaces
If the agent tracks which operations are used most, it could:
- Suggest adding frequently-used MCP tools to TUI
- Identify commands that should be automated
- Propose interface simplifications
Conclusion
Gaius isn’t just a tool for augmented cognition—it’s a case study in augmented development. The co-creation paradigm, where human vision and AI implementation velocity combine, produces systems that are:
- More consistent — The agent enforces patterns across the codebase
- Better documented — Documentation emerges from the development dialogue
- More testable — BDD scenarios are natural outputs of requirement discussions
- Easier to maintain — Clear interfaces required for agent collaboration benefit all maintainers
The KB as shared memory, interface parity as principle, and BDD as collaborative specification—these patterns aren’t specific to Gaius. They’re applicable to any system designed for human-AI co-creation.
The future of software development isn’t human OR machine. It’s human AND machine, each contributing their strengths to create systems neither could build alone.
Inspirations
Gaius stands on the shoulders of giants. This section traces the lineage of ideas that inform its design.
The Polymath Tradition
Gaius Plinius Secundus (23-79 CE)
Pliny the Elder’s Naturalis Historia attempted to catalog all knowledge of the natural world across 37 books. He wrote: “Nature is to be found in her entirety nowhere more than in her smallest creatures.”
This spirit—systematic observation, comprehensive scope, attention to detail—animates Gaius. The grid is our attempt at a unified view of complex domains.
The Encyclopedists
Diderot and d’Alembert’s Encyclopédie (1751-1772) organized knowledge with cross-references, creating a navigable web of ideas. Gaius’s scene graph and semantic search continue this tradition.
Modern Polymaths
Herbert Simon (AI, economics, psychology), Douglas Engelbart (augmented intelligence), Seymour Papert (constructionism)—thinkers who crossed disciplines to synthesize new understanding. Gaius is built for their intellectual descendants.
Interface Lineages
Terminal Interfaces
From TTY to VT100 to ANSI terminals to modern terminal emulators, the text interface has evolved continuously. Gaius inherits:
- Character grid: Discrete, addressable positions
- ANSI styling: Colors, bold, background
- Keyboard primacy: No mouse required
- Stream output: Log panels for sequential information
Modal Editors
vi (1976) → vim (1991) → neovim (2014) → modern modal interfaces. Key insights:
- Modes reduce modifier keys: Insert mode types; normal mode commands
- Composability: `d3w` (delete 3 words) combines operation + count + motion
- Muscle memory: Consistent bindings become automatic
Gaius adopts hjkl and plans command composition (/focus Risk | /analyze).
Plan 9 and Acme
Rob Pike’s Acme editor (1994) introduced:
- Mouse chording: Combined mouse buttons for operations
- Text as command: Select text, execute it
- Windowing without decoration: Content maximizes screen real estate
- Unix philosophy at the UI level: Small, composable pieces
Gaius plans Acme-inspired text execution for the log panel.
Professional Interfaces
Bloomberg Terminal
Since 1981, Bloomberg has defined professional data interfaces:
- Information density: Every pixel works
- Keyboard-first: `<GO>` commands, function keys, minimal mouse
- Consistent vocabulary: Familiar patterns across thousands of functions
- Real-time updates: Live data as the base state
Gaius inherits the density and keyboard ethos while modernizing the visual language.
Trading Floors
Before terminals, open outcry trading used:
- Spatial organization: Pits and rings for specific instruments
- Hand signals: High-bandwidth visual communication
- Peripheral awareness: Seeing the whole floor at once
The grid echoes the trading pit—a spatial organization of a complex domain.
Modern Developments
Gödel Terminal
The emerging Gödel Terminal project explores:
- AI-native interfaces: Designed for LLM integration
- Semantic commands: Natural language as primary input
- Dynamic context: Interface adapts to conversation
Gaius draws on this for its slash command system and domain adaptation.
Claude Code
Anthropic’s Claude Code (the agent Gaius is co-created with) pioneered:
- Slash commands: `/help`, `/clear`, `/review`
- Context awareness: Understanding codebase structure
- Conversational flow: Natural language with structured commands
Gaius’s command system directly inherits this pattern.
LLM-Augmented Interfaces
The 2023-2024 wave of LLM tools demonstrated:
- Natural language as interface: Beyond command-line syntax
- Agent architectures: Multiple specialized perspectives
- Embeddings everywhere: Semantic similarity as fundamental operation
Gaius integrates all three.
Visualization Traditions
Information Visualization
Tufte’s principles:
- Data-ink ratio: Maximize information, minimize decoration
- Small multiples: Repeated grids for comparison
- Layering and separation: Overlays instead of clutter
Topological Visualization
Carlsson and others showed that shape matters. TDA visualization typically uses:
- Persistence diagrams: Birth-death scatter plots
- Barcodes: Horizontal bars for feature lifespans
Gaius experiments with projecting these onto the grid—making topology spatial.
Game Interfaces
Go software (KGS, OGS, Sabaki) provides:
- Board representation: The 19×19 standard
- Coordinate systems: A-T, 1-19
- Stone visualization: Contrast, shadows, territory
We inherit the board but repurpose it for data.
Cognitive Science
Embodied Cognition
Lakoff, Johnson, and others argue that thought is grounded in bodily experience. Spatial metaphors (“high status,” “falling behind”) pervade language.
Gaius literalizes these metaphors: positions have meaning, movement has direction, territory can be claimed.
Distributed Cognition
Hutchins showed that cognition extends beyond the skull—tools, environments, and other people participate in thinking.
Gaius + human + agent swarm form a cognitive system. The grid is external memory; agents are external perspectives; topology is external pattern detection.
Ecological Psychology
Gibson’s affordances: the environment offers action possibilities. A grid affords navigation. Overlays afford comparison. Commands afford precision.
Design is the creation of useful affordances.
Synthesis
Gaius attempts to synthesize:
| Tradition | Contribution |
|---|---|
| Polymath encyclopedism | Comprehensive scope, cross-reference |
| Terminal interfaces | Text grid, keyboard, streaming |
| Modal editors | hjkl, modes, composition |
| Plan 9 / Acme | Text as command, minimal chrome |
| Bloomberg | Density, professionalism, real-time |
| Gödel / Claude Code | AI-native, slash commands |
| Visualization | Tufte principles, TDA projection |
| Cognitive science | Spatial cognition, distributed thinking |
The result is something new—an interface paradigm for augmented cognition in complex domains.
Bloomberg Terminal
The Bloomberg Terminal, launched in 1981, remains the gold standard for professional financial interfaces. With over 300,000 subscribers paying ~$24,000/year, it demonstrates that density and keyboard-first design can command premium value.
What Bloomberg Gets Right
Information Density
A Bloomberg screen contains more data per pixel than almost any other interface. Multiple panels display:
- Real-time quotes
- News headlines
- Chart overlays
- Analytics
- Communication
Nothing is wasted. Every region serves a purpose.
Keyboard Supremacy
Bloomberg operators type commands like AAPL <EQUITY> GO to navigate. Function keys, abbreviations, and muscle memory enable speeds impossible with mouse navigation.
The terminal was designed for traders who can’t afford to look away from the market to find a menu item.
Consistent Mental Model
Despite thousands of functions, Bloomberg maintains consistency:
- `<GO>` executes
- `<MENU>` shows options
- Yellow keys are market sectors
- Green keys are actions
Learn the pattern once, apply it everywhere.
Real-Time as Default
Bloomberg screens update continuously. You don’t refresh; you watch. The terminal shows the world as it happens.
What Gaius Inherits
Density Without Clutter
The 19×19 grid provides 361 data points. Overlays add dimensions. But each view is coherent—one mode, one overlay, one interpretation.
Bloomberg achieves density through multiple panels. Gaius achieves it through layers on a unified surface.
Keyboard-First
hjkl navigation. Slash commands. No required mouse. Power users should never reach for the trackpad.
Bloomberg charges premium prices for keyboard efficiency. Gaius provides it freely.
Consistency
Overlay cycling always uses o. Mode toggle always uses v. Quit is always q. The vocabulary is small and stable.
Live Updates
Swarm rounds update the grid in real-time. Agent positions shift as analysis proceeds. The view is alive.
Where Gaius Differs
Visual Language
Bloomberg uses dense text, tables, and traditional charts. Gaius uses a spatial grid with symbolic markers.
The grid enables pattern recognition that tables don’t. A cluster is visible instantly; a column of numbers requires scanning.
AI Integration
Bloomberg has added AI features incrementally. Gaius is AI-native—agents are foundational, not bolted on.
Openness
Bloomberg is proprietary and expensive. Gaius is open and free. The design philosophy is available for inspection and critique.
Domain Agnosticism
Bloomberg serves finance. Gaius adapts to any domain via the --domain flag. Pension analysis today, supply chain tomorrow, cybersecurity next week.
Lessons for Gaius
-
Respect expertise: Bloomberg doesn’t dumb down for casual users. Gaius shouldn’t either.
-
Invest in consistency: Bloomberg’s decades-old commands still work. Gaius should avoid gratuitous changes.
-
Optimize for flow: Bloomberg operators enter flow states. Gaius should enable the same.
-
Density is a feature: Information-rich displays serve experts. Don’t dilute for aesthetics.
-
Keyboard speed matters: Milliseconds add up over thousands of operations.
The Bloomberg Bar (Status Line)
Bloomberg’s status area shows:
- Current function
- User identity
- Connection status
- Contextual hints
Gaius’s status line serves the same purpose:
Ready | TDA on | Swarm (pension) | hjkl=move o=overlay
Both provide constant orientation without demanding attention.
Beyond Bloomberg
Bloomberg optimized for 1980s constraints: text terminals, limited bandwidth, human-only analysis.
Gaius operates in a different era:
- Unicode enables rich symbolism beyond ASCII
- Embeddings enable semantic operations
- Agents provide parallel analysis
- Topology reveals hidden structure
We inherit Bloomberg’s keyboard efficiency while transcending its visual limitations.
Gödel Terminal
The Gödel Terminal represents an emerging paradigm for AI-native interfaces. While still evolving, it offers design principles that Gaius incorporates.
The AI-Native Interface
Traditional interfaces were designed for direct manipulation: click buttons, fill forms, navigate menus. The user explicitly specifies every action.
AI-native interfaces shift this paradigm:
- Intent over action: Express what you want, not how to do it
- Semantic understanding: The interface comprehends context
- Adaptive response: Behavior adjusts to situation
- Conversational flow: Dialogue as primary interaction
Gödel’s Key Ideas
Semantic Commands
Instead of hierarchical menus, semantic commands express intent:
/analyze the risk concentration in the northeast quadrant
The system interprets “northeast quadrant,” understands “risk concentration,” and executes appropriately.
Context Windows
Gödel maintains rich context:
- Current state (what’s displayed)
- History (what was discussed)
- User patterns (typical workflows)
- Domain knowledge (relevant concepts)
Commands are interpreted within this context, reducing verbosity.
Dynamic Layouts
The interface reorganizes based on task:
- Analysis mode: Maximize grid, minimize chrome
- Research mode: Split with documentation
- Comparison mode: Side-by-side views
Agent Integration
Agents aren’t tools invoked occasionally—they’re persistent presences:
- Always available for queries
- Proactively surface insights
- Learn from interaction patterns
What Gaius Inherits
Slash Commands
Gaius’s /command syntax follows Gödel’s semantic approach:
```
/domain "supply chain"
/ask "What are the top risks?"
/focus Risk
```
These read as intent expressions, not procedure calls.
Domain Adaptation
The --domain flag and /domain command enable semantic rewiring:
/domain "cybersecurity incident response"
All agents, embeddings, and analyses reorient to the new domain.
Contextual Awareness
Future Gaius versions will maintain:
- Session history across restarts
- User preference learning
- Domain-specific vocabularies
- Personalized agent tuning
Proactive Insight (Planned)
Agents could surface observations unprompted:
[Risk] Entropy spike detected. New death loop forming near D4.
The interface becomes an active collaborator, not a passive tool.
Where Gaius Extends Gödel
Spatial Grounding
Gödel uses conventional screen layouts. Gaius adds a spatial metaphor:
- Positions have meaning
- Navigation has direction
- Territory can be claimed
This grounds abstract AI operations in spatial intuition.
Topological Awareness
Gödel focuses on semantic understanding. Gaius adds structural understanding via TDA:
- Shape of data
- Persistent features
- Emergence and dissolution
Visualization Priority
Gödel emphasizes text and conversation. Gaius emphasizes visual pattern:
- Grid as primary display
- Text as secondary (log panel)
- Overlays as visual analysis
Keyboard Efficiency
Gödel often implies mouse/touch interaction. Gaius prioritizes keyboard:
- `hjkl` navigation
- Single-key mode toggles
- Command completion
Design Tensions
Automation vs. Control
Gödel tends toward autonomous agents. Gaius keeps humans in the loop:
- Agents suggest, don’t act
- Swarm rounds are explicit (`s`)
- Domain changes are deliberate
Fluidity vs. Stability
Gödel’s dynamic layouts can disorient. Gaius’s grid is stable:
- 19×19 never changes
- Overlays add, don’t rearrange
- Status line always present
Natural Language vs. Structure
Gödel embraces free-form input. Gaius balances:
- Slash commands for precision
- Query commands for natural language
- Keyboard bindings for speed
The Synthesis
Gaius combines:
- Gödel’s semantic awareness
- Its own spatial grounding
- Bloomberg’s keyboard efficiency
- TDA’s structural insight
The result is an AI-native interface that remains tangible—where complex analysis projects onto a navigable grid.
Future Convergence
As AI-native interfaces mature, we expect:
- More spatial metaphors (not just Gaius)
- Better keyboard integration
- Richer visualization
- Deeper agent collaboration
Gaius is an early experiment in this convergence.
Plan 9 & Acme
Plan 9 from Bell Labs (1992) was Ken Thompson and Rob Pike’s attempt to push Unix ideas to their logical conclusion. Its text editor, Acme (1994), remains one of the most influential programmer tools ever created.
Plan 9 Philosophy
Everything is a File
Unix had “everything is a file” as aspiration. Plan 9 achieved it:
- Network connections: files
- Processes: files
- Graphics: files
- Input devices: files
This uniformity enables composition. Any tool that reads files can process any system resource.
Distributed by Design
Plan 9 assumed network operation. Local and remote resources accessed identically. Your terminal could seamlessly use CPU from across the network.
Simplicity Through Completion
Rather than adding features, Plan 9 removed special cases. The result is smaller but more general.
Acme: A Different Editor
Acme is startling to modern users:
- No syntax highlighting
- No configuration files
- No plugins
- No key bindings (almost)
And yet, Acme users are among the most productive programmers.
Mouse Chording
Acme uses three-button mouse chording:
- Left: Select text
- Middle: Execute selected text as command
- Right: Search/open selected text
Any text can become a command. Type make, select it, middle-click. The boundary between text and action dissolves.
Tags as Command Lines
Each window has a “tag” line containing text. That text is executable:
/home/user/project Del Snarf Get | fmt | Look
Click on Del to delete the window. Click on fmt to reformat. The tag is a command palette you can edit.
No Modes
Acme has no insert/command mode distinction. You’re always in “insert mode”—typing inserts text. Commands are executed by clicking on them.
This eliminates mode errors entirely.
Plumbing
Plan 9’s plumber routes messages based on content. Click on a filename: it opens. Click on an error with line number: editor jumps there. Click on a URL: browser opens.
Pattern matching replaces explicit handlers.
What Gaius Inherits
Text as Command
Gaius plans to make log panel text executable:
[Risk] Cluster forming at K10-L12. Consider /analyze K10.
Click on /analyze K10 to execute it. Agent suggestions become actionable.
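A hypothetical sketch of the detection half of this feature: scan a log line for an embedded slash command so it can be offered for execution. The regex and the K10 coordinate format are assumptions for illustration, not the planned implementation:

```python
# Hypothetical sketch of the planned "text as command" behavior: find an
# embedded slash command in a log line so it can be offered for execution.
# The pattern and the K10 coordinate format are assumptions for illustration.
import re

# A slash command: lowercase name, then optional word-like arguments.
CMD_RE = re.compile(r"/[a-z]+(?:\s+[A-Za-z0-9\"\-]+)*")

def extract_commands(line: str) -> list:
    """Return slash-command substrings found in a log line."""
    return CMD_RE.findall(line)

line = "[Risk] Cluster forming at K10-L12. Consider /analyze K10."
print(extract_commands(line))  # ['/analyze K10']
```

The trailing period is excluded by the argument character class, so the extracted text is directly executable.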
Minimal Configuration
Gaius aims for sensible defaults. The grid is 19×19. Colors are fixed. Navigation is hjkl. Power comes from composition, not configuration.
Compositional Commands
Planned command piping:
```
/region D4-F6 | /analyze | /summarize
```
Small operations combine into complex workflows—the Unix way.
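The planned pipe semantics can be sketched as splitting on `|` and threading each stage's result into the next. The stage implementations below are stand-ins, not the real `/region`, `/analyze`, or `/summarize` commands:

```python
# Sketch of the planned pipe semantics: split on '|' and thread each stage's
# result into the next. Stage implementations are stand-ins, not the real
# /region, /analyze, or /summarize commands.
def run_pipeline(spec: str, registry: dict):
    """Execute '/a x | /b | /c', passing each result to the next stage."""
    result = None
    for stage in (s.strip() for s in spec.split("|")):
        name, *args = stage.split()
        result = registry[name](result, *args)
    return result

registry = {
    "/region":    lambda _, span: {"cells": span},
    "/analyze":   lambda prev: {"analysis": "stats for " + prev["cells"]},
    "/summarize": lambda prev: prev["analysis"].upper(),
}

print(run_pipeline("/region D4-F6 | /analyze | /summarize", registry))
# STATS FOR D4-F6
```

Each stage stays small and single-purpose; complexity lives in the composition, not in any one command.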
Simplicity Through Generality
One grid serves many purposes:
- Go stones
- Pension allocations
- Agent positions
- Topological features
The grid is general; overlays specialize.
Where Gaius Differs
Modes Exist
Acme’s modelessness works for text editing. Gaius’s modes serve navigation:
- Normal mode: `hjkl` moves cursor
- Command mode: typing enters commands
- (Future) Visual mode: region selection
Modes concentrate related operations without modifier keys.
Keyboard Priority
Acme was designed for mice (three-button, specifically). Gaius prioritizes keyboard:
- Navigation without mouse
- Commands via slash prefix
- Mode switching via single keys
Both approaches are valid; Gaius serves users who prefer keyboard.
Visualization Over Text
Acme is fundamentally a text environment. Gaius is fundamentally visual:
- Grid as primary display
- Symbols over words
- Patterns over paragraphs
Lessons from Plan 9/Acme
1. Composition Over Features
Don’t add a feature when you can compose existing ones. Gaius’s overlay system composes simple layers; it doesn’t have a “complex visualization mode.”
2. Uniformity Enables Power
Consistent interaction patterns (every overlay cycles with o, every mode toggles with its key) compound into expertise.
3. Text as Interface
Making text executable bridges display and action. Log panel entries become command suggestions.
4. Defaults Over Configuration
Every configuration option is a decision users must make. Prefer good defaults. Gaius’s fixed color scheme and grid size are deliberate.
5. Network Transparency
Gaius doesn’t yet have distributed operation, but the architecture anticipates it:
- Agent swarms could run remotely
- Vector memory could be shared
- Grid state could synchronize
The Acme User Profile
Acme attracts a specific user: one who prefers mastery over convenience, composition over features, simplicity over apparent ease.
Gaius seeks the same users:
- Experts who will invest in learning
- Polymaths who work across domains
- Professionals who value efficiency
If you want a tool that works immediately without learning, Gaius (like Acme) isn’t it. If you want a tool that rewards mastery, welcome.
Rob Pike’s Influence
Pike’s essays—“Notes on Programming in C,” “A Lesson in Brevity,” various design rationales—express a philosophy:
- Clarity over cleverness
- Data structures over algorithms
- Composition over inheritance (before OOP made this controversial)
Gaius aspires to this clarity: a small set of concepts (grid, overlays, modes, commands) that compose into powerful workflows.
OODA Loop
Boyd’s OODA (Observe-Orient-Decide-Act) loop describes competitive decision-making under uncertainty. Gaius is explicitly designed to accelerate each phase.
The Loop in Gaius
Observe
The grid displays current system state. Health checks, agent positions, and topology overlays provide immediate perception without requiring sequential reading.
Tools: Grid view, /health, /gpu status, overlay modes
Orient
Context-building through overlays, memory search, and agent analysis. Multiple perspectives (risk, topology, temporal) help frame observations.
Tools: Overlay cycling (o), /search, /sitrep, MiniGrid projections
Decide
Slash commands, domain changes, and focus actions translate understanding into intent.
Tools: Command input (/), tenuki (t), mode cycling (v)
Act
Execute decisions: run analysis, apply fixes, export insights, trigger evolution.
Tools: /health fix, /evolve trigger, /render, /swarm
Fast OODA Wins
The competitive advantage of OODA comes from cycle speed. Gaius minimizes latency at every stage:
- Observe: Grid renders state instantly (no loading, no scrolling)
- Orient: Overlays toggle without delay (pre-computed)
- Decide: Keyboard-first eliminates mouse targeting time
- Act: Engine RPCs execute in <30s (most <1s)
OODA for Autonomous Agents
The same loop applies to Gaius’s autonomous systems:
| Phase | Health Observer | Evolution Daemon |
|---|---|---|
| Observe | Health checks | GPU utilization monitoring |
| Orient | FMEA risk scoring | Agent performance evaluation |
| Decide | Tier selection (0/1/2) | Candidate ranking |
| Act | Remediation or escalation | Promote or discard |
Fail Open Supports Observation
The fail open principle directly supports the Observe phase: by surfacing unknown states rather than hiding them, it ensures the OODA loop always has complete visibility.
Infrastructure
Gaius runs on a local development infrastructure managed by devenv (Nix-based), with process-compose for service orchestration and Just for task running.
Components
| Component | Purpose | Management |
|---|---|---|
| devenv | Nix-based development environment | devenv shell |
| process-compose | Service orchestration | devenv processes up/down |
| Just | Task runner (recipes) | just <recipe> |
| PostgreSQL | Primary database (:5444) | devenv process |
| Qdrant | Vector store (:6334) | devenv process |
| Aeron | IPC transport | devenv process |
| NiFi | Data ingestion | devenv process |
| Metabase | Analytics dashboards | devenv process |
| Gaius Engine | gRPC daemon (:50051) | devenv process |
Quick Start
# Enter development environment
devenv shell
# Start all services
devenv processes up
# Or clean restart (preferred)
just restart-clean
# Check status
uv run gaius-cli --cmd "/health" --format json
Architecture
devenv.nix is a pure service declaration file (~470 lines). It defines packages, environment variables, service configurations, and process dependency graphs. All process startup bash lives in scripts/processes/*.sh.
See:
- devenv Environment — Nix configuration details
- Process Scripts — Startup script architecture
- Just Task Runner — Available recipes
devenv Environment
Gaius uses devenv for a Nix-based development environment that provides all system dependencies reproducibly.
Structure
devenv.nix is a pure service declaration file. It defines:
- Packages: System tools (kubectl, k9s, mdbook, etc.) provided by Nix
- Environment variables: DATABASE_URL, PGPORT, KUBECONFIG, etc.
- Process definitions: One-liner exec blocks pointing to scripts
- Dependency graphs: Process startup ordering via depends_on
- enterShell: Interactive shell setup (PATH, aliases, KUBECONFIG)
Key Design Rules
No Inline Bash
All process startup bash lives in scripts/processes/*.sh. The devenv.nix exec blocks are one-liners:
processes.gaius-engine = {
exec = ''
exec ${config.devenv.root}/scripts/processes/gaius-engine.sh
'';
};
Nix Store Paths as Env Vars
When a script needs Nix-managed binaries, pass them as environment variables:
processes.nifi = {
exec = ''
export NIFI_PACKAGE="${pkgs.nifi}"
exec ${config.devenv.root}/scripts/processes/nifi.sh
'';
};
KUBECONFIG Handling
enterShell only runs for interactive shells. Process scripts must set KUBECONFIG unconditionally from $HOME:
export KUBECONFIG="$HOME/.config/kube/rke2.yaml"
Never use fallback syntax (${KUBECONFIG:-...}) — the system KUBECONFIG may point to a root-owned path.
Environment Variables
| Variable | Value | Source |
|---|---|---|
DATABASE_URL | postgres://gaius:gaius@localhost:5444/zndx_gaius | devenv.nix |
PGPORT | 5444 | devenv.nix |
KUBECONFIG | ~/.config/kube/rke2.yaml | enterShell |
METAFLOW_SERVICE_URL | http://localhost:8180 | enterShell |
Nix-Managed Tools
kubectl and k9s are provided by Nix (not the system RKE2 binary). This ensures version consistency across environments.
Process Scripts
All process startup bash lives in scripts/processes/*.sh. Shared helpers are in scripts/lib/.
Process Scripts
| Script | Service | Dependencies |
|---|---|---|
aeron-driver.sh | Aeron IPC transport | None |
gaius-engine.sh | gRPC engine daemon | Aeron, PostgreSQL |
gaius-worker.sh | Background worker | Engine |
metabase.sh | Analytics dashboards | PostgreSQL |
metaflow-bootstrap.sh | Metaflow K8s setup | Kubernetes |
metaflow-db-setup.sh | Metaflow database | PostgreSQL |
metaflow-port-forwards.sh | K8s port forwarding | Kubernetes |
metaflow-ui.sh | Metaflow UI | Metaflow service |
nifi.sh | Data ingestion | PostgreSQL |
Shared Helpers
scripts/lib/process-helpers.sh
Common functions used by all process scripts:
| Function | Purpose |
|---|---|
banner | Print startup banner with service name |
check_disabled | Skip if service is disabled via env var |
wait_for_postgres | Block until PostgreSQL is accepting connections |
wait_for_aeron | Block until Aeron driver is ready |
scripts/lib/gpu-helpers.sh
GPU cleanup functions shared by gaius-engine.sh and the justfile:
| Function | Purpose |
|---|---|
gpu_cleanup | Kill orphan vLLM/CUDA processes |
Script Pattern
Every process script follows the same structure:
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/../lib/process-helpers.sh"
banner "Service Name"
check_disabled "SERVICE_NAME"
# Wait for dependencies
wait_for_postgres
# Set KUBECONFIG unconditionally (not from enterShell)
export KUBECONFIG="$HOME/.config/kube/rke2.yaml"
# Start the service
exec some-command --flags
Adding a New Process
- Create scripts/processes/<name>.sh with the pattern above
- Add a process block to devenv.nix with a one-liner exec
- Pass any Nix-only values as env vars in the exec block
- Set dependency ordering with process-compose.depends_on
Just Task Runner
Gaius uses Just as its task runner, replacing devenv-tasks, which had SQLite locking issues.
Why Just
devenv-tasks 2.0.0 introduced a tasks.db SQLite file that deadlocks when tasks call devenv up. Just is a pure command runner with no state files — it reads justfile and executes recipes.
Key Recipes
just --list # Show all available recipes
# Core operations
just restart-clean # Full clean restart (preferred)
just proto-generate # Regenerate gRPC protobuf bindings
# GPU management
just gpu-cleanup # Kill orphan vLLM/CUDA processes
just gpu-deep-cleanup # Aggressive GPU memory cleanup
# Documentation
just docs-build # Build mdbook documentation
# Kubernetes
just k8s-cleanup # Clean up K8s resources
restart-clean
The most important recipe. Delegates to scripts/restart-clean.sh:
- Stops all devenv processes
- Kills stale vLLM/CUDA processes
- Cleans up GPU memory
- Strips DEVENV_* environment variables (uses env -i)
- Restarts everything fresh
Warm start time: ~13 seconds (with the Nix store cached).
just restart-clean
Usage
Invoke from the devenv shell (or any shell with just + devenv):
devenv shell
just <recipe>
Recipes are defined in justfile at the project root.
Deployment
Gaius uses RKE2 Kubernetes for production services (Metaflow, supporting infrastructure) and local process-compose for the core platform.
Local Development
The primary deployment model is local, using devenv process-compose:
devenv processes up # Start all services
devenv processes down # Stop all services
just restart-clean # Clean restart (preferred)
Kubernetes Services
Supporting services run in RKE2 Kubernetes:
| Service | Namespace | Purpose |
|---|---|---|
| Metaflow metadata | default | Flow run tracking |
| Metaflow UI | default | Web dashboard |
Kubernetes resources are managed via Tilt in infra/tilt/.
See Also
- Kubernetes — RKE2 configuration
- Metaflow Service — Metaflow deployment
Kubernetes
Gaius uses an RKE2 cluster for running Metaflow and supporting services.
Kubeconfig Setup
RKE2 installs its kubeconfig at /etc/rancher/rke2/rke2.yaml (root-owned). Copy it to a user-accessible location:
mkdir -p ~/.config/kube
sudo cp /etc/rancher/rke2/rke2.yaml ~/.config/kube/rke2.yaml
sudo chown $(id -u):$(id -g) ~/.config/kube/rke2.yaml
Set KUBECONFIG:
export KUBECONFIG="$HOME/.config/kube/rke2.yaml"
This is set automatically by enterShell in devenv.nix for interactive shells. Process scripts set it unconditionally.
Nix-Managed Tools
kubectl and k9s are provided by Nix packages in devenv.nix — not the system RKE2 binary. This ensures version consistency.
Pod Networking
K8s pods need pg_hba.conf entries for cluster networks:
host all all 10.42.0.0/16 md5 # Pod network
host all all 10.43.0.0/16 md5 # Service network
Tilt
Development iteration on K8s resources uses Tilt, configured in infra/tilt/.
Cleanup
just k8s-cleanup # Clean up stale K8s resources
Metaflow Service
The Metaflow metadata service runs in Kubernetes and enables local flow execution with centralized run tracking.
Deployment
Metaflow service is deployed via Tilt in infra/tilt/.
Port Forwarding
The service runs in-cluster and needs port-forwarding for local access:
kubectl port-forward svc/metaflow-service 8180:8080
This is handled automatically by the metaflow-port-forwards.sh process script.
Environment
Set the service URL for flow runs:
export METAFLOW_SERVICE_URL=http://localhost:8180
This is set automatically by enterShell in devenv.nix.
Database
Metaflow uses the same PostgreSQL instance (port 5444) with its own database, set up by the metaflow-db-setup.sh process script.
Bootstrapping
The metaflow-bootstrap.sh script handles initial K8s resource creation for the Metaflow service.
Monitoring
Operational monitoring combines CLI health checks, GPU status tracking, and Metabase dashboards.
Quick Status
# Overall health
uv run gaius-cli --cmd "/health" --format json
# GPU status
uv run gaius-cli --cmd "/gpu status" --format json
# Health Observer incidents
uv run gaius-cli --cmd "/health incidents" --format json
Monitoring Stack
| Tool | Purpose | Access |
|---|---|---|
/health CLI | Infrastructure health checks | CLI/MCP |
/gpu status CLI | GPU and endpoint monitoring | CLI/MCP |
| Health Observer | Continuous background monitoring | Engine daemon |
| Metabase | Analytics dashboards | Web UI |
| Prometheus | Time-series metrics | Query API |
See Also
- Health Checks — Running diagnostics
- GPU Management — GPU operations
Health Checks
The /health command runs diagnostics across all system components and reports status.
Running Health Checks
# All checks
uv run gaius-cli --cmd "/health" --format json
# Specific category
uv run gaius-cli --cmd "/health gpu" --format json
uv run gaius-cli --cmd "/health endpoints" --format json
uv run gaius-cli --cmd "/health infrastructure" --format json
Interpreting Results
Each check reports a status:
| Status | Meaning |
|---|---|
PASS | Component is healthy |
WARN | Component has issues but is functional |
FAIL | Component is unhealthy |
Applying Fixes
When checks fail, use /health fix:
# Fix a specific service
uv run gaius-cli --cmd "/health fix engine" --format json
# Available services
# engine, dataset, nifi, postgres, qdrant, minio, endpoints, evolution
Always try /health fix before manual intervention. This exercises the self-healing system and helps it improve over time.
Manual Fallback
If /health fix fails:
# Full clean restart
just restart-clean
# GPU-specific cleanup
just gpu-cleanup
just gpu-deep-cleanup
FMEA Diagnostics
For deeper analysis:
# FMEA summary with RPN scores
uv run gaius-cli --cmd "/fmea" --format json
# Failure mode details
uv run gaius-cli --cmd "/fmea detail GPU_001" --format json
GPU Management
Gaius manages 6 NVIDIA GPUs across vLLM inference, LuxCore rendering, and embedding workloads.
GPU Allocation
| GPU | Typical Use |
|---|---|
| 0-1 | Reasoning endpoint (tensor_parallel=2) |
| 2-3 | Coding endpoint (tensor_parallel=2) |
| 4 | Embedding endpoint |
| 5 | Available for rendering/evolution |
Allocation is managed by the Orchestrator. GPUs can be temporarily reassigned for rendering or evolution workloads via makespan scheduling.
Status Monitoring
# Endpoint status
uv run gaius-cli --cmd "/gpu status" --format json
# GPU health (memory, temperature, utilization)
uv run gaius-cli --cmd "/gpu health" --format json
Cleanup
When GPU processes get stuck or memory leaks occur:
# Standard cleanup (kill orphan vLLM processes)
just gpu-cleanup
# Deep cleanup (aggressive memory recovery)
just gpu-deep-cleanup
The gpu-helpers.sh shared library provides the gpu_cleanup function used by both the engine startup script and the justfile recipes.
Common Issues
| Issue | Symptom | Fix |
|---|---|---|
| Orphan vLLM process | GPU memory used but no endpoint | just gpu-cleanup |
| OOM during model load | Endpoint stuck in STARTING | Free GPU, then /health fix endpoints |
| CUDA memory fragmentation | Degraded inference speed | just gpu-deep-cleanup then restart |
| OpenCV conflict | vLLM WorkerProc fails (cv2 error) | Already fixed via pyproject.toml override |
Rendering GPU Eviction
The viz pipeline temporarily evicts a low-priority endpoint to use a GPU for LuxCore rendering:
- Orchestrator evicts endpoint from target GPU
- LuxCore renders using PATHOCL engine with CUDA
- clear_embeddings() releases the Nomic model (~3GB)
- Orchestrator restores the evicted endpoint
Contributing
Gaius is an experiment in augmented cognition. Contributions that advance this vision are welcome.
Development Setup
# Clone and enter
git clone https://github.com/zndx/gaius.git
cd gaius
# Start devenv (provides all system dependencies)
devenv shell
# Install Python dependencies
uv sync
# Start all platform components
devenv processes up
# Or use a clean restart
just restart-clean
# Run the TUI
uv run gaius
# Run the CLI
uv run gaius-cli --cmd "/health" --format json
Project Structure
gaius/
├── src/gaius/ # Python source (26 packages)
│ ├── app.py # TUI application
│ ├── cli.py # Non-interactive CLI
│ ├── mcp_server.py # MCP server (163 tools)
│ ├── engine/ # gRPC engine (37 services)
│ ├── health/ # Self-healing infrastructure
│ ├── agents/ # Agent system
│ └── ...
├── scripts/
│ ├── processes/ # Process startup scripts
│ └── lib/ # Shared helpers
├── docs/current/ # mdbook documentation
├── config/ # HOCON configuration
├── justfile # Task runner recipes
├── devenv.nix # Development environment
├── pyproject.toml # Python dependencies
└── CLAUDE.md # Development guidelines
Development Workflow
Testing Changes
The CLI is the product. After every code change, verify via CLI:
# After editing code — always re-test
uv run gaius-cli --cmd "/health" --format json
Previous test outputs are invalidated by code changes. Don’t reason from stale context — run the command again.
Key Recipes
just --list # Show all available tasks
just restart-clean # Full clean restart
just proto-generate # Regenerate gRPC bindings
just gpu-cleanup # Clean up GPU processes
just docs-build # Build documentation
Design Principles
When contributing, these principles are mandatory:
- Fail-fast: Errors surface immediately with guru codes and remediation hints. No silent degradation.
- Engine-first: Business logic belongs in engine services, not in interfaces.
- Self-healing first: Prefer /health fix over manual remediation.
- Keyboard-first: Every operation available via keyboard.
- CLI verification: All new features must be testable via gaius-cli.
Code Style
- Python 3.12+ features welcome
- Type hints for public interfaces
- Local imports inside functions for lazy loading in service modules
- Use from gaius.core.config import get_database_url for the DB URL (never hardcode)
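The lazy-loading rule above can be illustrated with a minimal sketch. The function name and payload are invented for illustration, and `json` stands in for a heavy dependency:

```python
# Sketch of "local imports inside functions for lazy loading".
# A module-level import is paid at service startup; an import inside
# the function is only paid on first call.

def summarize(payload):
    import json  # stand-in for a heavy dependency loaded lazily
    return json.dumps(payload, sort_keys=True)

summarize({"b": 1, "a": 2})
# '{"a": 2, "b": 1}'
```

Importing the service module stays cheap; the cost moves to the first invocation of the function that actually needs the dependency.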
Commit Messages
Use conventional commit style:
feat: add temporal overlay mode
fix: correct grid boundary check
docs: expand TDA explanation
refactor: simplify swarm initialization
Pull Request Process
- Create a feature branch
- Make changes with clear commits
- Verify via CLI: uv run gaius-cli --cmd "/health" --format json
- Ensure cd docs/current && mdbook build succeeds if docs changed
- Submit a PR with a description of the changes
Architecture Decision Records
Key architectural decisions that shaped the system.
ADR-001: Engine-First Architecture
Context: Business logic was scattered across TUI, CLI, and utility scripts, causing duplication and inconsistency.
Decision: Centralize all business logic in the gRPC engine. TUI, CLI, and MCP become thin clients.
Consequences: Single source of truth for all operations. Engine manages GPU resources centrally. All interfaces get consistent behavior automatically.
ADR-002: Just Over devenv-tasks
Context: devenv-tasks 2.0.0 introduced SQLite locking on tasks.db that deadlocks when tasks call devenv up.
Decision: Migrate from devenv-tasks to Just as the task runner.
Consequences: Pure command runner with no state files. Recipes defined in justfile. No locking issues. scripts/restart-clean.sh still does actual work; Just recipe delegates to it.
ADR-003: Fail-Fast as Iron-Clad Principle
Context: Silent degradation hid problems until they became critical.
Decision: All code must surface errors immediately with guru meditation codes and remediation paths. No silent fallbacks.
Consequences: Higher initial friction (more explicit error handling) but dramatically faster diagnosis and resolution. Self-healing system built on reliable error detection.
ADR-004: FMEA for Health Monitoring
Context: Simple severity classifications don’t capture risk adequately — a rare but invisible failure is more dangerous than a frequent but obvious one.
Decision: Adopt FMEA (Failure Mode and Effects Analysis) with RPN scoring for health monitoring.
Consequences: Quantitative risk assessment (S x O x D). Tiered remediation based on risk level. Adaptive learning from outcomes.
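The RPN arithmetic behind this decision is simple enough to sketch. The tier thresholds below are hypothetical illustrations, not Gaius's actual boundaries:

```python
def rpn(severity, occurrence, detection):
    """FMEA Risk Priority Number: S x O x D, each scored 1-10 (range 1-1000)."""
    for v in (severity, occurrence, detection):
        assert 1 <= v <= 10
    return severity * occurrence * detection

def tier(score):
    # Hypothetical tier boundaries, for illustration only.
    if score >= 200:
        return 2  # requires approval
    if score >= 80:
        return 1  # automated remediation
    return 0      # log only

score = rpn(7, 4, 5)  # 140: moderate severity, occasional, detectable
```

A rare but invisible failure (say O=2, D=9) can outscore a frequent but obvious one (O=8, D=1) at equal severity, which is exactly the point of RPN over plain severity levels.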
ADR-005: LuxCore Over Blender for Visualization
Context: Blender’s Cycles renderer couldn’t render glass convincingly (opaque white blobs).
Decision: Use LuxCore unbiased path tracer for card visualization, initially via PyPI, later from source for GPU acceleration.
Consequences: Physically accurate glass rendering. GPU-accelerated via PATHOCL engine with CUDA. More complex build process but superior visual quality.
ADR-006: Process Scripts Over Inline Nix Bash
Context: devenv.nix contained inline bash blocks that were hard to debug and test.
Decision: Move all process startup bash to scripts/processes/*.sh with shared helpers. devenv.nix becomes a pure service declaration file with one-liner exec blocks.
Consequences: Scripts are independently testable. Shared helpers eliminate duplication. Nix store paths passed as environment variables.
Adding New ADRs
When making significant architectural decisions:
- Add an entry here with Context, Decision, and Consequences
- Reference the ADR in relevant code comments
- Update CLAUDE.md if the decision affects development workflow
Proto Change Workflow
Changes to the gRPC protobuf schema require a specific workflow to keep generated bindings, internal enums, and status mappings in sync.
Step-by-Step
1. Edit the Proto File
Edit src/gaius/engine/proto/gaius_service.proto. Append new enum values; never renumber existing values, to preserve wire compatibility.
2. Regenerate Bindings
just proto-generate
This generates gaius_service_pb2.py and gaius_service_pb2_grpc.py.
3. Update Generated Exports
Add new symbols to src/gaius/engine/generated/__init__.py:
- Add to the import block
- Add to the __all__ list
Critical: Skipping this causes engine startup failures.
4. Update Internal Enums
If there’s a parallel Python enum (e.g., in vllm_controller.py), sync it with the proto enum.
5. Update Status Mappings
Add string-to-proto mappings in the servicer’s _STATUS_MAP.
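The mapping in this step is a plain dict from engine-internal status strings to proto enum values. A hedged sketch of its shape, with stand-in enum constants in place of the real generated proto members:

```python
# Illustrative shape of a servicer _STATUS_MAP; the real one maps to
# generated proto enum members (values here are stand-ins).
ENDPOINT_UNKNOWN, ENDPOINT_STARTING, ENDPOINT_READY = 0, 1, 2

_STATUS_MAP = {
    "starting": ENDPOINT_STARTING,
    "ready": ENDPOINT_READY,
}

def to_proto_status(internal):
    # Fail fast on unmapped statuses rather than silently defaulting:
    # a missing entry here is exactly the "status shows wrong value" bug.
    try:
        return _STATUS_MAP[internal]
    except KeyError:
        raise ValueError(f"no proto mapping for status {internal!r}")
```

Failing loudly on an unknown key makes a forgotten mapping show up immediately instead of as a wrong status in /gpu status output.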
6. Verify
uv run python -c "from gaius.engine.generated import NEW_SYMBOL; print('OK')"
7. Restart and Test
just restart-clean
uv run gaius-cli --cmd "/gpu status" --format json
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Engine fails to start | Missing export in __init__.py | Add symbol to imports and __all__ |
| Port 50051 not listening | Import error in gRPC server | Check engine logs |
| Status shows wrong value | Missing _STATUS_MAP entry | Add mapping |
See Protobuf Schema for more detail.
Testing
Gaius follows a CLI-first testing methodology. The CLI is the product — all functionality must be verifiable through it.
Core Rules
1. Re-Test After Every Code Change
Previous test outputs are invalidated by code changes. Don’t reason from stale context — run the command again.
# After editing code:
# BAD: "The fix should work based on my analysis"
# GOOD: Actually run it
uv run gaius-cli --cmd "/evolve status" --format json
2. No Static Test Data
We do not fall back to static test data. All functional aspects of new features must be verified directly against running services.
3. No Fallback Workarounds
Do not rely on fallbacks or workarounds when testing. If a service is down, fix it (via /health fix) rather than mocking around it.
Verification Patterns
Health Check
uv run gaius-cli --cmd "/health" --format json
Endpoint Status
uv run gaius-cli --cmd "/gpu status" --format json | jq '.data.endpoints[] | {name, status}'
Evolution Status
uv run gaius-cli --cmd "/evolve status" --format json
Import Verification
For new modules or proto changes:
uv run python -c "from gaius.engine.generated import NewSymbol; print('OK')"
TUI Testing
TUI behavior must be tested using Textual Pilot before committing:
# Inside an async test function; `app` is the Gaius App instance under test
async with app.run_test() as pilot:
    await pilot.press("h")  # Navigate left
    assert app.cursor_x == expected_x
Fail-Fast Compliance
Before committing, verify:
# No fallback patterns
grep -rn "fail_fast\|SELENIUM_AVAILABLE" src/gaius/
# No placeholder image colors
grep -rn "240, 240, 240" src/gaius/
All error messages must include guru meditation codes and remediation hints.
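A minimal sketch of what a compliant error looks like. The exception class and helper are invented for illustration; the guru code itself is taken from the catalog in this book:

```python
class GuruError(RuntimeError):
    """Hypothetical error type carrying a guru code and remediation hint."""
    def __init__(self, code, message, fix):
        super().__init__(f"{code} {message} (fix: {fix})")
        self.code, self.fix = code, fix

def require_engine(connected):
    # Fail fast: surface the failure with a code and a remediation path,
    # never degrade silently.
    if not connected:
        raise GuruError("#GR.00000001.CONNFAIL",
                        "gRPC connection failed", "check engine status")
```

The message alone tells the operator what failed, which catalog entry it is, and what to try first.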
CLI Commands
63 slash commands are available in both the TUI and CLI interfaces. Commands are executed via:
# TUI: press / then type command
/health
# CLI: use --cmd flag
uv run gaius-cli --cmd "/health" --format json
Command Categories
Health & Diagnostics
| Command | Description |
|---|---|
/health | Run all health checks |
/health <category> | Run checks for category (gpu, endpoints, infrastructure) |
/health fix <service> | Apply automated fix strategy |
/health observer | Health Observer daemon status |
/health incidents | List active incidents |
/fmea | FMEA summary with RPN scores |
/fmea catalog | List all failure modes |
/fmea detail <id> | Failure mode details |
GPU & Endpoints
| Command | Description |
|---|---|
/gpu status | Endpoint and GPU status |
/gpu health | GPU memory, temperature, utilization |
Agents & Evolution
| Command | Description |
|---|---|
/swarm | Run swarm analysis |
/evolve status | Evolution daemon status |
/evolve trigger | Trigger evolution cycle |
/cognition | Trigger cognition cycle |
/thoughts | View recent thoughts |
/sitrep | Situational report |
/theta consolidate | Run theta consolidation |
Knowledge Base
| Command | Description |
|---|---|
/search <query> | Search knowledge base |
/kb list | List KB entries |
/kb create | Create KB entry |
System
| Command | Description |
|---|---|
/state | Current application state |
/render | Render card visualizations |
/xai budget | XAI API budget status |
Note: This is a representative subset. Run /help in the TUI for the complete list, or see the dispatch table in src/gaius/cli.py.
MCP Tools
163 MCP tools expose Gaius functionality to Claude Code and other MCP-compatible clients.
Tool Categories
| Category | Count | Description |
|---|---|---|
| Health | ~20 | Health checks, FMEA, observer, incidents |
| Agents | ~15 | Swarm, evolution, cognition, theta |
| Inference | ~10 | Scheduler, endpoints, GPU status |
| Knowledge Base | ~15 | Search, CRUD, sync, semantic search |
| Observability | ~10 | Metrics, Prometheus, status |
| Data Pipeline | ~10 | Metaflow, lineage, flows |
| Visualization | ~5 | Render, card management |
| Bases | ~10 | Feature store queries, entity history |
| Collections | ~15 | Card collections, publishing |
| Articles | ~5 | Article curation, status |
| X Bookmarks | ~8 | Sync, auth, folders |
| Calibration | ~5 | Understanding calibration |
| Evolution | ~10 | Agent versions, optimization |
| System | ~25 | Config, models, sessions, research |
Naming Convention
Tools follow a consistent naming pattern: <domain>_<action> (e.g., health_observer_status, scheduler_submit, gpu_health).
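The convention makes tool names mechanically groupable by domain. A small sketch using the example names above (the split on the first underscore is an assumption about how the convention is meant to be read):

```python
from collections import defaultdict

def group_by_domain(tools):
    """Group <domain>_<action> tool names by their domain prefix."""
    groups = defaultdict(list)
    for name in tools:
        domain, _, action = name.partition("_")
        groups[domain].append(action)
    return dict(groups)

groups = group_by_domain(["health_observer_status", "gpu_health", "scheduler_submit"])
# {"health": ["observer_status"], "gpu": ["health"], "scheduler": ["submit"]}
```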
Example Usage
From Claude Code:
> Use the health_observer_status tool to check system health
> Use the gpu_health tool to check GPU memory usage
> Use the search_kb tool to find articles about pensions
Server Configuration
{
"mcpServers": {
"gaius": {
"command": "uv",
"args": ["run", "gaius-mcp"],
"cwd": "/path/to/gaius"
}
}
}
Note: For the complete tool list with parameters, see src/gaius/mcp_server.py.
Guru Meditation Codes
Complete catalog of error codes used across the Gaius platform.
Format
#<COMPONENT>.<SEQUENCE>.<MNEMONIC>
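The format is regular enough to validate mechanically. A hedged sketch: the exact grammar (sequence width, optional subsystem segments such as SEC) is inferred from the catalog entries below, not from a formal specification:

```python
import re

# Inferred grammar: '#' COMPONENT ('.' SUBSYSTEM)* '.' DIGITS '.' MNEMONIC
GURU_RE = re.compile(r"^#([A-Z]+(?:\.[A-Z]+)*)\.(\d+)\.([A-Z0-9_]+)$")

def parse_guru(code):
    """Split a guru meditation code into (component, sequence, mnemonic)."""
    m = GURU_RE.match(code)
    if not m:
        raise ValueError(f"not a guru meditation code: {code!r}")
    component, sequence, mnemonic = m.groups()
    return component, int(sequence), mnemonic

parse_guru("#ACP.SEC.00000002.NOTALLOWED")
# ("ACP.SEC", 2, "NOTALLOWED")
```

Parsing the sequence as an integer tolerates the varying zero-padding seen across components (#EN.00001 vs #DS.00000001).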
Catalog
DS — DatasetService
| Code | Description | Fix |
|---|---|---|
#DS.00000001.SVCNOTINIT | DatasetService not initialized | /health fix dataset |
NF — NiFi
| Code | Description | Fix |
|---|---|---|
#NF.00000001.UNREACHABLE | NiFi not reachable | /health fix nifi |
EN — Engine
| Code | Description | Fix |
|---|---|---|
#EN.00001.GRPC_BIND | gRPC port bind failure | Check port 50051 |
#EN.00002.VLLM_START | vLLM startup failure | /health fix endpoints |
#EN.00003.GPU_OOM | GPU out of memory | just gpu-cleanup |
#EN.00004.ORPHAN_PROC | Orphan vLLM process | just gpu-cleanup |
EP — Endpoints/Inference
| Code | Description | Fix |
|---|---|---|
#EP.00000001.GPUOOM | GPU out of memory during inference | /health fix endpoints |
GR — gRPC
| Code | Description | Fix |
|---|---|---|
#GR.00000001.CONNFAIL | gRPC connection failed | Check engine status |
ACP — Agent Client Protocol
| Code | Description | Fix |
|---|---|---|
#ACP.00000001.CONNFAIL | ACP connection failed | Check Claude Code |
#ACP.00000002.TIMEOUT | ACP connection timeout | Retry |
#ACP.SEC.00000002.NOTALLOWED | Repo not in allowlist | Update acp.conf |
#ACP.SEC.00000003.NOTPRIVATE | Repo not private | Make repo private |
ACF — Article Curation Flow
| Code | Description | Fix |
|---|---|---|
#ACF.00000013.NOHINTS | Empty keywords in article frontmatter | Add keywords/news_queries |
XB — X Bookmarks
| Code | Description | Fix |
|---|---|---|
#XB.00000001.NOTOKEN | No auth token | Complete OAuth flow |
#XB.00000011.NOFOLDER | Folders API unavailable (403) | Upgrade API tier |
HL — Health
| Code | Description | Fix |
|---|---|---|
#HL.00001.GRPC_DOWN | gRPC connection down | just restart-clean |
#HL.00002.GPU_OOM | GPU memory exhausted | just gpu-cleanup |
Note: This is a representative subset. Guru codes are assigned as new failure modes are identified. See CLAUDE.md for the full format specification.
Database Schema
PostgreSQL database zndx_gaius on port 5444.
Connection
PGPASSWORD=gaius psql -h localhost -p 5444 -U gaius -d zndx_gaius
Connection URL: postgres://gaius:gaius@localhost:5444/zndx_gaius?sslmode=disable
Important: The database name is zndx_gaius, not gaius.
Key Tables
Cards & Content
| Table | Purpose |
|---|---|
cards | Card entities with metadata |
card_enrichments | Enrichment data for cards |
articles | Source articles |
article_content | Article text content |
FMEA & Health
| Table | Purpose |
|---|---|
fmea_catalog | Failure mode definitions (S/O/D scores) |
fmea_occurrences | Failure occurrence history |
fmea_outcomes | Remediation outcomes (for adaptive learning) |
fmea_approvals | Pending Tier 2 approvals |
healing_events | Self-healing audit trail |
health_observer_state | Observer daemon state |
Agents & Evolution
| Table | Purpose |
|---|---|
agent_versions | Agent prompt versions with lineage |
agent_evaluations | Evaluation results for evolution |
Operations
| Table | Purpose |
|---|---|
activity_log | System activity tracking |
x_bookmarks | X bookmark sync data |
x_bookmark_folders | X bookmark folder metadata |
x_sync_runs | X sync run history |
Accessing from Code
Always use the config helper:
from gaius.core.config import get_database_url
url = get_database_url()
Never hardcode connection parameters.
Configuration
Gaius uses HOCON configuration files with environment variable overrides. The canonical source is config/base.conf.
Configuration Hierarchy
1. config/base.conf — Default values
2. ~/.config/gaius/acp.conf — ACP-specific overrides
3. Environment variables — Highest priority
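The precedence rule can be sketched as a lookup chain. This is a simplified illustration using plain dicts; in real use `env` would be `os.environ` and `base`/`overrides` would come from whatever HOCON loader Gaius uses:

```python
def resolve(key, env_var, env, base, overrides):
    """Env var > override file > base.conf default (simplified precedence)."""
    if env_var and env_var in env:
        return env[env_var]
    if key in overrides:
        return overrides[key]
    return base[key]

base = {"database.port": "5444"}
value = resolve("database.port", "PGPORT", {"PGPORT": "5555"}, base, {})
# value == "5555"; with no env entry it falls back to "5444"
```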
Database
| Key | Default | Env Var | Description |
|---|---|---|---|
database.host | localhost | PGHOST | PostgreSQL host |
database.port | 5444 | PGPORT | PostgreSQL port |
database.name | zndx_gaius | PGDATABASE | Database name |
database.user | gaius | PGUSER | Database user |
database.password | gaius | PGPASSWORD | Database password |
Important: Always use gaius.core.config.get_database_url() to get the connection URL. Never hardcode.
Engine
| Key | Default | Env Var | Description |
|---|---|---|---|
engine.grpc.host | 0.0.0.0 | GAIUS_ENGINE_HOST | gRPC bind host |
engine.grpc.port | 50051 | GAIUS_ENGINE_PORT | gRPC port |
engine.grpc.max_workers | 10 | — | Max gRPC worker threads |
engine.orchestrator.preload_endpoints | ["reasoning"] | — | Endpoints to load on startup |
engine.orchestrator.startup_timeout | 600 | — | Startup timeout (seconds) |
engine.scheduler.default_timeout | 120 | GAIUS_ENGINE_TIMEOUT | Default inference timeout |
engine.evolution.enabled | true | — | Enable evolution daemon |
engine.evolution.idle_threshold | 60 | — | GPU idle seconds before evolution |
Health
| Key | Default | Description |
|---|---|---|
health.check_interval | 60 | Health Observer poll interval (seconds) |
health.fmea.learning_rate | 0.2 | Adaptive S/O/D learning rate |
health.self_healing.enabled | true | Enable automatic remediation |
Agents
| Key | Default | Description |
|---|---|---|
agents.swarm.parallel | true | Enable parallel swarm execution |
agents.swarm.timeout | 60 | Swarm execution timeout (seconds) |
agents.theta.confidence_threshold | 0.8 | Theta consolidation confidence threshold |
ACP Security
Configured in ~/.config/gaius/acp.conf:
acp {
github {
allowed_repos = ["zndx/gaius-acp"]
require_private = true
verify_on_each_operation = true
cache_visibility_seconds = 300
}
}
Glossary
ACP — Agent Client Protocol. Integration layer for Claude Code to perform autonomous health maintenance.
AgendaTracker — Tracks scheduled endpoint transitions for makespan operations, preventing false-positive health incidents.
APO — Automatic Prompt Optimization. Technique for evolving agent system prompts.
Bases — Feature store for entity-centric data with temporal queries.
CLT — Cross-Layer Transcoder. Extracts sparse interpretable features from model activations.
Death Loop — An H1 topological feature (persistent cycle) in embedding space. Indicates feedback loops, circular dependencies, or systemic risk.
devenv — Nix-based development environment providing reproducible builds.
DQL — Domain Query Language. Query syntax for the Bases feature store.
FMEA — Failure Mode and Effects Analysis. Quantitative risk assessment using RPN scoring.
Guru Meditation Code — Unique error identifier (e.g., #DS.00000001.SVCNOTINIT). Inspired by the Amiga.
H0/H1/H2 — Homology dimensions: H0 = connected components, H1 = loops, H2 = voids.
HOCON — Human-Optimized Config Object Notation. Configuration file format used by Gaius.
Just — Command runner replacing devenv-tasks. Reads recipes from justfile.
KServe OIP — Open Inference Protocol. Standard gRPC interface for ML inference.
LuxCore — Unbiased path tracing renderer used for card visualizations.
Makespan — Total time from start to finish of a multi-step workload (eviction, loading, inference, restoration).
MCP — Model Context Protocol. Programmatic interface exposing 163 tools to AI assistants.
optillm — Inference-time reasoning enhancement (CoT, BoN, MoA techniques).
PATHOCL — LuxCore rendering engine using OpenCL/CUDA for GPU acceleration.
RASE — Rapid Agentic Systems Engineering. Python-native MBSE metamodel for verifiable agent training.
RLVR — Reinforcement Learning with Verifiable Reward. Agent training methodology.
RPN — Risk Priority Number. FMEA score: Severity × Occurrence × Detection (range 1-1000).
Tenuki — Go term for playing away from the current area. In Gaius, jumps the cursor to a strategic point.
Theta Consolidation — Memory compression inspired by hippocampal theta rhythms. Links knowledge across temporal slices.
TUI — Terminal User Interface. Interactive Textual application launched with uv run gaius.
vLLM — High-throughput LLM inference engine. Managed by the Orchestrator across 6 GPUs.
ACP Incident Resolution: 2026-01-01
A milestone in autonomous self-healing: Claude Code resolves GPU allocation conflicts using Gaius MCP tools.
Overview
On January 1, 2026, the Gaius HealthObserver daemon detected GPU memory exhaustion and escalated to Claude Code via the Agent Client Protocol (ACP). This document captures the complete investigation and resolution session, demonstrating the first successful end-to-end ACP escalation workflow.
Key Achievements
- Autonomous Root Cause Analysis: Claude Code identified overlapping GPU allocations between multiple endpoints
- MCP Tool Integration: Used `gpu_status`, `health_check`, and orchestrator commands via gRPC
- Self-Healing Validation: Monitored recovery and verified resolution through the same tooling
- Multiple Incident Cascade: Handled a follow-on incident caused by the same GPU contention pattern
Incident 1: GPU_001:gpu_1_health
```text
▐▛███▜▌ Claude Code v2.0.54
▝▜█████▛▘ Opus 4.5 · Claude Max
▘▘ ▝▝ ~/local/src/zndx/gaius
```
Health Incident Context
Fingerprint: GPU_001:gpu_1_health
Endpoint: gpu_1_health
Failure Mode: GPU_001
RPN Score: 125 (S:5 × O:5 × D:5)
Escalation Tier: 2
Attempts: 3
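The RPN score in the context above is the standard FMEA product, reproducible with a few lines (the 1-10 factor range is the conventional FMEA scale, matching the 1-1000 range quoted in the glossary):

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """FMEA Risk Priority Number: product of three 1-10 factors (range 1-1000)."""
    for factor in (severity, occurrence, detection):
        if not 1 <= factor <= 10:
            raise ValueError("FMEA factors are scored on a 1-10 scale")
    return severity * occurrence * detection

print(rpn(5, 5, 5))  # the incident's S:5 x O:5 x D:5 score -> 125
```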
Initial State
| GPU | Memory Used | Memory % | Healthy |
|---|---|---|---|
| 0 | 1.29 GB | 5.4% | ✓ |
| 1 | 23.6 GB | 98.4% | ✗ |
| 2 | 22.89 GB | 95.4% | ✓ |
| 3 | 22.89 GB | 95.4% | ✓ |
| 4 | 22.89 GB | 95.4% | ✓ |
| 5 | 0.56 GB | 2.3% | ✓ |
Endpoints at Incident Time
| Endpoint | Model | GPUs | Status | Issue |
|---|---|---|---|---|
| cap_reasoning | Qwen/QwQ-32B | [1,2,3,4] | healthy | Consuming GPU 1 |
| orchestrator | nvidia/Orchestrator-8B | [0,1] | starting | “Insufficient GPU memory” |
Root Cause Analysis
Claude Code identified the issue immediately:
Root Cause: GPU allocation conflict between two endpoints trying to use GPU 1:
- `cap_reasoning` (Qwen/QwQ-32B): uses GPUs [1, 2, 3, 4], status: healthy
- `orchestrator` (nvidia/Orchestrator-8B): uses GPUs [0, 1], status: starting with "Insufficient GPU memory"

The `cap_reasoning` endpoint loaded first and consumed GPU 1's memory (~23.6 GB of 24 GB). When `orchestrator` tries to start on GPUs [0, 1], GPU 0 has memory available (1.29 GB used) but GPU 1 is already exhausted.
Investigation Commands
```bash
# Check endpoint status
uv run gaius-cli --cmd "/gpu status" --format json

# Stop the stuck orchestrator
uv run gaius-cli --cmd "/gpu stop orchestrator" --format json

# Monitor recovery
uv run gaius-cli --cmd "/health gpu" --format json
```
Resolution
After stopping the orchestrator endpoint, the scheduler automatically rebalanced:
| GPU | Before | After |
|---|---|---|
| 0 | 5.4% | 0.01% |
| 1 | 98.4% | 0.01% ✓ |
| 2 | 95.4% | 92.2% |
| 3 | 95.4% | 94.3% |
| 4 | 95.4% | 94.3% |
| 5 | 2.3% | 0.01% |
Final State:
- orchestrator: HEALTHY (port 8094)
- coding: HEALTHY (port 8093)
- cap_reasoning: STOPPING
- fast: STARTING (port 8095)
Incident 2: VLLM_001:coding
Immediately after resolving the first incident, a second cascaded incident appeared.
Health Incident Context
Fingerprint: VLLM_001:coding
Endpoint: coding
Failure Mode: VLLM_001
RPN Score: 125 (S:5 × O:5 × D:5)
Escalation Tier: 2
Attempts: 3
Conflict Analysis
| Endpoint | GPUs | Status | Memory on GPU 1 |
|---|---|---|---|
| cap_reasoning | [1,2,3,4] | healthy | 22.89 GB (95.4%) |
| orchestrator | [0,1] | stopping | competing |
| coding | [1] | failed | can’t allocate |
| fast | [0] | healthy | - |
Resolution
The scheduler handled this automatically:
- Stopped
cap_reasoningto free GPUs [1,2,3,4] - Stopped
orchestratorandcoding - Cleared all GPU memory (95%+ → 0%)
- Restarted endpoints with non-overlapping allocations
Final State:
| Endpoint | Status |
|---|---|
| orchestrator | HEALTHY ✓ |
| coding | HEALTHY ✓ |
| reasoning | STOPPING |
| fast | STARTING |
Observations
What Worked
- FMEA-Based Escalation: RPN scoring correctly identified severity (125 = S:5 × O:5 × D:5)
- MCP Tool Chain: All diagnostic commands worked through gRPC proxying
- Scheduler Self-Healing: Automatic GPU reallocation after conflicts cleared
- Cascading Incident Detection: Second incident properly tracked with separate fingerprint
Identified Gaps
- GPU Overlap Detection: Scheduler allowed conflicting GPU assignments (`cap_reasoning` and `orchestrator` both claimed GPU 1)
- Startup Ordering: No precedence constraints ensured larger models claim GPUs first
- Runtime Validation: GPU allocations only validated at scheduling time, not continuously
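The missing runtime check is straightforward to sketch: scan the current allocations for GPUs claimed by more than one endpoint. The data is shaped like the incident tables above; the helper name is hypothetical:

```python
from collections import defaultdict

def find_gpu_conflicts(allocations):
    """Return {gpu_index: [endpoints]} for every GPU claimed more than once.
    Hypothetical check; not the Gaius scheduler's actual validation."""
    claims = defaultdict(list)
    for endpoint, gpus in allocations.items():
        for gpu in gpus:
            claims[gpu].append(endpoint)
    return {gpu: owners for gpu, owners in claims.items() if len(owners) > 1}

# The Incident 1 allocation: both endpoints claimed GPU 1
print(find_gpu_conflicts({"cap_reasoning": [1, 2, 3, 4], "orchestrator": [0, 1]}))
```

Run continuously (rather than only at scheduling time), a check like this would have flagged the conflict before `orchestrator` exhausted its retry attempts.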
Order 3+ RCA Observations
These connect to CP-SAT constraints in `makespan_scheduler.py`:
| Constraint | Gap Identified |
|---|---|
| `GPU_MUTUAL_EXCLUSION` | Enforced at planning time, not at runtime |
| `CONTIGUITY_REQUIREMENT` | TP endpoints need contiguous GPU blocks |
| `PRECEDENCE` | Large models should claim GPUs before small ones |
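Two of these constraints are easy to express as plain predicates, shown here as hedged sketches rather than the CP-SAT formulations in `makespan_scheduler.py` (function names and the size metric are assumptions):

```python
def is_contiguous(gpus):
    """CONTIGUITY_REQUIREMENT sketch: a tensor-parallel endpoint's GPUs
    must form an unbroken block of indices."""
    ordered = sorted(gpus)
    return ordered == list(range(ordered[0], ordered[-1] + 1))

def precedence_ok(start_order, model_sizes):
    """PRECEDENCE sketch: endpoints must start in non-increasing model size,
    so large models claim GPUs before small ones."""
    sizes = [model_sizes[name] for name in start_order]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))

print(is_contiguous([1, 2, 3, 4]))  # True: an unbroken block
print(precedence_ok(["cap_reasoning", "orchestrator"],
                    {"cap_reasoning": 32, "orchestrator": 8}))  # True: 32B starts first
```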
Significance
This incident represents a milestone in Gaius’s self-healing capabilities:
- First Successful ACP Escalation: HealthObserver → Claude Code → MCP tools → Resolution
- Closed-Loop Verification: Claude Code verified resolution using same tools that detected the issue
- RCA Framework Validation: Order 3+ observations identified scheduler constraint gaps
- Multi-Incident Handling: Cascading incidents tracked and resolved in sequence
The GPU allocation conflict exposed architectural issues that led to the RCA (Root Cause Analysis) framework development, enabling future incidents to be classified as OPERATIONAL (transient) or ARCHITECTURAL (needs code fix).
Captured from ACP session on 2026-01-01 04:11-04:45 UTC
Changelog
Notable changes and milestones in Gaius development.
2026-03 (Current)
- Bases feature store with DQL query language
- Card publishing gates on enrichment completeness
- Content Freshness health check
- KV coherence health check
- Per-type watchdog timeouts for scheduled tasks
2026-02
- LuxCore GPU rendering (PATHOCL engine with CUDA)
- Just task runner (replacing devenv-tasks)
- Process script architecture (no inline bash in devenv.nix)
- FMEA health framework with adaptive learning
- Article curation flow with Brave search
- X Bookmarks sync with folder-first strategy
- OpenCV/vLLM dependency conflict resolution
2026-01
- ACP (Agent Client Protocol) for Claude Code integration
- Health Observer daemon with ACP escalation
- Guru Meditation Code system
- Content sanitization for ACP security
2025-12
- gRPC engine with 37 services
- Orchestrator with makespan scheduling
- Evolution daemon with APO optimization
- Cognition service (autonomous thoughts)
- Theta consolidation pipeline
2025-11
- Initial TUI with 19×19 grid
- Persistent homology visualization
- Multi-agent swarm execution
- mdbook documentation foundation
- MiniGrid orthographic projections