# gRPC Engine
The engine is the central nervous system of Gaius. It’s a long-running daemon that manages GPU resources, coordinates services, and exposes all functionality via gRPC on port 50051.
## Architecture

```
┌──────────────────────────────────────────────┐
│              gRPC Server :50051              │
│  ┌──────────────┐  ┌──────────────────────┐  │
│  │  KServe OIP  │  │  Gaius Extensions    │  │
│  │  (inference) │  │  (health, evolution, │  │
│  │              │  │  orchestrator, ...)  │  │
│  └──────┬───────┘  └──────────┬───────────┘  │
├─────────┼─────────────────────┼──────────────┤
│         │     37 Services     │              │
│  ┌──────┴──────┐   ┌──────────┴───────────┐  │
│  │ Orchestrator│   │ Scheduler            │  │
│  │ Evolution   │   │ Cognition            │  │
│  │ Health      │   │ Topology             │  │
│  │ CLT         │   │ Dataset              │  │
│  │ ...         │   │ ...                  │  │
│  └──────┬──────┘   └──────────┬───────────┘  │
├─────────┼─────────────────────┼──────────────┤
│         │ Backend Controllers │              │
│  ┌──────┴──────┐   ┌──────────┴───────────┐  │
│  │ vLLM Ctrl   │   │ Embedding Ctrl       │  │
│  │ optillm Ctrl│   │ Backend Router       │  │
│  └──────┬──────┘   └──────────┬───────────┘  │
│         │                     │              │
│  ┌──────┴─────────────────────┴───────────┐  │
│  │          GPU Pool (6x NVIDIA)          │  │
│  └────────────────────────────────────────┘  │
└──────────────────────────────────────────────┘
```
## Startup Sequence
The engine initializes in 9 phases, streaming progress to connected clients:
| Phase | Duration | Action |
|---|---|---|
| INIT | Immediate | InitController starts |
| GRPC | ~1s | gRPC server binds to :50051 |
| TELEMETRY | ~2s | OpenTelemetry setup |
| BACKENDS | ~5s | Backend router initialization |
| ORCHESTRATOR | ~2s | Orchestrator service starts |
| ENDPOINTS | ~240s | vLLM model loading to VRAM |
| TRANSPORT | ~2s | Aeron bridge setup |
| SERVICES | ~5s | Background services start |
| COMPLETE | - | Ready for inference |
The gRPC server starts early (phase 2) so clients can connect immediately and receive real-time progress during the ~4-minute vLLM model load (the ENDPOINTS phase).
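The phased startup can be modeled as a generator that yields one progress event per phase, which is essentially what a streaming init RPC delivers to clients. This is an illustrative sketch; the `Phase` enum and `InitProgress` dataclass are assumptions, not the engine's actual types:

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Startup phases in the order the engine runs them (sketch)."""
    INIT = 1
    GRPC = 2
    TELEMETRY = 3
    BACKENDS = 4
    ORCHESTRATOR = 5
    ENDPOINTS = 6
    TRANSPORT = 7
    SERVICES = 8
    COMPLETE = 9


@dataclass
class InitProgress:
    phase: Phase
    message: str


def startup_events():
    """Yield one progress event per phase, mirroring a WatchInit-style stream."""
    messages = {
        Phase.INIT: "InitController starts",
        Phase.GRPC: "gRPC server binds to :50051",
        Phase.TELEMETRY: "OpenTelemetry setup",
        Phase.BACKENDS: "Backend router initialization",
        Phase.ORCHESTRATOR: "Orchestrator service starts",
        Phase.ENDPOINTS: "vLLM model loading to VRAM",
        Phase.TRANSPORT: "Aeron bridge setup",
        Phase.SERVICES: "Background services start",
        Phase.COMPLETE: "Ready for inference",
    }
    for phase in Phase:  # Enum iteration preserves definition order
        yield InitProgress(phase, messages[phase])
```

A client consuming this stream can render a progress bar long before the ENDPOINTS phase finishes.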
## Module Structure

```
engine/
├── server.py              # Main daemon entry point
├── config.py              # Engine configuration
├── init_controller.py     # Initialization progress streaming
├── workloads.py           # Workload definitions
├── grpc/
│   ├── server.py          # gRPC server setup
│   └── servicers/
│       ├── inference_servicer.py  # KServe OIP implementation
│       └── gaius_servicer.py      # Gaius extensions
├── backends/
│   ├── backend_router.py  # Unified request routing
│   ├── vllm_controller.py # vLLM process management
│   ├── optillm_controller.py
│   └── embedding_controller.py
├── services/              # 37 registered services
├── compute/               # Grid projection, TDA
├── resources/             # GPU allocation
├── transport/             # Aeron bridge
├── generated/             # Protobuf generated code
└── proto/                 # Protobuf definitions
```
## gRPC Protocol
The engine implements two gRPC services:
### KServe Open Inference Protocol

Standard inference protocol for compatibility with ML platforms:

```protobuf
service GRPCInferenceService {
  rpc ServerLive(ServerLiveRequest) returns (ServerLiveResponse);
  rpc ServerReady(ServerReadyRequest) returns (ServerReadyResponse);
  rpc ModelMetadata(ModelMetadataRequest) returns (ModelMetadataResponse);
  rpc ModelInfer(ModelInferRequest) returns (ModelInferResponse);
}
```
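For orientation, an OIP `ModelInfer` request carries a model name plus a list of named, typed tensors. The helper below assembles that structure as a plain dict with the OIP field names; the tensor name `text_input` and the model name used in the docstring example are illustrative assumptions, and a real client would populate a generated `ModelInferRequest` protobuf message instead:

```python
def build_infer_request(model_name, prompt):
    """Build a KServe OIP-style ModelInfer request for a single text input.

    Example: build_infer_request("reasoning", "hello")
    Returns a plain dict mirroring the protobuf field layout.
    """
    return {
        "model_name": model_name,
        "inputs": [
            {
                "name": "text_input",    # tensor name (illustrative)
                "datatype": "BYTES",     # OIP datatype for strings/bytes
                "shape": [1],            # one element in the tensor
                "contents": {"bytes_contents": [prompt.encode("utf-8")]},
            }
        ],
    }
```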
### Gaius Extensions

Custom RPCs for Gaius-specific functionality:

```protobuf
service GaiusService {
  rpc WatchInit(stream InitRequest) returns (stream InitProgress);
  rpc WatchHealth(HealthRequest) returns (stream HealthMetrics);
  rpc EvolutionStatus(Empty) returns (EvolutionStatusResponse);
  rpc TriggerEvolution(TriggerRequest) returns (TriggerResponse);
  rpc GetEndpointStatus(Empty) returns (EndpointStatusResponse);
  rpc StartEndpoint(StartRequest) returns (StartResponse);
  rpc StopEndpoint(StopRequest) returns (StopResponse);
}
```
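The endpoint lifecycle RPCs (`GetEndpointStatus`, `StartEndpoint`, `StopEndpoint`) imply a small state machine behind the servicer. A minimal sketch of that idea, with invented class and state names that are not taken from the engine's code:

```python
class EndpointRegistry:
    """Toy model of the start/stop/status surface behind GaiusService."""

    def __init__(self, names):
        # All endpoints begin stopped; StartEndpoint transitions them.
        self._state = {name: "STOPPED" for name in names}

    def start(self, name):
        """Return True if the endpoint was started, False if already running."""
        if self._state.get(name) == "RUNNING":
            return False
        self._state[name] = "RUNNING"
        return True

    def stop(self, name):
        """Return True if the endpoint was stopped, False if not running."""
        if self._state.get(name) != "RUNNING":
            return False
        self._state[name] = "STOPPED"
        return True

    def status(self):
        """Mirror GetEndpointStatus: a snapshot of every endpoint's state."""
        return dict(self._state)
```

Making start/stop idempotent (returning `False` rather than raising on a repeated call) keeps the RPCs safe to retry.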
## Configuration

```hocon
engine {
  grpc {
    host = "0.0.0.0"
    port = 50051
    max_workers = 10
    max_message_size = 104857600  # 100MB
  }

  orchestrator {
    preload_endpoints = ["reasoning"]
    startup_timeout = 600  # 10 minutes
    health_check_interval = 30
  }

  scheduler {
    max_queue_size = 1000
    default_timeout = 120
  }

  evolution {
    enabled = true
    idle_threshold = 60
    cycle_interval = 3600
  }
}
```
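This block maps naturally onto typed settings objects with the same keys and defaults. A sketch of two of the sections (the class names are mine; the engine's actual `config.py` may be organized differently):

```python
from dataclasses import dataclass, field


@dataclass
class GrpcConfig:
    host: str = "0.0.0.0"
    port: int = 50051
    max_workers: int = 10
    max_message_size: int = 100 * 1024 * 1024  # 100MB, i.e. 104857600


@dataclass
class OrchestratorConfig:
    preload_endpoints: list = field(default_factory=lambda: ["reasoning"])
    startup_timeout: int = 600       # 10 minutes
    health_check_interval: int = 30


@dataclass
class EngineConfig:
    # Mutable defaults need default_factory, not a shared instance.
    grpc: GrpcConfig = field(default_factory=GrpcConfig)
    orchestrator: OrchestratorConfig = field(default_factory=OrchestratorConfig)
```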
## Running the Engine

```bash
# Via devenv process-compose (normal operation)
devenv processes up

# Standalone
uv run gaius-engine

# Clean restart (stops everything, cleans up, restarts)
just restart-clean
```
## Verifying Engine Health

```bash
# Check if the gRPC port is listening
nc -zv localhost 50051

# Check endpoint status
uv run gaius-cli --cmd "/gpu status" --format json

# Watch engine logs
tail -f .devenv/processes.log | grep gaius-engine
```
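The `nc -zv` check can also be done from Python, which is handy in scripts and tests. Note this only confirms the TCP port accepts connections; it does not verify that the gRPC server itself is healthy (use the `ServerLive`/`ServerReady` RPCs for that):

```python
import socket


def port_open(host="localhost", port=50051, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```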