
Scheduler

The SchedulerService provides a priority-based job queue for inference requests with XAI budget management and weighted completion time minimization.

Priority Levels

Priority        Weight   Use Case
CRITICAL (0)    1.0      User-facing interactive requests
HIGH (1)        2.0      Interactive queries
NORMAL (2)      4.0      Background processing
LOW (3)         8.0      Batch operations
EVOLUTION (4)   16.0     Agent evolution (lowest priority)

Jobs with lower weights are scheduled first; CRITICAL requests preempt all other work.
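As a rough sketch of how such a weighted queue can behave (the JobQueue class and WEIGHTS table below are illustrative, not the service's actual implementation):

```python
import heapq
import itertools

# Priority weights from the table above (lower weight = scheduled first).
WEIGHTS = {"CRITICAL": 1.0, "HIGH": 2.0, "NORMAL": 4.0, "LOW": 8.0, "EVOLUTION": 16.0}

class JobQueue:
    """Minimal sketch: a heap keyed by (weight, arrival order)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break within one priority

    def push(self, priority, job):
        heapq.heappush(self._heap, (WEIGHTS[priority], next(self._counter), job))

    def pop(self):
        _, _, job = heapq.heappop(self._heap)
        return job

q = JobQueue()
q.push("NORMAL", "batch-report")
q.push("CRITICAL", "user-chat")
q.push("HIGH", "dashboard-query")
print(q.pop())  # → user-chat: the lowest weight wins
```

The arrival counter keeps ordering stable within a priority level, so two NORMAL jobs complete in submission order.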

Job Submission

from gaius.engine.services import SchedulerService, InferenceJob, JobPriority

scheduler = SchedulerService()

job = InferenceJob(
    prompt="Analyze the risk factors...",
    priority=JobPriority.HIGH,
    max_tokens=2048,
)
result = await scheduler.submit(job)

XAI Budget

The scheduler tracks daily usage of external AI APIs (xAI Grok) to prevent runaway costs:

budget = scheduler.get_xai_budget()
# budget.daily_remaining: tokens left for today
# budget.daily_limit: configured daily cap
# budget.reset_time: when the budget resets

Requests exceeding the budget are rejected with a clear error message.
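A minimal sketch of the reject-on-overrun behavior, assuming a budget object with the fields listed above (the XaiBudget class and its reserve method are hypothetical, not the library's API):

```python
from dataclasses import dataclass

@dataclass
class XaiBudget:
    """Hypothetical daily token budget with the fields noted above."""
    daily_limit: int
    daily_used: int = 0

    @property
    def daily_remaining(self) -> int:
        return self.daily_limit - self.daily_used

    def reserve(self, tokens: int) -> None:
        # Reject requests that would exceed the daily cap with a clear error.
        if tokens > self.daily_remaining:
            raise RuntimeError(
                f"xAI budget exceeded: requested {tokens} tokens, "
                f"{self.daily_remaining} remaining of {self.daily_limit}"
            )
        self.daily_used += tokens

budget = XaiBudget(daily_limit=1_000_000)
budget.reserve(2048)  # accepted; daily_remaining drops by 2048
```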

Makespan Scheduling

For complex workloads that require multiple inference calls (e.g., agent evolution with candidate generation + evaluation), the scheduler uses makespan optimization to minimize total completion time:

  1. Decompose workload into individual inference jobs
  2. Assign priorities based on workload urgency
  3. Schedule across available endpoints
  4. Track completion via the AgendaTracker

See Makespan Scheduling for the optimization details.
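Step 3 is a classic makespan problem. One standard approach, shown here purely as an illustration of the idea (not the service's code), is the greedy longest-processing-time-first heuristic:

```python
import heapq

def assign_jobs(durations, n_endpoints):
    """Greedy LPT: sort jobs longest-first, then assign each to the
    endpoint with the least load so far. A well-known 4/3-approximation
    for minimizing makespan on identical machines."""
    loads = [(0.0, i, []) for i in range(n_endpoints)]
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        load, i, jobs = heapq.heappop(loads)
        jobs.append(d)
        heapq.heappush(loads, (load + d, i, jobs))
    return loads

# Five inference calls (estimated seconds) scheduled over two endpoints:
result = assign_jobs([20, 15, 15, 5, 5], 2)
print(max(load for load, _, _ in result))  # → 30.0, the makespan
```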

Timeouts

Context                    Default Timeout
General gRPC calls         30s
Inference (completions)    120s
Evaluation                 120s

A 24B model with cot_reflection takes 15-20 seconds per completion. Timeouts are set per-call:

result = await client.call("ModelInfer", request, timeout=120)

Override the default via GAIUS_ENGINE_TIMEOUT environment variable.
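Reading the override is a simple fallback lookup; default_timeout is a hypothetical helper shown only to illustrate the pattern:

```python
import os

def default_timeout() -> float:
    # Honor GAIUS_ENGINE_TIMEOUT (seconds) if set, else the 30s default.
    return float(os.environ.get("GAIUS_ENGINE_TIMEOUT", "30"))
```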