Agent Swarm Architecture
Aegir’s agent swarm infrastructure enables multi-agent collaboration through RWKV recurrent state fusion. Rather than exchanging text messages or attention KV caches between agents, the swarm shares compact recurrent state tensors – a fundamentally more efficient communication medium for recurrent architectures.
Why RWKV State Sharing
The central insight is that RWKV’s recurrent state is constant in sequence length. Each layer’s state is a matrix of shape (H, K, V) where H is the number of heads and K = V = head_size. The total state size per layer is:
O(H * head_size^2) = O(d_model * head_size)
Since head_size is a fixed constant (H = d_model / head_size), this grows only linearly in d_model -- and, crucially, it is independent of how many tokens the agent has processed.
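The arithmetic above can be checked directly. The function below is illustrative only; the dimensions in the example are assumptions, not Aegir defaults.

```python
def rwkv_state_elements(d_model: int, head_size: int, n_layers: int) -> int:
    """Elements in a per-agent WKV state: one (H, K, V) tensor per layer,
    with H = d_model // head_size and K = V = head_size."""
    n_heads = d_model // head_size
    per_layer = n_heads * head_size * head_size  # = d_model * head_size
    return n_layers * per_layer

# Example: a hypothetical 24-layer model with d_model=2048, head_size=64.
print(rwkv_state_elements(2048, 64, 24))  # 24 * 2048 * 64 = 3,145,728 elements
```

At fp16 that is about 6 MB per agent, regardless of whether the agent has read 100 tokens or 100k.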
For a swarm of N agents, the cost of sharing all recurrent states is:
RWKV: O(N * d * head_size) -- constant in sequence length
Transformer KV cache: O(N * n * d) -- linear in sequence length n
At context lengths of 4k-128k tokens with typical d = 512-4096, RWKV state sharing is orders of magnitude cheaper. The LatentMAS paper (arXiv:2511.20639) quantifies this as 235-471x more information-dense than text-based inter-agent communication, since the recurrent state encodes a compressed summary of the entire processing history.
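A quick back-of-the-envelope comparison makes the gap concrete. The dimensions below are hypothetical, chosen only to land in the 4k-128k / 512-4096 ranges mentioned above; the KV-cache term counts keys plus values for a single layer.

```python
# Hypothetical dims for illustration; per-layer communication cost.
d_model, head_size, n_ctx = 2048, 64, 32_768

rwkv_state = d_model * head_size      # constant in sequence length
kv_cache = 2 * n_ctx * d_model        # keys + values, linear in n_ctx

print(kv_cache // rwkv_state)  # 1024: the KV cache is ~1000x larger here
```

The ratio is simply 2n / head_size, so it grows linearly with context length while the RWKV state stays fixed.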
Swarm Components
The swarm consists of four modules:
| Module | File | Purpose |
|---|---|---|
| `RWKVStateFusion` | `src/aegir/swarm/state_fusion.py` | Combine N agent states into one |
| `AlignmentProjection` | `src/aegir/swarm/alignment.py` | Map states between different-sized agents |
| `FrozenSpecialist` | `src/aegir/swarm/specialist.py` | Wrap pre-trained models as frozen agents |
| `SwarmOrchestrator` | `src/aegir/swarm/orchestrator.py` | K2.5 PARL routing and reward |
State Fusion Modes
RWKVStateFusion supports three strategies for combining agent states:
- `weighted_sum` – Attention-weighted combination using learnable query/key projections. The orchestrator learns which agents to trust per head.
- `gated` – Per-agent softmax gates. Simpler than attention but still differentiable. Good baseline for initial experiments.
- `concat_project` – Concatenate all agent states and project back to single-agent dimensions. Most expressive but O(N) in parameter count.
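To make the simplest mode concrete, here is a minimal NumPy sketch of gated fusion: a softmax over per-agent gate logits weights N states of shape (H, K, V). This is an illustration of the idea only; the actual `RWKVStateFusion` API (where the gates would be learnable parameters) may differ.

```python
import numpy as np

def gated_fusion(states: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Fuse agent states by per-agent softmax gates.

    states: (N, H, K, V) stacked recurrent states from N agents
    gate_logits: (N,) logits (learnable in a real implementation)
    """
    w = np.exp(gate_logits - gate_logits.max())
    w = w / w.sum()                              # softmax over agents
    # Weighted sum over the agent axis -> single (H, K, V) state.
    return np.tensordot(w, states, axes=(0, 0))

# Three agents, 4 heads, head_size 8:
states = np.random.randn(3, 4, 8, 8)
fused = gated_fusion(states, np.array([0.5, 1.0, -0.2]))
print(fused.shape)  # (4, 8, 8)
```

With equal logits this reduces to a plain mean over agents, which is a useful sanity check when debugging a fusion module.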
See RWKV State Fusion for mathematical details.
Information Density Advantage
LatentMAS demonstrates that recurrent state communication dramatically outperforms text-based multi-agent protocols. The recurrent state is a lossy but highly compressed representation of the agent’s entire context window. Sharing it is equivalent to sharing a continuous-valued “summary” that preserves the information most relevant to the model’s computation, rather than forcing that information through a text bottleneck.
For Aegir’s column annotation task, this means a specialist trained on (say) geographic column types can share its accumulated understanding of a table’s structure through a single (H, K, V) tensor per layer, rather than generating and parsing natural language explanations.