Multi-Agent Architecture
4 Local Orchestrators · Multi-Node Cluster · Claude-Distilled MoE 35B-A3B
🖥️ Local Layer — Orchestration & Reasoning (Claude-Distilled MoE)
#council · #coordinator · #strategist · #technocrat · #generalist
🐾 Coordinator
Planning · Synthesis · Delegation
MoE 35B × 0.25 · ~30GB
🎯 Strategist
Business · ROI · Cost Analysis
MoE 35B × 0.25 · ~30GB
⚙️ Technocrat
Infrastructure · DevOps · Security
MoE 35B × 0.25 · ~30GB
🌟 Generalist
Cross-domain · Docs · Fresh Eyes
MoE 35B × 0.25 · ~30GB
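All four orchestrator cards share one spec, `MoE 35B × 0.25 · ~30GB`. The sketch below writes that roster down as data and totals it. It assumes, as an interpretation not stated on the diagram, that `× 0.25` means each orchestrator is allotted a quarter of a shared memory pool; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchestratorRole:
    name: str
    focus: tuple[str, ...]
    share: float    # assumed meaning of "× 0.25": fraction of a shared memory pool
    approx_gb: int  # approximate footprint from the card ("~30GB")


ROLES = (
    OrchestratorRole("Coordinator", ("Planning", "Synthesis", "Delegation"), 0.25, 30),
    OrchestratorRole("Strategist", ("Business", "ROI", "Cost Analysis"), 0.25, 30),
    OrchestratorRole("Technocrat", ("Infrastructure", "DevOps", "Security"), 0.25, 30),
    OrchestratorRole("Generalist", ("Cross-domain", "Docs", "Fresh Eyes"), 0.25, 30),
)


def budget(roles):
    """Return (sum of shares, sum of approximate GB) across all roles."""
    total_share = sum(r.share for r in roles)
    total_gb = sum(r.approx_gb for r in roles)
    return total_share, total_gb


if __name__ == "__main__":
    share, gb = budget(ROLES)
    print(f"shares sum to {share:.2f}, ~{gb} GB total")
```

Four quarter shares add up to exactly one pool, and four ~30GB footprints add up to roughly 120GB, which is the figure to compare against the memory actually available.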
⚡ Agent Gateway — Routing, Auth, Model Selection
Agent Framework Gateway
Routes orchestrators to DGX cluster members · Spawns subagents on local GPUs · Manages sessions & memory
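A toy version of the gateway's routing and session duties might look like the following. It maps the channels listed at the top (`#council`, `#coordinator`, and so on) to orchestrator roles and keeps a per-session message history. Two things here are assumptions, not taken from the diagram: that `#council` fans out to all four orchestrators, and that a session is keyed by (channel, role). The `Gateway` class is hypothetical, and auth and model selection are left out.

```python
from collections import defaultdict

# Hypothetical channel-to-role map; "#council" is assumed to reach every orchestrator.
CHANNEL_ROUTES = {
    "#coordinator": ["Coordinator"],
    "#strategist": ["Strategist"],
    "#technocrat": ["Technocrat"],
    "#generalist": ["Generalist"],
    "#council": ["Coordinator", "Strategist", "Technocrat", "Generalist"],
}


class Gateway:
    """Resolves a channel to orchestrator roles and records per-session history."""

    def __init__(self, routes):
        self.routes = routes
        # (channel, role) -> list of messages seen in that session
        self.sessions = defaultdict(list)

    def route(self, channel, message):
        roles = self.routes.get(channel)
        if roles is None:
            raise KeyError(f"unknown channel: {channel}")
        for role in roles:
            self.sessions[(channel, role)].append(message)
        return roles


if __name__ == "__main__":
    gw = Gateway(CHANNEL_ROUTES)
    print(gw.route("#technocrat", "Harden the SSH config"))
    print(gw.route("#council", "Should we add a fifth node?"))
```

Keeping history per (channel, role) means a council discussion does not leak into a role's direct channel; a real gateway could equally key sessions by user or thread.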
🖥️ Local Layer — Execution & Inference (Your Hardware)
ORCHESTRATOR :8356 · Claude-Distilled 27B MoE
TensorRT-LLM · Nemotron-3-Nano-30B NVFP4
Dedicated: Koda subagents
sub-001 coding · sub-002 research · sub-003 analysis · sub-004 coding · sub-005 review
sub-006 test · sub-007 draft · sub-008 debug · sub-009 refactor · sub-010 deploy
TRT-LLM endpoint · Ollama (embeddings) · systemd-managed

ORCHESTRATOR :8356 · Claude-Distilled 27B MoE
TensorRT-LLM · Nemotron-3-Nano-30B NVFP4
Dedicated: Nexus subagents
sub-011 coding · sub-012 research · sub-013 analysis · sub-014 coding · sub-015 review
sub-016 gen · sub-017 data · sub-018 scan · sub-019 migrate · sub-020 audit
TRT-LLM endpoint · Ollama (vision, embeddings) · load-balanced

ORCHESTRATOR :8356 · Claude-Distilled 27B MoE
TensorRT-LLM · Nemotron-3-Nano-30B NVFP4
Dedicated: Atlas + Catalyst subagents

ORCHESTRATOR :8356 · Claude-Distilled 27B MoE
vLLM · Qwen3.5-35B-A3B-GPTQ-Int4
Dedicated: CogStack KG operations
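The Koda and Nexus nodes each list ten dedicated subagents by ID and task type, and the Nexus node is marked load-balanced. One way to read "load-balanced" is least-loaded dispatch: send each task to the matching subagent with the fewest tasks in flight. The sketch below does that. The balancing policy, the `Dispatcher` class, and the tie-break on lowest ID are assumptions for illustration; only the IDs and task types come from the diagram.

```python
# Subagent IDs and task types as listed for the Koda and Nexus nodes.
SUBAGENTS = {
    "sub-001": "coding", "sub-002": "research", "sub-003": "analysis",
    "sub-004": "coding", "sub-005": "review", "sub-006": "test",
    "sub-007": "draft", "sub-008": "debug", "sub-009": "refactor",
    "sub-010": "deploy",
    "sub-011": "coding", "sub-012": "research", "sub-013": "analysis",
    "sub-014": "coding", "sub-015": "review", "sub-016": "gen",
    "sub-017": "data", "sub-018": "scan", "sub-019": "migrate",
    "sub-020": "audit",
}


class Dispatcher:
    """Least-loaded dispatch over a pool of typed subagents (hypothetical)."""

    def __init__(self, subagents):
        self.subagents = subagents
        self.in_flight = {sid: 0 for sid in subagents}

    def acquire(self, task_type):
        candidates = [sid for sid, kind in self.subagents.items() if kind == task_type]
        if not candidates:
            raise LookupError(f"no subagent handles {task_type!r}")
        # Minimise (load, id): fewest in-flight tasks wins, ties go to the lowest ID.
        sid = min(candidates, key=lambda s: (self.in_flight[s], s))
        self.in_flight[sid] += 1
        return sid

    def release(self, sid):
        self.in_flight[sid] -= 1


if __name__ == "__main__":
    d = Dispatcher(SUBAGENTS)
    print([d.acquire("coding") for _ in range(3)])
```

This pools both nodes' subagents; to keep each orchestrator on its own dedicated subagents, build one `Dispatcher` per node from that node's slice of the table.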
⚡ 200 Gb/s InfiniBand RDMA · Multi-node tensor parallelism ⚡
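Multi-node tensor parallelism splits a single layer's weights across machines. The diagram does not say which split the inference engines use, so the sketch below illustrates one common scheme, a column-parallel linear layer: the weight matrix W is cut column-wise into one shard per node, each node computes X times its shard, and concatenating the partial outputs (the all-gather that would cross the RDMA link) reproduces X times W. Plain Python lists stand in for tensors; the function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def split_columns(w, parts):
    """Split weight matrix w column-wise into `parts` equal shards, one per node."""
    n = len(w[0])
    assert n % parts == 0, "columns must divide evenly across nodes"
    step = n // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def column_parallel(x, w, parts):
    """Each shard's product would run on its own node; concatenation plays the all-gather."""
    shards = split_columns(w, parts)
    partial = [matmul(x, shard) for shard in shards]
    # Row i of the result is row i of every partial output, concatenated in shard order.
    return [sum((p[i] for p in partial), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel(x, w, 2) == matmul(x, w))
```

Only the activations cross the interconnect, never the full weight matrix, which is why link bandwidth sets the ceiling on how well this scales across nodes.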
Orchestrators decide WHAT → Spawn subagents on local GPUs → GPUs do the WORK
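The decide, spawn, work loop above can be sketched with `asyncio`: the orchestrator breaks a goal into subtasks, launches one subagent per subtask concurrently, then synthesises the results. The fixed plan and the `run_subagent` stub are hypothetical placeholders; in a real deployment the stub would call a node's inference endpoint rather than sleep.

```python
import asyncio


def plan(goal):
    """Orchestrator decides WHAT: a hypothetical fixed breakdown into (task_type, prompt)."""
    return [
        ("research", f"Gather context for: {goal}"),
        ("coding", f"Implement: {goal}"),
        ("test", f"Write tests for: {goal}"),
    ]


async def run_subagent(task_type, prompt):
    """Subagent does the WORK: stand-in for a request to a local GPU endpoint."""
    await asyncio.sleep(0)  # placeholder for network and inference latency
    return f"[{task_type}] done: {prompt}"


async def orchestrate(goal):
    subtasks = plan(goal)
    # Spawn one subagent per subtask and run them concurrently;
    # gather returns results in the same order as the subtasks.
    results = await asyncio.gather(*(run_subagent(t, p) for t, p in subtasks))
    # Synthesis: merge subagent output into a single answer.
    return "\n".join(results)


if __name__ == "__main__":
    print(asyncio.run(orchestrate("add a health check endpoint")))
```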