Multi-Agent Architecture

4 Local Orchestrators · Multi-Node Cluster · Claude-Distilled MoE 35B-A3B
🖥️ Local Layer — Orchestration & Reasoning (Claude-Distilled MoE)
👤 Human Operator
#council #coordinator #strategist #technocrat #generalist
🐾 Coordinator
Planning · Synthesis · Delegation
MoE 35B × 0.25 · ~30GB
🎯 Strategist
Business · ROI · Cost Analysis
MoE 35B × 0.25 · ~30GB
⚙️ Technocrat
Infrastructure · DevOps · Security
MoE 35B × 0.25 · ~30GB
🌟 Generalist
Cross-domain · Docs · Fresh Eyes
MoE 35B × 0.25 · ~30GB
⚡ Agent Gateway — Routing, Auth, Model Selection
Agent Framework Gateway
Routes orchestrators to DGX cluster members · Spawns subagents to local GPUs · Manages sessions & memory
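As a rough illustration of the routing role described above, the sketch below maps each dedicated workload to the node the diagram assigns it to. The hostnames, IP addresses, port 8355, and workload names come from the diagram; the `Node` dataclass, the `NODES` table, the `route_subagent` function, and the plain `http://` scheme are hypothetical and do not reflect any actual gateway API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    name: str
    ip: str
    port: int
    workloads: Tuple[str, ...]


# Hostnames, IPs, port and workload names are taken from the diagram.
# The table layout itself is an assumption made for illustration.
NODES = (
    Node("spark", "10.1.2.150", 8355, ("Koda",)),
    Node("dark", "10.1.2.155", 8355, ("Nexus",)),
    Node("stark", "10.1.2.151", 8355, ("Atlas", "Catalyst")),
    Node("bark", "10.1.2.153", 8355, ("CogStack KG",)),
)


def route_subagent(workload: str) -> str:
    """Return the base URL of the node dedicated to the given workload.

    The http:// scheme is assumed; the diagram only lists host and port.
    """
    for node in NODES:
        if workload in node.workloads:
            return f"http://{node.ip}:{node.port}"
    raise KeyError(f"no node is dedicated to workload {workload!r}")


if __name__ == "__main__":
    print(route_subagent("Nexus"))  # prints http://10.1.2.155:8355
```

A static lookup table is the simplest possible stand-in; a real gateway would presumably also handle authentication, model selection, and session state, which the diagram names but does not specify.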
🖥️ Local Layer — Execution & Inference (Your Hardware)
🟢 spark (10.1.2.150)
GB10 Blackwell · 128 GB VRAM · ARM64
:8355
ORCHESTRATOR :8356
Claude-Distilled 27B MoE
ORCHESTRATOR :8356
Claude-Distilled 27B MoE
ORCHESTRATOR :8356
Claude-Distilled 27B MoE
TensorRT-LLM · Nemotron-3-Nano-30B NVFP4
Dedicated: Koda subagents
sub-001 coding
sub-002 research
sub-003 analysis
sub-004 coding
sub-005 review
sub-006 test
sub-007 draft
sub-008 debug
sub-009 refactor
sub-010 deploy
TRT-LLM endpoint · Ollama (embeddings) · systemd managed
🟢 dark (10.1.2.155)
GB10 Blackwell · 128 GB VRAM · ARM64
:8355
TensorRT-LLM · Nemotron-3-Nano-30B NVFP4
Dedicated: Nexus subagents
sub-011 coding
sub-012 research
sub-013 analysis
sub-014 coding
sub-015 review
sub-016 gen
sub-017 data
sub-018 scan
sub-019 migrate
sub-020 audit
TRT-LLM endpoint · Ollama (vision, embed) · load balanced
🟢 stark (10.1.2.151)
GB10 Blackwell · 128 GB VRAM · ARM64
:8355
TensorRT-LLM · Nemotron-3-Nano-30B NVFP4
Dedicated: Atlas + Catalyst subagents
🟢 bark (10.1.2.153)
GB10 Blackwell · 128 GB VRAM · ARM64
:8355
ORCHESTRATOR :8356
Claude-Distilled 27B MoE
vLLM · Qwen3.5-35B-A3B-GPTQ-Int4
Dedicated: CogStack KG operations
Local SDvoxel inference
Local inference
RDMA interconnect
Active subagent
Orchestrators decide WHAT → Spawn subagents on local GPUs → GPUs do the WORK
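The decide → spawn → work flow in the line above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: `Task`, `decide`, `run_on_gpu`, and `orchestrate` are hypothetical names, the task split is hard-coded, and the GPU call is a stub that returns a string instead of contacting an inference endpoint. The subagent IDs and roles are the ones labelled in the diagram.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    subagent_id: str  # e.g. "sub-001", as labelled in the diagram
    role: str         # e.g. "coding"
    prompt: str


def decide(goal: str) -> List[Task]:
    """Orchestrator step: split a goal into subagent tasks.

    Hard-coded for illustration; a real orchestrator would plan this
    with its own model.
    """
    return [
        Task("sub-002", "research", f"Gather background for: {goal}"),
        Task("sub-001", "coding", f"Implement: {goal}"),
        Task("sub-005", "review", f"Review the result of: {goal}"),
    ]


def run_on_gpu(task: Task) -> str:
    """Stub standing in for an inference call to a local GPU endpoint."""
    return f"[{task.subagent_id}/{task.role}] done"


def orchestrate(goal: str, worker: Callable[[Task], str] = run_on_gpu) -> List[str]:
    """Decide WHAT, hand each task to a subagent, and collect the WORK."""
    return [worker(task) for task in decide(goal)]


if __name__ == "__main__":
    for line in orchestrate("add health checks"):
        print(line)
```

Passing the worker as a parameter keeps the planning step separate from execution, which mirrors the split between orchestrators and GPU-backed subagents shown in the diagram.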