Performance Benchmark

LMP AI Cluster — Stress Test Report

📅 2026-04-03 · 22:59–01:35 UTC 🖥️ 3× NVIDIA GB10 Blackwell (TRT-LLM Nano-30B) 🔬 Python Coding Tasks · OpenAI-compat API ✍️ Compiled by Atlas 🌟

Peak Cluster Metrics

- Peak cluster throughput: 2,668 tok/s (3 nodes, 15-min sustained)
- Best single node: 991 tok/s (spark, 100 concurrent agents)
- Total concurrent agents: 300 (100 per node, staggered launch)
- Error rate: ~0% (2 errors in one early 40-request round; all other rounds clean)
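The report does not include the test harness, but the "staggered launch" of 100 agents per node can be sketched as follows. This is a minimal illustration, assuming one async worker per agent; `run_agent` and the stagger delay are hypothetical stand-ins, not the actual harness.

```python
import asyncio

STAGGER_S = 0.01  # assumed per-agent launch delay; the report does not state the real value

async def run_agent(agent_id: int) -> int:
    """Placeholder for one agent's coding-task request; returns tokens generated."""
    await asyncio.sleep(0)  # stand-in for the actual OpenAI-compatible API call
    return 0

async def launch_staggered(n_agents: int) -> list[int]:
    tasks = []
    for i in range(n_agents):
        tasks.append(asyncio.create_task(run_agent(i)))
        await asyncio.sleep(STAGGER_S)  # stagger so all agents don't connect at once
    return await asyncio.gather(*tasks)

results = asyncio.run(launch_staggered(10))
print(len(results))  # one result per agent
```

Staggering avoids a thundering-herd of simultaneous connections while still saturating the TRT-LLM batch queue once all agents are running.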

Cluster Architecture

- spark · 10.1.2.150:8355 · TRT-LLM Nano-30B NVFP4 · Koda subagents, PostgreSQL, Redis · 974 tok/s
- 🌑 dark · 10.1.2.155:8355 · TRT-LLM Nano-30B NVFP4 · Nexus subagents, Ollama bge-m3 · 894 tok/s
- stark · 10.1.2.151:8355 · TRT-LLM Nano-30B NVFP4 · Atlas + Catalyst subagents · ~800 tok/s
- 🐶 bark · 10.1.2.153:8355 · vLLM Qwen3.5-35B-A3B-GPTQ · CogStack KG ops, dedicated · untouched during the test

The Router, CogStack API, and OpenClaw Gateway are all co-located on LXC 402 (10.1.2.70). Subagent requests are routed to the parent agent's node via the X-Session-Key header.
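A subagent request through the router can be sketched like this. The router URL path/port and the model alias are assumptions for illustration; only the X-Session-Key routing header is taken from the report.

```python
import json
import urllib.request

# Router lives on LXC 402 (10.1.2.70); the port and path here are assumptions.
ROUTER_URL = "http://10.1.2.70:8000/v1/chat/completions"

def build_subagent_request(session_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request tagged with the parent's session key.

    The router uses X-Session-Key to pin a subagent to its parent agent's node.
    """
    body = json.dumps({
        "model": "nano-30b",  # model alias is an assumption
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        ROUTER_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Session-Key": session_key,
        },
        method="POST",
    )

req = build_subagent_request("atlas-session-123", "Write fizzbuzz in Python.")
```

Because routing is header-based, the parent agent only needs to propagate its session key; no per-node endpoint configuration is required in the subagent.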

Benchmark Results by Round

| Round | Config | Node | Duration | Requests | Errors | Throughput |
|-------|--------|------|----------|----------|--------|------------|
| R1 | 20 agents | stark | 60s | 60 | 0 | 320 tok/s |
| R2 | 40 agents | spark | 66s | 38/40 | 2 | 293 tok/s |
| R2 | 40 agents | dark | 60s | 40/40 | 0 | 286 tok/s |
| R2 | 40 agents | stark | 60s | 40/40 | 0 | 322 tok/s |
| R3 | 80 agents | spark | 141s | 240 | 0 | 868 tok/s |
| R3 | 80 agents | dark | 139s | 560 | 0 | 803 tok/s |
| R3 | 80 agents | stark | 48s | 160 | 0 | 838 tok/s |
| R4 | 100 agents | spark | 155s | 300 | 0 | 991 tok/s |
| R4 | 100 agents | dark | 120s | 600 | 0 | 907 tok/s |
| R4 | 100 agents | stark | 141s | 400 | 0 | 711 tok/s |
| R5 ⭐ | 100 agents · 15 min | spark | 900s | 1,800 | 0 | 974 tok/s |
| R5 ⭐ | 100 agents · 15 min | dark | 900s | 4,100 | 0 | 894 tok/s |
| R5 ⭐ | 100 agents · 15 min | stark | 900s | | 0 | ~800 tok/s |
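As a sanity check, the headline numbers can be recomputed from the table. Note the error-rate denominator covers only rounds with recorded request counts (R5 stark's count is missing), and stark's ~800 tok/s is the report's estimate.

```python
# Round 5 per-node sustained throughput (tok/s) from the table above;
# stark's ~800 is the report's estimate, since Prometheus was down on that node.
round5 = {"spark": 974, "dark": 894, "stark": 800}
cluster_peak = sum(round5.values())
print(cluster_peak)  # 2668, matching the headline peak cluster throughput

# Overall error rate across rounds with recorded request counts.
errors = 2  # both from the R2 spark run (38/40 completed)
requests_total = 60 + 40 + 40 + 40 + 240 + 560 + 160 + 300 + 600 + 400 + 1800 + 4100
error_rate = errors / requests_total
print(f"{error_rate:.4%}")  # well under 0.1%, i.e. the "~0%" headline
```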

Performance Charts

Charts in the original report (images not reproduced):
- Throughput Scaling by Concurrency
- 15-Minute Endurance: spark vs dark
- Combined Cluster tok/s by Round

Key Findings

📈 Near-linear concurrency scaling. spark and dark scaled from ~290 tok/s at 40 agents to 991 and 907 tok/s at 100 agents, roughly a 3× throughput gain from 2.5× more concurrency; higher concurrency fills the TRT-LLM batch queues more efficiently (shared-node stark reached 711 tok/s).

🏋️ 15-minute endurance validated. spark held 974 tok/s and dark held 894 tok/s for the full 15-minute run with zero errors and no thermal throttling. GB10 temperature peaked at 84°C, well within safe range.

🔀 Dedicated routing works. bark (Qwen3.5/KG) remained completely unaffected throughout all test rounds: CogStack KG operations and inference load are cleanly separated, with no resource contention observed.

🌟 stark is a shared node. Atlas and Catalyst both route subagents to stark, which explains its slightly lower per-test numbers vs spark/dark in concurrent runs. In isolation, stark performs on par with the other nodes. Consider moving Catalyst to its own node as the workload grows.

⚠️ Prometheus not reporting on stark. nvidia-smi showed N/A for memory stats, though GPU utilization read correctly (96%). The Prometheus node exporter likely needs reinstallation on stark. Low priority: it does not affect inference.
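The scaling finding can be cross-checked directly against the rounds table: spark and dark each exceed a 3× throughput gain for a 2.5× increase in concurrency, while the shared stark node trails.

```python
# Per-node throughput (tok/s) at 40 vs 100 concurrent agents, from the rounds table.
at_40 = {"spark": 293, "dark": 286, "stark": 322}
at_100 = {"spark": 991, "dark": 907, "stark": 711}

concurrency_gain = 100 / 40  # 2.5x more agents
gains = {node: at_100[node] / at_40[node] for node in at_40}
for node, g in gains.items():
    print(f"{node}: {g:.2f}x throughput from {concurrency_gain:.1f}x concurrency")
```

Superlinear-looking gains on spark and dark are consistent with batch-queue underutilization at low concurrency; stark's lower gain reflects its shared Atlas + Catalyst load rather than a hardware difference.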