🏭

IOL Agent

Predictive Maintenance · Agentic AI · On-Premise

DESIGN COMPLETE · GitLab ↗

Overview

An on-premise agentic AI system for predictive maintenance in SME manufacturing, running on LMP's NVIDIA Blackwell GPU cluster. The agent monitors industrial equipment through telemetry sensors, detects anomalies before failures occur, checks spare parts inventory, places orders, and schedules maintenance. It acts autonomously, with Human-in-the-Loop (HITL) financial controls for high-value decisions.

📄
13 Design Docs
Full architecture on GitLab
🧠
AI Agent
Nemotron-3 on Blackwell
🌡️
PT100 Telemetry
5 s interval, MQTT → TimescaleDB
⚙️
3 Decision Paths
Auto / Auto-order / HITL

Architecture

PT100 Sensor → RPi (MAX31865 ADC) → MQTT Broker → Ingestion Service → TimescaleDB
                                         │                                 ↓
                                 Eclipse Mosquitto            AI Agent (LLM on Blackwell)
                                    (pve3:1883)                            ↓
                                                                ┌──────────┴──────────┐
                                                                │                     │
                                                          Part in stock       Part NOT in stock
                                                                │                     │
                                                       Generate work order   Query supplier API
                                                      Assign to technician            │
                                                                               ┌──────┴──────┐
                                                                          Under limit   Over limit
                                                                               │             │
                                                                          Auto-order   HITL approval

Hardware Platform

Control Server (pve3)

CPU   Ryzen 9 9950X3D
RAM   96GB DDR5
GPU   RTX 5080 16GB
Role  Orchestration, Docker, MQTT

AI Cluster (2× GB10 Blackwell)

Nodes         spark + dark
VRAM          128GB each (256GB total)
Interconnect  2×200GbE RDMA
Model         Nemotron-3-Nano-30B (TRT-LLM)

Telemetry: Part #1 (PoC)

Parameter     Value                Notes
Sensor        PT100 (3-wire RTD)   via MAX31865 ADC on RPi
Measurement   Temperature (°C)     Range: -200 to +850°C
Interval      5 seconds            17,280 readings/day
Normal range  35–40°C              Operational band
Warning       >41°C                Instant, single reading
Alarm         >45°C × 10 readings  50 s sustained → failure
Transport     MQTT (QoS 1)         Topic: iol/telemetry/part/1/temperature
Database      TimescaleDB (PG16)   ~240 MB/year for 1 sensor
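
The warning and alarm rules in the table can be sketched as a small stateful check. This is only an illustration of the stated thresholds (a single reading above 41°C warns; 10 consecutive readings above 45°C, i.e. 50 s at the 5 s interval, raise an alarm). The class and constant names are hypothetical, and the project's actual detection queries are the ones in db-design.md.

```python
from dataclasses import dataclass

WARNING_C = 41.0   # a single reading above this raises a warning
ALARM_C = 45.0     # readings above this count toward an alarm
ALARM_COUNT = 10   # consecutive readings required (10 x 5 s = 50 s)


@dataclass
class ThresholdDetector:
    """Tracks the streak of over-limit readings for one sensor (hypothetical helper)."""
    consecutive_over_alarm: int = 0

    def update(self, temp_c: float) -> str:
        """Classify the newest reading as 'alarm', 'warning', or 'ok'."""
        if temp_c > ALARM_C:
            self.consecutive_over_alarm += 1
        else:
            self.consecutive_over_alarm = 0  # streak broken, start over

        if self.consecutive_over_alarm >= ALARM_COUNT:
            return "alarm"
        if temp_c > WARNING_C:
            return "warning"
        return "ok"


if __name__ == "__main__":
    detector = ThresholdDetector()
    for reading in [38.0, 42.0] + [46.0] * 10:
        print(reading, detector.update(reading))
```

Readings above 45°C are reported as warnings until the tenth consecutive one arrives, so a brief spike never escalates to an alarm on its own.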

Database Schema (TimescaleDB)

┌──────────┐       ┌──────────┐       ┌────────────────────┐
│  parts   │1─────*│ sensors  │1─────*│ sensor_thresholds  │
│          │       │          │       │                    │
│ part_id  │       │ sensor_id│       │ range_min/max      │
│ part_no  │       │ part_id  │       │ warning_value      │
│ name     │       │ type     │       │ alarm_value        │
│ location │       │ unit     │       │ alarm_count        │
│ status   │       │ interval │       │                    │
└──────────┘       └────┬─────┘       └────────────────────┘
                        │
                   ┌────┴─────┐
                   │telemetry │
                   │(hypertbl)│       ┌──────────┐
                   │          │       │  alerts  │
                   │ time     │       │          │
                   │ sensor_id│       │ alert_id │
                   │ value    │       │ sensor_id│
                   │ quality  │       │ type     │
                   └──────────┘       │ status   │
                                      └──────────┘

5 tables · telemetry is a TimescaleDB hypertable with automatic time-based partitioning. Full SQL: db-design.md on GitLab

Data Ingestion Pipeline

🌡️

PT100 + RPi

MAX31865 ADC → Python → JSON payload every 5 s

📡

MQTT Broker

Eclipse Mosquitto 2 on pve3:1883

⚙️

Ingestion Service

Python subscriber, batch INSERT (50 rows or 5 s)

🗄️

TimescaleDB

PG16 hypertable, 90-day raw + 1-min aggregates
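
The "50 rows or 5 s" batching rule of the ingestion service can be sketched as a buffer that flushes on whichever limit is reached first. This is a minimal sketch under assumptions: the JSON field names (`time`, `sensor_id`, `value`), the `BatchBuffer` class, and the injected `flush` callable are all hypothetical, and the real payload format and service code are in telemetry-ingestion.md.

```python
import json
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, int, float]  # (ISO timestamp, sensor_id, value)


def parse_payload(raw: bytes) -> Row:
    """Decode one MQTT message body into a row tuple (field names are assumed)."""
    msg = json.loads(raw)
    return (msg["time"], int(msg["sensor_id"]), float(msg["value"]))


class BatchBuffer:
    """Collects rows and flushes when full or when the oldest row is too old."""

    def __init__(self, flush: Callable[[List[Row]], None],
                 max_rows: int = 50, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: List[Row] = []
        self._first_at: Optional[float] = None

    def add(self, row: Row) -> None:
        """Buffer one row, remembering when the current batch started."""
        if not self._rows:
            self._first_at = self._clock()
        self._rows.append(row)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if either limit is reached; also meant to be called on a timer."""
        if not self._rows:
            return
        too_many = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_many or too_old:
            batch, self._rows = self._rows, []
            self._first_at = None
            self._flush(batch)
```

In the real service the `flush` callable would presumably wrap a psycopg2 batch INSERT (for example via `psycopg2.extras.execute_values`), and `add` would be called from the paho-mqtt message callback. With a single sensor publishing every 5 s, the age limit will usually fire before the row limit; the 50-row limit matters once more sensors are attached.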

Docker Stack

services:
  iol-db:          timescale/timescaledb:latest-pg16    # :5432
  mqtt-broker:     eclipse-mosquitto:2                  # :1883, :9001
  iol-ingest:      custom Python image                  # subscribes to MQTT → writes to DB

Full compose + Dockerfile: telemetry-ingestion.md on GitLab

Inventory Decision Logic

After confirming a failure prediction, the agent checks spare parts inventory and follows one of three paths.

Decision A

Part In Stock

Reserve part → generate work order → assign technician → schedule before failure. Fully autonomous.

Decision B1

Out of Stock, Under €500

Query supplier API → auto-order → create PO → schedule work after delivery. Autonomous within limit.

Decision B2

Out of Stock, Over €500

Query supplier → create pending PO → HITL approval request → human approves/rejects. Human-in-the-loop.

Full logic + SQL: inventory-decision-logic.md on GitLab
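
The three paths can be sketched as one routing function. This is an illustrative sketch only: the type and function names are hypothetical, and the treatment of a price of exactly €500 (routed to HITL here, as the more conservative choice) is an assumption, since the text above only defines "under" and "over". The authoritative logic is in inventory-decision-logic.md.

```python
from dataclasses import dataclass
from enum import Enum

AUTO_ORDER_LIMIT_EUR = 500.0  # limit separating Decision B1 from B2


class Decision(Enum):
    A_IN_STOCK = "A"      # reserve part, generate work order
    B1_AUTO_ORDER = "B1"  # out of stock, order placed autonomously
    B2_HITL = "B2"        # out of stock, pending PO awaits human approval


@dataclass(frozen=True)
class PartStatus:
    """Hypothetical view of one spare part after the stock and supplier lookups."""
    part_no: str
    qty_in_stock: int
    supplier_price_eur: float


def decide(part: PartStatus, limit_eur: float = AUTO_ORDER_LIMIT_EUR) -> Decision:
    """Pick the decision path for a part needed by a predicted failure."""
    if part.qty_in_stock > 0:
        return Decision.A_IN_STOCK
    if part.supplier_price_eur < limit_eur:
        return Decision.B1_AUTO_ORDER
    # Assumption: a price of exactly the limit also goes to a human.
    return Decision.B2_HITL


if __name__ == "__main__":
    print(decide(PartStatus("PART-001", qty_in_stock=2, supplier_price_eur=460.0)))
    print(decide(PartStatus("PART-001", qty_in_stock=0, supplier_price_eur=460.0)))
    print(decide(PartStatus("PART-001", qty_in_stock=0, supplier_price_eur=610.0)))
```

Keeping the limit as a parameter makes it easy to tighten or relax the financial control without touching the routing itself.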

Maintenance Log System

Customer-supplied maintenance books are ingested into both a structured SQL database (lifecycle tracking, overdue alerts) and a vector embedding store (semantic search by the LLM agent). The agent combines both for failure prediction.

📋

Structured Queries

SQL: lifecycle tracking, overdue parts, maintenance history

🔍

Semantic Search

pgvector + nomic-embed-text: find historical correlations

🧠

LLM Reasoning

Nemotron on Blackwell: predict time-to-failure from combined data
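
The semantic search step can be pictured as ranking stored log embeddings by cosine similarity to a query embedding. The sketch below does this in plain Python as a stand-in for the pgvector query; it is not the project's implementation. The function names are hypothetical and the three-dimensional vectors are made-up toy values, not real nomic-embed-text output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_logs(query_vec: Sequence[float],
              log_vecs: Dict[str, Sequence[float]],
              top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k (document id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in log_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy embeddings for illustration only.
    logs = {
        "ML-001": [1.0, 0.0, 0.0],
        "ML-002": [0.0, 1.0, 0.0],
        "ML-003": [0.9, 0.1, 0.0],
    }
    for doc_id, score in rank_logs([1.0, 0.0, 0.0], logs):
        print(doc_id, round(score, 3))
```

In the design described above, the ranked documents would then be handed to the LLM together with the structured SQL results.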

Sample Maintenance Logs (Training Data)

Document  Machine  Alarm → Replace  Pattern                                   Cost
ML-001    Line A   7 days           Gradual wear, peak 47.2°C                 €460
ML-002    Line B   3 days           Contamination, peak 48.6°C, 73h downtime  €610
ML-003    Line C   8 days           End-of-lifecycle, peak 46.8°C             €430

Project Objectives

1. Validate on Blackwell

Prove that locally hosted LLMs can handle real-time telemetry analysis and semantic log analysis without cloud services.

2. Closed-Loop Integration

The agent writes to ERP/SQL, generates work orders, and calls supplier APIs autonomously.

3. HITL Efficiency

Target: 90% automation, with humans approving only high-value financial decisions.

Python Libraries

Library            Layer      Purpose                           Docs
adafruit_max31865  RPi        PT100 → temperature via SPI       API · Guide
paho.mqtt.client   Both       MQTT publish/subscribe (QoS 1)    API · GitHub
psycopg2           Ingestion  PostgreSQL driver + batch INSERT  Docs
Mosquitto          Broker     MQTT message broker               Docs · Docker

GitLab Repository Documents

Document                     Description                                                      Link
project-analysis.md          Full project analysis & definition                               GitLab ↗
db-design.md                 TimescaleDB schema, detection queries, retention                 GitLab ↗
db-design-maintenance.md     Maintenance log DB: structured + pgvector + seed data            GitLab ↗
telemetry-ingestion.md       PT100 → MQTT → DB pipeline, code, Docker                         GitLab ↗
python-libraries.md          Python library reference with official docs links                GitLab ↗
maintenance-log-design.md    Maintenance log schema, semantic search, ingestion pipeline      GitLab ↗
sample-data/ (3 files)       Sample maintenance logs: ML-001 (7d), ML-002 (3d), ML-003 (8d)   GitLab ↗
spare_parts_stock.sql        Spare parts inventory DB: part_no, qty, price + seed data        GitLab ↗
inventory-decision-logic.md  Decision A, B1, B2: full SQL + HITL flow + new tables            GitLab ↗
supplier-api-design.md       Supplier gateway API: search, quote, order, tracking             GitLab ↗
agent-orchestration.md       LLM orchestration: tool-calling engine, event loop, audit trail  GitLab ↗


📦 Sprint Summary: 2026-03-23/24

DESIGN PHASE COMPLETE
Delivered
  • ✓ Project analysis & definition (project-analysis.md)
  • ✓ Telemetry DB schema designed (db-design.md)
  • ✓ TimescaleDB deployed: qa-docker:5434, DB iol_telemetry, seeded with PART-001
  • ✓ PT100 → MQTT → TimescaleDB ingestion pipeline designed (telemetry-ingestion.md)
  • ✓ Maintenance log schema (4 tables) + 3 sample logs (7-day, 3-day, 8-day failure patterns)
  • ✓ Spare parts stock schema: part_no, qty, price; 5 sample parts seeded
  • ✓ Supplier API gateway spec (5 endpoints, ranked results, HITL gate)
  • ✓ Agent orchestration layer designed (8 tools, ReAct loop, 128K context)
  • ✓ Python library reference doc with official links
  • ✓ Full project handover: IOL-Agent-Project-Handover.md (190KB) + PDF (397KB)
  • ✓ All docs pushed to GitLab + this portal page updated
Infrastructure State
TimescaleDB       qa-docker:5434 ✅ running
Version           TimescaleDB 2.25.2 / PG16
DB                iol_telemetry
User/Pass         iol / iol_dev_2026
Seeded            PART-001 (Line A), temp sensor
Legacy iol-agent  preserved (plain PG)
LLM endpoint      :18357 (Nano-30B)
GitLab Documents
project-analysis.md
db-design.md · db-design-maintenance.md
telemetry-ingestion.md
maintenance-log-design.md
sample-data/maintenance-log-001/002/003.md
supplier-api-design.md
agent-orchestration.md
spare_parts_stock.sql
IOL-Agent-Project-Handover.md (190KB)
IOL-Agent-Project-Handover.pdf (397KB)
Next Phase: Implementation
1. Ingestion Service
Build + containerize Python MQTT → TimescaleDB service (paho-mqtt + psycopg2)
2. LLM Orchestrator
Code ReAct loop, wire 8 tools to Nemotron-Nano via :18357 proxy
3. HITL Workflow
Webhook-based approval gate for POs + emergency escalations