IOL Agent — Predictive Maintenance AI

Overview

An on-premise Agentic AI system for predictive maintenance in SME manufacturing. Runs on LMP's NVIDIA Blackwell GPU cluster. The agent monitors industrial equipment via telemetry sensors, detects anomalies before failures occur, checks spare parts inventory, places orders, and schedules maintenance — all autonomously, with Human-in-the-Loop (HITL) financial controls for high-value decisions.

📄

13 Design Docs

Full architecture on GitLab

🧠

AI Agent

Nemotron-3 on Blackwell

🌡️

PT100 Telemetry

5s interval, MQTT → TimescaleDB

⚙️

3 Decision Paths

Auto / Auto-order / HITL

Architecture

PT100 Sensor → RPi (MAX31865 ADC) → MQTT Broker → Ingestion Service → TimescaleDB
                                         │                                    ↓
                                    Eclipse Mosquitto              AI Agent (LLM on Blackwell)
                                     (pve3:1883)                          ↓
                                                              ┌──────────┴──────────┐
                                                              │                     │
                                                         Part in stock         Part NOT in stock
                                                              │                     │
                                                       Generate work order    Query supplier API
                                                       Assign to technician        │
                                                                            ┌──────┴──────┐
                                                                         Under limit   Over limit
                                                                            │              │
                                                                       Auto-order     HITL approval

Hardware Platform

Control Server (pve3)

CPU	Ryzen 9 9950X3D
RAM	96GB DDR5
GPU	RTX 5080 16GB
Role	Orchestration, Docker, MQTT

AI Cluster (2× GB10 Blackwell)

Nodes	spark + dark
VRAM	128GB each (256GB total)
Interconnect	2×200GbE RDMA
Model	Nemotron-3-Nano-30B (TRT-LLM)

Telemetry — Part #1 (PoC)

Parameter	Value	Notes
Sensor	PT100 (3-wire RTD)	via MAX31865 ADC on RPi
Measurement	Temperature (°C)	Range: -200 to +850°C
Interval	5 seconds	17,280 readings/day
Normal range	35–40°C	Operational band
Warning	>41°C	Instant, single reading
Alarm	>45°C × 10 readings	50 sec sustained → failure
Transport	MQTT (QoS 1)	Topic: iol/telemetry/part/1/temperature
Database	TimescaleDB (PG16)	~240 MB/year for 1 sensor

Database Schema (TimescaleDB)

┌──────────┐       ┌──────────┐       ┌────────────────────┐
│  parts   │1────*│ sensors  │1────*│ sensor_thresholds  │
│          │       │          │       │                    │
│ part_id  │       │ sensor_id│       │ range_min/max      │
│ part_no  │       │ part_id  │       │ warning_value      │
│ name     │       │ type     │       │ alarm_value        │
│ location │       │ unit     │       │ alarm_count        │
│ status   │       │ interval │       │                    │
└──────────┘       └────┬─────┘       └────────────────────┘
                        │
                   ┌────┴─────┐
                   │telemetry │
                   │(hypertbl)│       ┌──────────┐
                   │          │       │  alerts  │
                   │ time     │       │          │
                   │ sensor_id│       │ alert_id │
                   │ value    │       │ sensor_id│
                   │ quality  │       │ type     │
                   └──────────┘       │ status   │
                                      └──────────┘

5 tables · telemetry is a TimescaleDB hypertable with automatic time-based partitioning. Full SQL: db-design.md on GitLab

Data Ingestion Pipeline

🌡️

PT100 + RPi

MAX31865 ADC → Python → JSON payload every 5s

📡

MQTT Broker

Eclipse Mosquitto 2 on pve3:1883

⚙️

Ingestion Service

Python subscriber, batch INSERT (50 rows or 5s)

🗄️

TimescaleDB

PG16 hypertable, 90-day raw + 1-min aggregates

Docker Stack

services:
  iol-db:          timescale/timescaledb:latest-pg16    # :5432
  mqtt-broker:     eclipse-mosquitto:2                  # :1883, :9001
  iol-ingest:      custom Python image                  # subscribes MQTT → writes DB

Full compose + Dockerfile: telemetry-ingestion.md on GitLab

Inventory Decision Logic

After confirming a failure prediction, the agent checks spare parts inventory and follows one of three paths.

Decision A

Part In Stock

Reserve part → generate work order → assign technician → schedule before failure. Fully autonomous.

Decision B1

Out of Stock, Under €500

Query supplier API → auto-order → create PO → schedule work after delivery. Autonomous within limit.

Decision B2

Out of Stock, Over €500

Query supplier → create pending PO → HITL approval request → human approves/rejects. Human-in-the-loop.

Full logic + SQL: inventory-decision-logic.md on GitLab

Maintenance Log System

Customer-supplied maintenance books are ingested into both a structured SQL database (lifecycle tracking, overdue alerts) and a vector embedding store (semantic search by the LLM agent). The agent combines both for failure prediction.

📋

Structured Queries

SQL: lifecycle tracking, overdue parts, maintenance history

🔍

Semantic Search

pgvector + nomic-embed-text: find historical correlations

🧠

LLM Reasoning

Nemotron on Blackwell: predict time-to-failure from combined data

Sample Maintenance Logs (Training Data)

Document	Machine	Alarm → Replace	Pattern	Cost
ML-001	Line A	7 days	Gradual wear, peak 47.2°C	€460
ML-002	Line B	3 days	Contamination, peak 48.6°C, 73h downtime	€610
ML-003	Line C	8 days	End-of-lifecycle, peak 46.8°C	€430

Project Objectives

1. Validate on Blackwell

Prove local LLMs can do real-time telemetry + semantic log analysis without cloud.

2. Closed-Loop Integration

Agent writes to ERP/SQL, generates work orders, calls supplier APIs autonomously.

3. HITL Efficiency

90% automation, humans approve only high-value financial decisions.

Python Libraries

Library	Layer	Purpose	Docs
adafruit_max31865	RPi	PT100 → temperature via SPI	API · Guide
paho.mqtt.client	Both	MQTT publish/subscribe (QoS 1)	API · GitHub
psycopg2	Ingestion	PostgreSQL driver + batch INSERT	Docs
Mosquitto	Broker	MQTT message broker	Docs · Docker

GitLab Repository Documents

Document	Description	Link
project-analysis.md	Full project analysis & definition	GitLab ↗
db-design.md	TimescaleDB schema, detection queries, retention	GitLab ↗
db-design-maintenance.md	Maintenance log DB — structured + pgvector + seed data	GitLab ↗
telemetry-ingestion.md	PT100 → MQTT → DB pipeline, code, Docker	GitLab ↗
python-libraries.md	Python library reference with official docs links	GitLab ↗
maintenance-log-design.md	Maintenance log schema, semantic search, ingestion pipeline	GitLab ↗
sample-data/ (3 files)	Sample maintenance logs: ML-001 (7d), ML-002 (3d), ML-003 (8d)	GitLab ↗
spare_parts_stock.sql	Spare parts inventory DB — part_no, qty, price + seed data	GitLab ↗
inventory-decision-logic.md	Decision A, B1, B2 — full SQL + HITL flow + new tables	GitLab ↗
supplier-api-design.md	Supplier gateway API — search, quote, order, tracking	GitLab ↗
agent-orchestration.md	LLM orchestration — tool-calling engine, event loop, audit trail	GitLab ↗

🎫 EventAlpha

LIVE — PAPER TRADING

Ticket market intelligence & arbitrage platform (formerly Patko Project). Monitoring Ticketmaster US+EU, concerts + sports. Virtual €5,000 paper trading budget active.

View EventAlpha page →

📦 Sprint Summary — 2026-03-23/24

DESIGN PHASE COMPLETE

Delivered

✓Project analysis & definition (project-analysis.md)
✓Telemetry DB schema designed (db-design.md)
✓TimescaleDB deployed: qa-docker:5434, DB iol_telemetry, seeded with PART-001
✓PT100 → MQTT → TimescaleDB ingestion pipeline designed (telemetry-ingestion.md)
✓Maintenance log schema (4 tables) + 3 sample logs (7-day, 3-day, 8-day failure patterns)
✓Spare parts stock schema: part_no, qty, price — 5 sample parts seeded
✓Supplier API gateway spec (5 endpoints, ranked results, HITL gate)
✓Agent orchestration layer designed (8 tools, ReAct loop, 128K context)
✓Python library reference doc with official links
✓Full project handover: IOL-Agent-Project-Handover.md (190KB) + PDF (397KB)
✓All docs pushed to GitLab + this portal page updated

Infrastructure State

TimescaleDBqa-docker:5434 ✅ running

VersionTimescaleDB 2.25.2 / PG16

DBiol_telemetry

User/Passiol / iol_dev_2026

SeededPART-001 (Line A), temp sensor

Legacy iol-agentpreserved (plain PG)

LLM endpoint:18357 (Nano-30B)

GitLab Documents

project-analysis.md

db-design.md · db-design-maintenance.md

telemetry-ingestion.md

maintenance-log-design.md

sample-data/maintenance-log-001/002/003.md

supplier-api-design.md

agent-orchestration.md

spare_parts_stock.sql

IOL-Agent-Project-Handover.md (190KB)

IOL-Agent-Project-Handover.pdf (397KB)

Next Phase — Implementation

1. Ingestion Service

Build + containerize Python MQTT → TimescaleDB service (paho-mqtt + psycopg2)

2. LLM Orchestrator

Code ReAct loop, wire 8 tools to Nemotron-Nano via :18357 proxy

3. HITL Workflow

Webhook-based approval gate for POs + emergency escalations