
Architecture

How data flows through asiai — from hardware sensors to your terminal, browser, and AI agents.

Overview

[Diagram: asiai architecture overview]

Key files

| Layer | Files | Role |
| --- | --- | --- |
| Engines | `src/asiai/engines/` | `InferenceEngine` ABC + 7 adapters (Ollama, LM Studio, mlx-lm, llama.cpp, oMLX, vllm-mlx, Exo). `OpenAICompatEngine` base class for OpenAI-compatible engines. |
| Collectors | `src/asiai/collectors/` | System metrics: `gpu.py` (ioreg), `system.py` (CPU, memory, thermal), `power.py` + `ioreport.py` (GPU/CPU/ANE watts via IOReport), `inference.py` (TCP connections, Prometheus scrape), `snapshot.py` (full system snapshot). |
| Benchmark | `src/asiai/benchmark/` | `runner.py` (warmup + N runs; median, stddev, CI95), `prompts.py` (test prompts), `card.py` (SVG card generation). |
| Storage | `src/asiai/storage/` | `db.py` (SQLite WAL, all CRUD), `schema.py` (tables + migrations). |
| CLI | `src/asiai/cli.py` | Argparse entry point, all 12 commands. |
| Web | `src/asiai/web/` | FastAPI + htmx + SSE + ApexCharts dashboard. Routes in `routes/`. |
| MCP | `src/asiai/mcp/` | FastMCP server, 11 tools + 3 resources. Transports: stdio, SSE, streamable-http. |
| Advisor | `src/asiai/advisor/` | Hardware-aware recommendations (model sizing, engine selection). |
| Display | `src/asiai/display/` | ANSI formatters (`formatters.py`), CLI renderer (`cli_renderer.py`), TUI (`tui.py`). |
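
The adapter pattern in the Engines layer looks roughly like this. A minimal sketch with hypothetical method names (`is_running`, `list_models`); the `/api/tags` endpoint and default port 11434 come from Ollama itself, not from asiai's sources:

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Sketch of the adapter interface; the real method names may differ."""

    name: str = "base"

    @abstractmethod
    def is_running(self) -> bool:
        """True if the engine's local server is reachable."""

    @abstractmethod
    def list_models(self) -> list[str]:
        """Model identifiers the engine currently serves."""


class OllamaEngine(InferenceEngine):
    """Hypothetical adapter probing Ollama's HTTP API on its default port."""

    name = "ollama"
    base_url = "http://localhost:11434"

    def is_running(self) -> bool:
        import urllib.request

        try:
            with urllib.request.urlopen(f"{self.base_url}/api/tags", timeout=1) as resp:
                return resp.status == 200
        except OSError:
            return False

    def list_models(self) -> list[str]:
        import json
        import urllib.request

        with urllib.request.urlopen(f"{self.base_url}/api/tags", timeout=2) as resp:
            return [m["name"] for m in json.load(resp).get("models", [])]
```

Note the sketch uses only the stdlib, consistent with the zero-dependency core described under Design principles below.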

Data flow

Monitoring (daemon mode)

Every 60s:
  collectors → snapshot dict → store_snapshot(db) → models table
                                                  → metrics table
  engines    → engine status → store_engine_status(db)
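
A minimal sketch of that loop. `store_snapshot` is named in the diagram above; `collect_snapshot` and the exact signatures are assumptions:

```python
import sqlite3
import time

# Assumed import paths: the modules exist per the table above, but the
# function names and signatures here are illustrative.
from asiai.collectors.snapshot import collect_snapshot
from asiai.storage.db import store_snapshot


def monitor_loop(db: sqlite3.Connection, interval: float = 60.0) -> None:
    """Collect a full system snapshot every `interval` seconds and persist it."""
    while True:
        started = time.monotonic()
        snapshot = collect_snapshot()   # dict of GPU/CPU/memory/power readings
        store_snapshot(db, snapshot)    # fans out to the models + metrics tables
        # Subtract elapsed collection time so ticks don't drift.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```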

Benchmark

CLI --bench → detect engines → pick model → warmup → N runs
           → compute median/stddev/CI95 → store_benchmark(db)
           → render table (ANSI or JSON)
           → optional: --share → POST to community API
           → optional: --card  → generate SVG card
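
The summary statistics need only the stdlib. A sketch of the median/stddev/CI95 step (the real `runner.py` may compute the interval differently, e.g. with a t-distribution for small N):

```python
import statistics


def summarize_runs(tokens_per_sec: list[float]) -> dict[str, float]:
    """Median, sample stddev, and a normal-approximation 95% CI half-width."""
    n = len(tokens_per_sec)
    stdev = statistics.stdev(tokens_per_sec) if n > 1 else 0.0
    return {
        "median": statistics.median(tokens_per_sec),
        "stdev": stdev,
        # 1.96 * standard error (normal approximation; an assumption here).
        "ci95": 1.96 * stdev / n**0.5 if n > 1 else 0.0,
    }


print(summarize_runs([41.2, 39.8, 40.5, 42.0, 40.1]))
```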

Web dashboard

Browser → FastAPI → Jinja2 template (initial render)
       → htmx SSE → /api/v1/stream → real-time updates
       → ApexCharts → /api/v1/metrics?hours=N → historical graphs
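
The SSE leg reduces to a small FastAPI sketch. The route path comes from the diagram above; the payload shape and tick rate are assumptions:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def event_stream():
    """Yield one SSE `data:` frame per tick."""
    while True:
        payload = {"gpu_util": 0.42}  # placeholder; real code reads the live snapshot
        yield f"data: {json.dumps(payload)}\n\n"
        await asyncio.sleep(1.0)


@app.get("/api/v1/stream")
async def stream():
    # htmx's SSE extension (or any EventSource client) subscribes here and
    # applies updates as events arrive.
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```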

MCP server

AI agent → stdio/SSE/HTTP → FastMCP → tool call
        → runs collector/benchmark in thread pool (asyncio.to_thread)
        → returns structured JSON
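
A minimal sketch of that hop using the FastMCP decorator API; the tool name and the inlined collector stand-in are assumptions:

```python
import asyncio

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("asiai")


def collect_snapshot() -> dict:
    """Stand-in for a real collector, which blocks on subprocess calls (ioreg etc.)."""
    return {"gpu_util": 0.42}


@mcp.tool()
async def system_snapshot() -> dict:
    """Return a full system snapshot as structured JSON."""
    # Collectors are synchronous, so hop to a worker thread to keep the
    # event loop responsive.
    return await asyncio.to_thread(collect_snapshot)


if __name__ == "__main__":
    mcp.run(transport="stdio")
```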

Design principles

  1. Zero dependencies for core — CLI, collectors, engines, storage use only stdlib Python. Optional extras ([web], [tui], [mcp]) add dependencies only when needed.
  2. Shared Data Layer — The same SQLite database serves CLI, web, MCP, and Prometheus. No separate data stores.
  3. Adapter pattern — All 7 engines implement InferenceEngine ABC. Adding a new engine = 1 file + register in detect.py.
  4. Lazy imports — Each CLI command imports its dependencies locally, keeping startup time fast (see the sketch after this list).
  5. macOS-native — ioreg for GPU, launchd for daemons, lsof for inference activity. No Linux abstractions.
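
Principle 4 in practice, as a sketch (the `bench` subcommand wiring and the imported function name are illustrative):

```python
import argparse


def cmd_bench(args: argparse.Namespace) -> None:
    # The heavy benchmark machinery loads only when this command runs,
    # so a bare `asiai --help` stays instant.
    from asiai.benchmark.runner import run_benchmark  # assumed function name

    run_benchmark(runs=args.runs)


def main() -> None:
    parser = argparse.ArgumentParser(prog="asiai")
    sub = parser.add_subparsers(dest="command", required=True)
    bench = sub.add_parser("bench", help="run an inference benchmark")
    bench.add_argument("--runs", type=int, default=5)
    bench.set_defaults(func=cmd_bench)

    args = parser.parse_args()
    args.func(args)
```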