# Getting Started
Apple Silicon AI — Multi-engine LLM benchmark & monitoring CLI.

`asiai` compares inference engines side-by-side on your Mac. Load the same model on Ollama and LM Studio, run `asiai bench`, and get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.
## Quick start

```shell
pipx install asiai   # Recommended: isolated install
```

Or via Homebrew:

```shell
brew tap druide67/tap
brew install asiai
```

Other options:

```shell
uvx asiai detect     # Run without installing (requires uv)
pip install asiai    # Standard pip install
```
## First launch

```shell
asiai setup    # Interactive wizard — detects hardware, engines, models
asiai detect   # Or jump straight to engine detection
```

Then benchmark:

```shell
asiai bench -m qwen3.5 --runs 3 --power
```
Example output:

```text
Mac Mini M4 Pro — Apple M4 Pro
RAM: 64.0 GB (42% used)   Pressure: normal

Benchmark: qwen3.5

Engine     tok/s (±stddev)       Tokens   Duration   TTFT    VRAM      Thermal
─────────  ────────────────────  ───────  ─────────  ──────  ────────  ────────
lmstudio   72.6 ± 0.0 (stable)   435      6.20s      0.28s   —         nominal
ollama     30.4 ± 0.1 (stable)   448      15.28s     0.25s   26.0 GB   nominal

Winner: lmstudio (2.4x faster)
Power: lmstudio 13.2W (5.52 tok/s/W) — ollama 16.0W (1.89 tok/s/W)
```
## What it measures
| Metric | Description |
|---|---|
| tok/s | Generation speed (tokens/sec), excluding prompt processing |
| TTFT | Time to first token — prompt processing latency |
| Power | GPU power draw in watts (via `sudo powermetrics`) |
| tok/s/W | Energy efficiency — tokens per second per watt |
| Stability | Run-to-run variance: stable (<5%), variable (<10%), unstable (>10%) |
| VRAM | Memory footprint — reported natively (Ollama, LM Studio) or estimated via `ri_phys_footprint` (all engines) |
| Thermal | CPU throttling state and speed limit percentage |
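The stability label can be reproduced from the bench output. A minimal sketch, assuming the variance is the standard deviation expressed as a fraction of the mean (asiai's exact internal definition may differ):

```python
def stability(mean_tps: float, stddev_tps: float) -> str:
    """Classify run-to-run variance using the thresholds above.

    Assumes variance = stddev / mean (coefficient of variation);
    this is an illustration, not asiai's actual implementation.
    """
    cv = stddev_tps / mean_tps
    if cv < 0.05:
        return "stable"
    if cv < 0.10:
        return "variable"
    return "unstable"

# The ollama run above: 30.4 ± 0.1 tok/s is ~0.3% variance.
print(stability(30.4, 0.1))  # stable
```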
## Supported engines
| Engine | Port | API |
|---|---|---|
| Ollama | 11434 | Native |
| LM Studio | 1234 | OpenAI-compatible |
| mlx-lm | 8080 | OpenAI-compatible |
| llama.cpp | 8080 | OpenAI-compatible |
| oMLX | 8000 | OpenAI-compatible |
| vllm-mlx | 8000 | OpenAI-compatible |
| Exo | 52415 | OpenAI-compatible |
### Custom ports
If your engine runs on a non-standard port, asiai will usually find it automatically via process detection. You can also register it manually:

```shell
asiai config add omlx http://localhost:8800 --label mac-mini
```

Manually added engines are persisted and never auto-pruned. See config for details.
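Before registering an engine, you can confirm something is actually listening on the port. A minimal probe using only the standard library (in the same spirit as asiai's core); `list_models` is a hypothetical helper of my own, and `/v1/models` is the standard OpenAI-compatible listing route rather than anything asiai-specific:

```python
import json
import urllib.request

def list_models(base_url: str, timeout: float = 3.0) -> list[str]:
    """Return model IDs from an OpenAI-compatible /v1/models endpoint."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-compatible servers wrap the model list in a "data" array.
    return [m["id"] for m in payload.get("data", [])]

# Example (assumes an engine is listening on this port):
# print(list_models("http://localhost:8800"))
```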
## Requirements
- macOS on Apple Silicon (M1 / M2 / M3 / M4)
- Python 3.11+
- At least one inference engine running locally
### Zero dependencies
The core uses only the Python standard library — `urllib`, `sqlite3`, `subprocess`, `argparse`. No `requests`, no `psutil`, no `rich`.
Optional extras:

- `asiai[web]` — FastAPI web dashboard with charts
- `asiai[tui]` — Textual terminal dashboard
- `asiai[mcp]` — MCP server for AI agent integration
- `asiai[all]` — Web + TUI + MCP
- `asiai[dev]` — pytest, ruff, pytest-cov
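To install an extra, use standard pip extras syntax; the quotes keep the square brackets away from shell globbing (a general Python packaging convention, not asiai-specific):

```shell
pipx install "asiai[web,tui]"   # pick the extras you need
# or, with plain pip:
pip install "asiai[all]"
```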