Getting Started

Apple Silicon AI — Multi-engine LLM benchmark & monitoring CLI.

asiai compares inference engines side-by-side on your Mac. Load the same model on Ollama and LM Studio, run asiai bench, get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.

Quick start

brew tap druide67/tap
brew install asiai

Or with pip:

pip install asiai

Then detect your engines:

asiai detect

And benchmark:

asiai bench -m qwen3.5 --runs 3 --power

What it measures

Metric	Description
tok/s	Generation speed (tokens/sec), excluding prompt processing
TTFT	Time to first token — prompt processing latency
Power	GPU power draw in watts (`sudo powermetrics`)
tok/s/W	Energy efficiency — tokens per second per watt
Stability	Run-to-run variance: stable (<5%), variable (<10%), unstable (>10%)
VRAM	GPU memory footprint (Ollama only)
Thermal	CPU throttling state and speed limit percentage

Supported engines

Engine	Port	API
Ollama	11434	Native
LM Studio	1234	OpenAI-compatible
mlx-lm	8080	OpenAI-compatible
llama.cpp	8080	OpenAI-compatible
vllm-mlx	8000	OpenAI-compatible

Requirements

macOS on Apple Silicon (M1 / M2 / M3 / M4)
Python 3.11+
At least one inference engine running locally

Zero dependencies

The core uses only the Python standard library — urllib, sqlite3, subprocess, argparse. No requests, no psutil, no rich.

Optional extras:

asiai[tui] — Textual terminal dashboard
asiai[dev] — pytest, ruff, pytest-cov