# Getting Started
Apple Silicon AI — Multi-engine LLM benchmark & monitoring CLI.

`asiai` compares inference engines side-by-side on your Mac. Load the same model on Ollama and LM Studio, run `asiai bench`, and get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.
## Quick start

```shell
pipx install asiai   # Recommended: isolated install
```

Or via Homebrew:

```shell
brew tap druide67/tap
brew install asiai
```

Other options:

```shell
uvx asiai detect     # Run without installing (requires uv)
pip install asiai    # Standard pip install
```
## First launch

```shell
asiai setup    # Interactive wizard — detects hardware, engines, models
asiai detect   # Or jump straight to engine detection
```

Then benchmark:

```shell
asiai bench -m qwen3.5 --runs 3 --power
```
Example output:

```text
Mac Mini M4 Pro — Apple M4 Pro
RAM: 64.0 GB (42% used)   Pressure: normal

Benchmark: qwen3.5

Engine     tok/s (±stddev)       Tokens   Duration   TTFT    VRAM      Thermal
─────────  ────────────────────  ───────  ─────────  ──────  ────────  ────────
lmstudio   72.6 ± 0.0 (stable)   435      6.20s      0.28s   —         nominal
ollama     30.4 ± 0.1 (stable)   448      15.28s     0.25s   26.0 GB   nominal

Winner: lmstudio (2.4x faster)
Power: lmstudio 13.2W (5.52 tok/s/W) — ollama 16.0W (1.89 tok/s/W)
```
## What it measures
| Metric | Description |
|---|---|
| tok/s | Generation speed (tokens/sec), excluding prompt processing |
| TTFT | Time to first token — prompt processing latency |
| Power | GPU power draw in watts (via `sudo powermetrics`) |
| tok/s/W | Energy efficiency — tokens per second per watt |
| Stability | Run-to-run variance: stable (<5%), variable (<10%), unstable (>10%) |
| VRAM | Memory footprint — reported natively (Ollama, LM Studio) or estimated via `ri_phys_footprint` (all engines) |
| Thermal | CPU throttling state and speed limit percentage |
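The stability label can be reproduced from the bench output. A minimal sketch, assuming the variance is the standard deviation expressed as a fraction of the mean (asiai's exact internal definition may differ):

```python
def stability(mean_tps: float, stddev_tps: float) -> str:
    """Classify run-to-run variance using the thresholds above.

    Assumes variance = stddev / mean (coefficient of variation);
    this is an illustration, not asiai's actual implementation.
    """
    cv = stddev_tps / mean_tps
    if cv < 0.05:
        return "stable"
    if cv < 0.10:
        return "variable"
    return "unstable"

# The ollama run above: 30.4 ± 0.1 tok/s is ~0.3% variance.
print(stability(30.4, 0.1))  # stable
```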
## Supported engines
| Engine | Port | API |
|---|---|---|
| Ollama | 11434 | Native |
| LM Studio | 1234 | OpenAI-compatible |
| mlx-lm | 8080 | OpenAI-compatible |
| llama.cpp | 8080 | OpenAI-compatible |
| oMLX | 8000 | OpenAI-compatible |
| vllm-mlx | 8000 | OpenAI-compatible |
| Exo | 52415 | OpenAI-compatible |
### Custom ports
If your engine runs on a non-standard port, asiai will usually find it automatically via process detection. You can also register it manually:

```shell
asiai config add omlx http://localhost:8800 --label mac-mini
```

Manually added engines are persisted and never auto-pruned. See config for details.
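Before registering an engine, you can confirm something is actually listening on the port. A minimal probe using only the standard library (in the same spirit as asiai's core); `list_models` is a hypothetical helper of my own, and `/v1/models` is the standard OpenAI-compatible listing route rather than anything asiai-specific:

```python
import json
import urllib.request

def list_models(base_url: str, timeout: float = 3.0) -> list[str]:
    """Return model IDs from an OpenAI-compatible /v1/models endpoint."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-compatible servers wrap the model list in a "data" array.
    return [m["id"] for m in payload.get("data", [])]

# Example (assumes an engine is listening on this port):
# print(list_models("http://localhost:8800"))
```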
## Requirements
- macOS on Apple Silicon (M1 / M2 / M3 / M4)
- Python 3.11+
- At least one inference engine running locally
### Zero dependencies
The core uses only the Python standard library — `urllib`, `sqlite3`, `subprocess`, `argparse`. No `requests`, no `psutil`, no `rich`.
Optional extras:

- `asiai[web]` — FastAPI web dashboard with charts
- `asiai[tui]` — Textual terminal dashboard
- `asiai[mcp]` — MCP server for AI agent integration
- `asiai[all]` — Web + TUI + MCP
- `asiai[dev]` — pytest, ruff, pytest-cov
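To install an extra, use standard pip extras syntax; the quotes keep the square brackets away from shell globbing (a general Python packaging convention, not asiai-specific):

```shell
pipx install "asiai[web,tui]"   # pick the extras you need
# or, with plain pip:
pip install "asiai[all]"
```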