# Getting Started
Apple Silicon AI — Multi-engine LLM benchmark & monitoring CLI.
asiai compares inference engines side by side on your Mac. Load the same model on Ollama and LM Studio, run `asiai bench`, and get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.
## Quick start

Install with Homebrew:

    brew tap druide67/tap
    brew install asiai

Or with pip:

    pip install asiai

Then detect your engines:

    asiai detect

And benchmark:

    asiai bench -m qwen3.5 --runs 3 --power
## What it measures
| Metric | Description |
|---|---|
| tok/s | Generation speed (tokens/sec), excluding prompt processing |
| TTFT | Time to first token — prompt processing latency |
| Power | GPU power draw in watts (via `sudo powermetrics`) |
| tok/s/W | Energy efficiency — tokens per second per watt |
| Stability | Run-to-run variance: stable (<5%), variable (5–10%), unstable (>10%) |
| VRAM | GPU memory footprint (Ollama only) |
| Thermal | CPU throttling state and speed limit percentage |
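The last two derived metrics are straightforward to compute from per-run numbers. A minimal sketch, assuming "variance" here means the coefficient of variation (stdev / mean) of tok/s across runs — asiai's exact formula may differ:

```python
import statistics

def tokens_per_watt(tok_s: float, watts: float) -> float:
    """tok/s/W: generation speed divided by average GPU power draw."""
    return tok_s / watts

def classify_stability(tok_s_runs: list[float]) -> str:
    """Bucket run-to-run variance using the coefficient of variation
    (sample stdev / mean) of tok/s across benchmark runs."""
    cv = statistics.stdev(tok_s_runs) / statistics.mean(tok_s_runs)
    if cv < 0.05:
        return "stable"
    if cv < 0.10:
        return "variable"
    return "unstable"
```

This is why `--runs 3` matters: with a single run there is no variance to measure, so no stability verdict.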
## Supported engines
| Engine | Port | API |
|---|---|---|
| Ollama | 11434 | Native |
| LM Studio | 1234 | OpenAI-compatible |
| mlx-lm | 8080 | OpenAI-compatible |
| llama.cpp | 8080 | OpenAI-compatible |
| vllm-mlx | 8000 | OpenAI-compatible |
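Detection amounts to probing these default ports with the standard library. A hedged sketch (the endpoint paths are assumptions for illustration; asiai's real probe logic may differ — note that mlx-lm and llama.cpp share port 8080, so a port check alone cannot tell them apart):

```python
import urllib.error
import urllib.request

# Default-port probe table mirroring the engines above.
PROBES = {
    "Ollama": "http://localhost:11434/api/tags",
    "LM Studio": "http://localhost:1234/v1/models",
    "mlx-lm / llama.cpp": "http://localhost:8080/v1/models",  # shared port
    "vllm-mlx": "http://localhost:8000/v1/models",
}

def detect_engines(timeout: float = 0.5) -> list[str]:
    """Return the names of engines answering on their default ports."""
    found = []
    for name, url in PROBES.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    found.append(name)
        except (urllib.error.URLError, OSError):
            pass  # nothing listening on that port
    return found
```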
## Requirements
- macOS on Apple Silicon (M1 / M2 / M3 / M4)
- Python 3.11+
- At least one inference engine running locally
## Zero dependencies
The core uses only the Python standard library — `urllib`, `sqlite3`, `subprocess`, `argparse`. No `requests`, no `psutil`, no `rich`.
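As an example of how far the stdlib goes, persisting benchmark results needs nothing beyond `sqlite3`. The schema and function below are hypothetical, not asiai's actual internals:

```python
import sqlite3

def save_run(db_path: str, engine: str, model: str,
             tok_s: float, ttft_ms: float) -> None:
    """Append one benchmark result to a local SQLite file.
    (Illustrative schema only; asiai's real tables may differ.)"""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS runs (
        engine  TEXT,
        model   TEXT,
        tok_s   REAL,
        ttft_ms REAL,
        ts      DATETIME DEFAULT CURRENT_TIMESTAMP)""")
    conn.execute(
        "INSERT INTO runs (engine, model, tok_s, ttft_ms) VALUES (?, ?, ?, ?)",
        (engine, model, tok_s, ttft_ms))
    conn.commit()
    conn.close()
```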
Optional extras:

- `asiai[tui]` — Textual terminal dashboard
- `asiai[dev]` — pytest, ruff, pytest-cov