Getting Started

Apple Silicon AI — Multi-engine LLM benchmark & monitoring CLI.

asiai compares inference engines side-by-side on your Mac. Load the same model on Ollama and LM Studio, run asiai bench, get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.

Quick start

pipx install asiai        # Recommended: isolated install

Or via Homebrew:

brew tap druide67/tap
brew install asiai

Other options:

uvx asiai detect          # Run without installing (requires uv)
pip install asiai         # Standard pip install

First launch

asiai setup                # Interactive wizard — detects hardware, engines, models
asiai detect               # Or jump straight to engine detection

Then benchmark:

asiai bench -m qwen3.5 --runs 3 --power

Example output:

  Mac Mini M4 Pro — Apple M4 Pro  RAM: 64.0 GB (42% used)  Pressure: normal

Benchmark: qwen3.5

  Engine       tok/s (±stddev)    Tokens   Duration     TTFT       VRAM    Thermal
  ────────── ───────────────── ───────── ────────── ──────── ────────── ──────────
  lmstudio    72.6 ± 0.0 (stable)   435    6.20s    0.28s        —    nominal
  ollama      30.4 ± 0.1 (stable)   448   15.28s    0.25s   26.0 GB   nominal

  Winner: lmstudio (2.4x faster)
  Power: lmstudio 13.2W (5.52 tok/s/W) — ollama 16.0W (1.89 tok/s/W)
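The summary lines are simple arithmetic over the per-engine numbers. A quick sanity check of the figures above (how asiai rounds internally is an assumption; the printed 5.52 tok/s/W suggests it divides unrounded samples):

```python
lmstudio_tps, ollama_tps = 72.6, 30.4      # tok/s from the table above
lmstudio_watts, ollama_watts = 13.2, 16.0  # GPU power draw

# Winner margin: ratio of generation speeds.
print(f"{lmstudio_tps / ollama_tps:.1f}x faster")  # 2.4x faster

# Energy efficiency: tokens per second per watt of GPU draw.
print(f"{lmstudio_tps / lmstudio_watts:.2f} tok/s/W")  # 5.50 (asiai shows 5.52, likely from unrounded samples)
print(f"{ollama_tps / ollama_watts:.2f} tok/s/W")      # 1.90
```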

What it measures

  Metric      Description
  ─────────   ─────────────────────────────────────────────────────────────
  tok/s       Generation speed (tokens/sec), excluding prompt processing
  TTFT        Time to first token — prompt processing latency
  Power       GPU power draw in watts (sudo powermetrics)
  tok/s/W     Energy efficiency — tokens per second per watt
  Stability   Run-to-run variance: stable (<5%), variable (5–10%), unstable (>10%)
  VRAM        Memory footprint — native (Ollama, LM Studio) or estimated via ri_phys_footprint (all engines)
  Thermal     CPU throttling state and speed limit percentage
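The stability label is a bucket over the run-to-run coefficient of variation (stddev of tok/s divided by the mean). A minimal sketch of how such a classification could work — whether asiai uses sample or population stddev is an assumption; sample stddev is shown here:

```python
import statistics

def classify_stability(tps_samples: list[float]) -> str:
    """Bucket run-to-run variance: CV = stddev / mean of tok/s across runs."""
    cv = statistics.stdev(tps_samples) / statistics.mean(tps_samples)
    if cv < 0.05:
        return "stable"    # under 5% variance
    if cv < 0.10:
        return "variable"  # 5-10%
    return "unstable"      # over 10%

print(classify_stability([30.4, 30.5, 30.3]))  # stable
print(classify_stability([60.0, 70.0, 80.0]))  # unstable
```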

Supported engines

  Engine       Port    API
  ──────────   ─────   ─────────────────
  Ollama       11434   Native
  LM Studio     1234   OpenAI-compatible
  mlx-lm        8080   OpenAI-compatible
  llama.cpp     8080   OpenAI-compatible
  oMLX          8000   OpenAI-compatible
  vllm-mlx     8000    OpenAI-compatible
  Exo          52415   OpenAI-compatible

Custom ports

If your engine runs on a non-standard port, asiai will usually find it automatically via process detection. You can also register it manually:

asiai config add omlx http://localhost:8800 --label mac-mini

Manually added engines are persisted and never auto-pruned. See config for details.

Requirements

  • macOS on Apple Silicon (M1 / M2 / M3 / M4)
  • Python 3.11+
  • At least one inference engine running locally

Zero dependencies

The core uses only the Python standard library — urllib, sqlite3, subprocess, argparse. No requests, no psutil, no rich.
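To illustrate the stdlib-only approach, here is a minimal sketch — not asiai's actual code — of probing a local engine port with nothing but urllib:

```python
import urllib.request
import urllib.error

def engine_responds(port: int, timeout: float = 1.0) -> bool:
    """Return True if anything answers HTTP on localhost:port."""
    try:
        urllib.request.urlopen(f"http://localhost:{port}/", timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server answered, even if with a 404: it's up
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timed out: nothing listening

print(engine_responds(11434))  # True if Ollama is running on its default port
```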

Optional extras:

  • asiai[web] — FastAPI web dashboard with charts
  • asiai[tui] — Textual terminal dashboard
  • asiai[mcp] — MCP server for AI agent integration
  • asiai[all] — Web + TUI + MCP
  • asiai[dev] — pytest, ruff, pytest-cov