Rapid-MLX
Rapid-MLX is a Homebrew packaging of the vllm-mlx engine (raullenchai upstream). The rapid-mlx wrapper delegates to the embedded vllm-mlx Python module in the formula's libexec virtualenv, so you get vllm-mlx's serving behaviour with a one-line brew install and no manual venv management.
Setup
brew install raullenchai/rapid-mlx/rapid-mlx
# or, without Homebrew:
pip install rapid-mlx
Details
| Property | Value |
|---|---|
| Default port | 8000 |
| API type | OpenAI-compatible |
| VRAM reporting | No (asiai measures GPU power/VRAM via ioreg) |
| Model format | MLX |
| Detection | owned_by: "rapid-mlx" in /v1/models, version via rapid-mlx --version |
| Requirements | Apple Silicon (M1+), macOS, Homebrew (for the brew install path) |
Notes
- Rapid-MLX shares port 8000 with oMLX, vllm-mlx and vMLX. asiai disambiguates them by the
owned_byfield of/v1/models(rapid-mlx) and falls back to common brew install paths to resolve the binary. - Because it wraps vllm-mlx, serving semantics (continuous batching, OpenAI-compatible endpoints) match vllm-mlx; the difference is purely packaging and update path (Homebrew vs pip/manual venv).
aisctl upgrade rapidmlxis whitelisted (brew formularaullenchai/rapid-mlx/rapid-mlx) when asiai-inference-server is installed alongside.
See also
Compare engines with asiai bench --engines rapidmlx --- learn how