# oMLX

oMLX is a native macOS LLM inference server built on MLX for Apple Silicon and managed from the menu bar. Paged SSD KV caching lets it handle larger context windows than memory alone would allow, and continuous batching keeps throughput up under concurrent requests. It serves OpenAI-compatible and Anthropic-compatible APIs on port 8000.
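Both APIs can be exercised with `curl`. A minimal sketch, assuming the default port, the standard OpenAI `/v1/chat/completions` and Anthropic `/v1/messages` routes (not confirmed for oMLX specifically), and a placeholder model name; adjust all three to your install:

```sh
# OpenAI-compatible chat completion (standard route assumed).
# The model name is a placeholder; use one you have loaded.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

# Anthropic-compatible message (standard route and version header assumed).
curl -s http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
        "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```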
## Setup

```sh
brew tap jundot/omlx https://github.com/jundot/omlx
brew install omlx
```

Or download the `.dmg` from GitHub releases.
## Details
| Property | Value |
|---|---|
| Default port | 8000 |
| API type | OpenAI-compatible + Anthropic-compatible |
| VRAM reporting | No |
| Model format | MLX (safetensors) |
| Detection | `/admin/info` JSON endpoint or `/admin` HTML page |
| Requirements | macOS 15+, Apple Silicon (M1+), 16 GB RAM min |
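
Detection can be scripted against the admin endpoint. A minimal sketch, assuming the default port; the response schema of `/admin/info` is not documented here, so this only checks that the endpoint returns parseable JSON:

```sh
# Probe /admin/info; a JSON response indicates oMLX.
# This is how servers sharing port 8000 (e.g. vllm-mlx) are told apart.
if curl -sf http://localhost:8000/admin/info \
     | python3 -c 'import json, sys; json.load(sys.stdin)' 2>/dev/null; then
  echo "oMLX detected on port 8000"
else
  echo "oMLX not detected on port 8000"
fi
```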
## Notes

- oMLX shares port 8000 with vllm-mlx; asiai uses `/admin/info` probing to distinguish between them.
- SSD KV caching enables larger context windows with lower memory pressure.
- Continuous batching improves throughput under concurrent requests (see the sketch after this list).
- Supports text LLMs, vision-language models, OCR models, embeddings, and rerankers.
- The admin dashboard at `/admin` provides real-time server metrics.
- In-app auto-update when installed via `.dmg`.
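
As a quick way to exercise continuous batching, fire several requests at once; a minimal sketch reusing the assumed OpenAI-compatible route and placeholder model name from above:

```sh
# Send four completions in parallel; continuous batching lets the server
# interleave them instead of running each to completion in turn.
for i in 1 2 3 4; do
  curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"mlx-community/Llama-3.2-3B-Instruct-4bit\",
         \"messages\": [{\"role\": \"user\", \"content\": \"Request $i\"}]}" &
done
wait  # block until all background requests finish
```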
## See also

Compare engines with `asiai bench --engines omlx`.