mlx-lm

mlx-lm runs models natively on Apple MLX, providing efficient unified memory utilization.

Setup

brew install mlx-lm
mlx_lm.server --model mlx-community/gemma-2-9b-it-4bit

Property	Value
Default port	8080
API type	OpenAI-compatible
VRAM reporting	No
Model format	MLX (safetensors)
Detection	`/version` endpoint or `lsof` process detection

mlx-lm shares port 8080 with llama.cpp. asiai uses API probing and process detection to distinguish between them.
Models use the HuggingFace/MLX community format (e.g., mlx-community/gemma-2-9b-it-4bit).
Native MLX execution typically provides excellent performance on Apple Silicon.