mlx-lm

mlx-lm runs models natively on Apple's MLX framework, making efficient use of unified memory on Apple Silicon.

Setup

brew install mlx-lm
mlx_lm.server --model mlx-community/gemma-2-9b-it-4bit
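Once the server is running, it exposes an OpenAI-compatible API on port 8080. A minimal sketch of a chat request against that endpoint (the model name matches the one loaded above; adjust it if you serve a different model):

```shell
# Send a chat completion request to the local mlx-lm server
# (assumes mlx_lm.server is running on the default port 8080)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mlx-community/gemma-2-9b-it-4bit",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```

Any OpenAI-compatible client library can be pointed at `http://localhost:8080/v1` the same way.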

Details

| Property       | Value                                           |
|----------------|-------------------------------------------------|
| Default port   | 8080                                            |
| API type       | OpenAI-compatible                               |
| VRAM reporting | No                                              |
| Model format   | MLX (safetensors)                               |
| Detection      | `/version` endpoint or `lsof` process detection |

Notes

  • mlx-lm shares port 8080 with llama.cpp. asiai uses API probing and process detection to distinguish between them.
  • Models use the HuggingFace/MLX community format (e.g., mlx-community/gemma-2-9b-it-4bit).
  • Native MLX execution typically provides excellent performance on Apple Silicon.
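Because llama.cpp and mlx-lm can both listen on port 8080, a quick manual check mirrors what asiai does automatically. A sketch, assuming `lsof` and `pgrep` are available and the server runs locally:

```shell
# Show any listener on TCP port 8080 (prints nothing if none)
lsof -iTCP:8080 -sTCP:LISTEN -P -n || true

# lsof reports the command name only (e.g. "Python"), so check the
# full command line for mlx_lm.server to tell it apart from llama.cpp
if pgrep -f mlx_lm.server >/dev/null; then
  echo "mlx-lm server process found"
else
  echo "no mlx-lm server process (port 8080 may belong to llama.cpp)"
fi
```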