Saltar a contenido

vMLX

vMLX is a high-performance MLX-based inference server with first-class support for Mamba/SSM hybrid architectures (DeltaNet, Mamba2, RetNet). It exposes an OpenAI-compatible API on port 8000 and identifies itself through a /version endpoint, with Prometheus metrics for inference activity.

vMLX targets Apple Silicon and is the only adapter here aimed at state-space / hybrid models alongside standard transformers.

Setup

pip install vmlx
vmlx serve --model <repo-id-or-path> --port 8000

Details

Property Value
Default port 8000
API type OpenAI-compatible
VRAM reporting No (asiai measures GPU power/VRAM via ioreg)
Model format MLX
Detection /version endpoint reporting vmlx, or owned_by: "vmlx" in /v1/models
Activity metrics /metrics (Prometheus)
Requirements Apple Silicon (M1+), macOS

Notes

  • vMLX shares port 8000 with oMLX and vllm-mlx. asiai disambiguates them by probing /version and the owned_by field of /v1/models.
  • First-class Mamba/SSM hybrid support (DeltaNet, Mamba2, RetNet) — useful for benchmarking non-transformer architectures that other MLX servers do not load.
  • Version resolves from /version, falling back to pip show vmlx.

See also

Compare engines with asiai bench --engines vmlx --- learn how