Ollama v0.19 - Massive local model speedup on Apple Silicon with MLX
Ollama v0.19 rebuilds Apple Silicon inference on top of MLX, bringing significantly faster local performance for coding and agent workflows. It also adds NVFP4 quantization support, along with smarter cache reuse, snapshotting, and eviction for more responsive sessions.
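To make the cache-eviction angle concrete, here's a minimal sketch against Ollama's existing local REST API (default port 11434). The `keep_alive` field is a long-standing Ollama parameter that controls how long a loaded model stays resident; whether v0.19's new eviction logic changes its semantics is an assumption, and the model tag used is just a placeholder.

```python
# Minimal sketch: two requests against Ollama's local REST API.
# keep_alive is an existing Ollama request parameter that controls how
# long the model stays loaded in memory; any v0.19-specific cache
# behavior beyond that is assumed here, not documented.
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint

def generate(prompt: str, model: str = "llama3.2", keep_alive: str = "10m") -> str:
    """Send a non-streaming generate request and return the response text."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "keep_alive": keep_alive,  # keep the model resident between calls
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Two calls sharing a prompt prefix: if the server reuses its prompt
    # cache across requests, the second call should be noticeably faster.
    system_prefix = "You are a concise coding assistant.\n"
    print(generate(system_prefix + "Explain mutexes in one sentence."))
    print(generate(system_prefix + "Explain channels in one sentence."))
```

The shared prefix in the two prompts is the point: prompt-cache reuse only pays off when consecutive requests start from the same tokens, which is typical of agent loops that resend a fixed system prompt.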
Replies
Nice timing with the MLX optimization; the gap between cloud and on-device inference is getting smaller every month. Been running models locally on Apple hardware myself, and the progress is wild. Curious how this handles the larger SDXL-class models on M-series chips?