
Ollama
The easiest way to run large language models locally
5.0•26 reviews•1.4K followers
Run Llama 2 and other models on macOS, with Windows and Linux coming soon. Customize and create your own.
This is the 4th launch from Ollama.
Ollama v0.19
Launching today
Ollama v0.19 rebuilds Apple Silicon inference on top of MLX, bringing much faster local performance for coding and agent workflows. It also adds NVFP4 support and smarter cache reuse, snapshots, and eviction for more responsive sessions.
Flowtica Scribe
Hi everyone!
The engineering in Ollama v0.19 is a massive leap for anyone running local models on macOS. Moving to Apple's native MLX framework changes the game for performance, leveraging the unified memory architecture and the new GPU Neural Accelerators on the M5 chips.
v0.19 also adds NVFP4 support, which brings local inference closer to production parity, and the KV cache has been reworked with cache reuse across conversations, intelligent checkpoints, and smarter eviction. For branching agent workflows like @Claude Code or @OpenClaw, that should mean lower memory use and faster responses.
If you have a Mac with 32GB+ of unified memory, you can pull the new Qwen3.5-35B-A3B NVFP4 model and test this right now. Running heavy agentic workflows locally just became a lot more viable!
Been running Ollama since like v0.12 and the speed improvements keep blowing my mind. The MLX integration is huge for M-series Macs tbh.
Smarter cache reuse is the underrated feature here. I run a coding assistant locally and switching between projects used to basically cold start every time. If the KV cache actually persists across sessions that changes everything for agent workflows.
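The cold-start problem described above is exactly what prefix-based KV cache reuse addresses. This is not Ollama's actual implementation, just a toy sketch of the idea: before prefilling a prompt, find the longest already-cached token prefix and only compute the suffix.

```python
class PrefixKVCache:
    """Toy model of prefix-based KV cache reuse (illustrative only).

    A real cache stores per-layer key/value tensors; here each entry
    is just the token prefix itself, which is enough to show how much
    prefill work reuse can skip.
    """

    def __init__(self) -> None:
        self._prefixes: list[list[int]] = []  # cached token prefixes

    def longest_cached_prefix(self, tokens: list[int]) -> int:
        """Length of the longest cached prefix matching `tokens`."""
        best = 0
        for cached in self._prefixes:
            n = 0
            for a, b in zip(cached, tokens):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        return best

    def prefill(self, tokens: list[int]) -> int:
        """'Prefill' a prompt and return how many tokens actually
        needed computing after reuse; the full prompt is then cached."""
        reused = self.longest_cached_prefix(tokens)
        self._prefixes.append(list(tokens))
        return len(tokens) - reused
```

With a shared system prompt plus per-project context, the second session's prefill only pays for the part that differs, which is the effect the comment above is hoping for.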
Finally, MLX-native inference. I've been running local models on my M2 Air for quick prototyping when I don't want to burn API credits, and the speed difference on Apple Silicon matters a lot when you're going back and forth between coding and testing. Curious how it handles the bigger models now, like 70B+ quantized. Does the memory management play nicer with other heavy processes running?
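On the 70B+ question, a rough rule of thumb: weight memory is roughly parameters times bits-per-weight divided by 8, before the KV cache and runtime overhead are added on top. A hypothetical helper (not from Ollama) to run the arithmetic:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations, and runtime overhead, which all
    add more on top -- treat the result as a lower bound.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 70B model at 4 bits per weight needs roughly 35 GB for weights alone,
# so it stays tight on unified memory once the KV cache and other heavy
# processes are competing for the same pool.
```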
Well done! Do all the current models work automatically with MLX with this version on macOS, or do you need to download a specific version of each model?
This is huge for local-first AI workflows. Curious how much real-world speedup people are seeing on M-series chips.