Abhishek Sira Chandrashekar

OpenCut-AI now supports Google Gemma 4 locally, powered by the TurboQuant KV-cache compression engine.

Hey Hunters 👋

We just shipped Google Gemma 4 support, paired with our TurboQuant KV-cache compression engine. That means you can now run Google's any-to-any multimodal models directly inside your editor: no API keys, no cloud, no data leaving your machine.

What's new in this drop:

Full Gemma 4 family wired into the hardware-aware model registry (registry sketch after this list):
- Gemma 4 E2B (5B) – fits in ~3.5 GB, runs on 8 GB laptops
- Gemma 4 E4B (8B) – ~5.5 GB, the new sweet spot for the Pro tier
- Gemma 4 26B MoE (4B active) – big-model quality, efficient inference
- Gemma 4 31B Dense – top-tier quality for 24 GB+ GPUs
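
For the curious, the registry is conceptually just a list of specs keyed by memory footprint. A minimal sketch (model tags and the MoE memory figure are illustrative placeholders, not our exact schema):

```python
# Simplified sketch of the model registry idea; tags and the MoE memory
# figure are illustrative, not the shipped schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    tag: str            # registry key (also used as the serving tag)
    params: str         # parameter count, for display
    min_mem_gb: float   # memory to run comfortably, KV cache included

GEMMA4_REGISTRY = [
    ModelSpec("gemma4-e2b", "5B", 3.5),
    ModelSpec("gemma4-e4b", "8B", 5.5),
    ModelSpec("gemma4-26b-moe", "26B MoE (4B active)", 16.0),  # placeholder figure
    ModelSpec("gemma4-31b", "31B dense", 24.0),  # needs a 24 GB+ GPU
]
```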

TurboQuant KV-cache compression on every model (back-of-the-envelope math after this list):
- 3.8× compression at 4-bit (cosine similarity 0.9986, effectively lossless)
- 5.0× compression at 3-bit
- 7.3× compression at 2-bit for extreme memory savings
- Unlocks long-context editing sessions (32K–131K tokens) on consumer hardware
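
Those ratios translate directly into context length. Here's a rough back-of-the-envelope, using illustrative architecture numbers (not Gemma 4's actual layer/head config):

```python
# Back-of-the-envelope KV-cache math. Layer/head/dim values are
# illustrative assumptions, not Gemma 4's real config.
def kv_cache_gb(seq_len, n_layers=32, n_kv_heads=8, head_dim=256,
                bytes_per_elem=2.0):
    # 2 tensors (K and V) per layer, per token, at fp16 by default
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return elems * bytes_per_elem / 1024**3

for tokens in (32_768, 131_072):
    fp16 = kv_cache_gb(tokens)
    print(f"{tokens:>7} tokens: fp16 {fp16:5.2f} GB -> "
          f"4-bit {fp16/3.8:4.2f} GB, 3-bit {fp16/5.0:4.2f} GB, "
          f"2-bit {fp16/7.3:4.2f} GB")
```

With these assumed dimensions, a 131K-token cache that would need ~32 GB at fp16 drops to ~8.4 GB at 4-bit, which is why long timelines suddenly fit on consumer hardware.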

Hardware-aware auto-selection – OpenCut-AI detects your RAM/VRAM and picks the largest Gemma model that'll actually run smoothly. No guesswork.
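
The gist of the selection logic, continuing the registry sketch above (the shipped heuristic also weighs VRAM, quantization level, and what else the editor has loaded):

```python
import psutil  # cross-platform memory introspection

def pick_model(registry, headroom=0.75):
    """Return the largest registry entry that fits in a fraction of free RAM."""
    budget_gb = psutil.virtual_memory().available / 1024**3 * headroom
    fits = [m for m in registry if m.min_mem_gb <= budget_gb]
    return max(fits, key=lambda m: m.min_mem_gb) if fits else None

print(pick_model(GEMMA4_REGISTRY))  # e.g. the E4B spec on a 16 GB machine
```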

Served through both Ollama (for simple local use) and our TurboQuant service.
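
If you go the Ollama route, a quick smoke test is one POST to its local REST API (the model tag below is a placeholder; use whatever tag your pull created):

```python
# Smoke test against Ollama's local REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma4-e4b",  # placeholder tag
        "prompt": "Summarize this edit timeline in one sentence.",
        "stream": False,        # return a single JSON object, not a stream
    },
    timeout=120,
)
print(resp.json()["response"])
```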

Why this matters:
Local video AI has always been a RAM problem. An 8B multimodal model + a long edit timeline + Whisper + TTS used to blow past 16 GB easily. With TurboQuant compressing the KV cache, you can now run Gemma 4 E4B end-to-end on a MacBook with room to spare.
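
To make that concrete, an illustrative 16 GB budget. Only the 5.5 GB weights figure comes from the model list above; everything else is a rough assumption:

```python
# Illustrative memory budget on a 16 GB machine; all figures except the
# 5.5 GB E4B footprint (quoted above) are rough assumptions.
budget = {
    "Gemma 4 E4B weights (4-bit)": 5.5,          # from the model list above
    "KV cache, 32K ctx, TurboQuant 4-bit": 2.1,  # see the KV-cache sketch
    "Whisper (speech-to-text)": 1.5,             # assumption
    "TTS model": 0.7,                            # assumption
    "Editor + timeline + OS headroom": 4.0,      # assumption
}
total = sum(budget.values())
print(f"total ~{total:.1f} GB of 16 GB")  # ~13.8 GB: fits, with room to spare
```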

Try it, tear it apart, tell us what breaks 🙏
