OpenCutAI now supports Google Gemma 4 locally, powered by the TurboQuant KV-cache compression engine.
Hey Hunters!
We just shipped Google Gemma 4 support, paired with our TurboQuant KV-cache compression engine. That means you can now run Google's any-to-any multimodal models directly inside your editor: no API keys, no cloud, no data leaving your machine.
What's new in this drop:
Full Gemma 4 family wired into the hardware-aware model registry (a rough sketch of a registry entry follows this list):
- Gemma 4 E2B (5B) – fits in ~3.5 GB, runs on 8 GB laptops
- Gemma 4 E4B (8B) – ~5.5 GB, the new sweet spot for the Pro tier
- Gemma 4 26B MoE (4B active) – big-model quality, efficient inference
- Gemma 4 31B Dense – top-tier quality for 24 GB+ GPUs
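For the curious, here's a minimal sketch of what a registry entry might look like conceptually. The class name and fields are illustrative assumptions, not OpenCutAI's actual internals, and the footprints for the two larger models are guesses based on the hardware notes above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    """One entry in a hardware-aware model registry (illustrative)."""
    name: str
    params: str          # marketing-style parameter count
    footprint_gb: float  # approx. memory needed to run comfortably

# Footprints for E2B/E4B come from the list above; the larger two
# are assumptions inferred from the "24 GB+ GPUs" note.
GEMMA4_REGISTRY = [
    ModelSpec("gemma4-e2b", "5B", 3.5),
    ModelSpec("gemma4-e4b", "8B", 5.5),
    ModelSpec("gemma4-26b-moe", "26B (4B active)", 16.0),  # assumed
    ModelSpec("gemma4-31b-dense", "31B", 24.0),            # assumed
]
```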
TurboQuant KV-cache compression on every model (back-of-the-envelope math after the list):
- 3.8× compression at 4-bit (cosine similarity 0.9986; effectively lossless)
- 5.0× compression at 3-bit
- 7.3× compression at 2-bit for extreme memory savings
- Unlocks long-context editing sessions (32K–131K tokens) on consumer hardware
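To make those ratios concrete, here's a quick KV-cache calculation. The model shape (32 layers, 8 KV heads, head dim 128) is an assumed generic 8B-class GQA config, not Gemma 4's published architecture; the compression ratios are the ones quoted above:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int) -> int:
    # One K and one V tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 8B-class GQA config (assumption, not Gemma 4's real shape).
fp16 = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                      seq_len=131_072, bytes_per_elem=2)
print(f"fp16 KV cache @ 131K tokens: {fp16 / 2**30:.1f} GiB")  # ~16.0 GiB

# Apply the quoted TurboQuant ratios.
for bits, ratio in [(4, 3.8), (3, 5.0), (2, 7.3)]:
    print(f"{bits}-bit ({ratio}x): {fp16 / ratio / 2**30:.1f} GiB")
# ~4.2 GiB at 4-bit, ~3.2 GiB at 3-bit, ~2.2 GiB at 2-bit
```

Under those assumptions, a full 131K context drops from an impossible ~16 GiB of cache to something that fits next to the model weights on a 16 GB machine.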
Hardware-aware auto-selection: OpenCutAI detects your RAM/VRAM and picks the largest Gemma model that'll actually run smoothly. No guesswork.
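The selection logic is conceptually simple. Here's a hedged sketch that treats total system RAM as the budget; the model names, footprints for the larger models, and headroom figure are illustrative assumptions, not our shipping code:

```python
import psutil  # cross-platform system info; pip install psutil

# (name, approx. footprint in GB), smallest to largest. The first two
# footprints come from the list above; the larger two are assumptions.
MODELS = [
    ("gemma4-e2b", 3.5),
    ("gemma4-e4b", 5.5),
    ("gemma4-26b-moe", 16.0),
    ("gemma4-31b-dense", 24.0),
]

def pick_model(headroom_gb: float = 4.0) -> str:
    """Pick the largest model that fits in RAM with headroom to spare."""
    total_gb = psutil.virtual_memory().total / 2**30
    budget = total_gb - headroom_gb  # leave room for timeline, Whisper, TTS
    fitting = [name for name, gb in MODELS if gb <= budget]
    return fitting[-1] if fitting else MODELS[0][0]  # fall back to smallest

print(pick_model())  # e.g. "gemma4-e4b" on a 16 GB machine
```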
Served through both Ollama (for simple local use) and our TurboQuant service.
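If you go the Ollama route, talking to a local model is one HTTP call against Ollama's standard generate endpoint. The model tag below is a hypothetical placeholder; substitute whatever tag the Gemma 4 builds actually ship under:

```python
import json
import urllib.request

# Ollama's local generate endpoint (default port 11434).
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "gemma4-e4b",  # placeholder tag, check your install
        "prompt": "Summarize the rough cut in two sentences.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```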
Why this matters:
Local video AI has always been a RAM problem. An 8B multimodal model + a long edit timeline + Whisper + TTS used to blow past 16 GB easily. With TurboQuant compressing the KV cache, you can now run Gemma 4 E4B end-to-end on a MacBook with room to spare.
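A rough budget on a 16 GB MacBook shows why. The E4B weight and compression figures come from the lists above; the Whisper, TTS, and app-overhead numbers are illustrative assumptions:

```python
# Illustrative memory budget (GB) for an end-to-end session on 16 GB.
budget = {
    "gemma4-e4b weights": 5.5,          # from the model list above
    "KV cache, 32K ctx @ 4-bit": 1.1,   # ~4 GB fp16 / 3.8x (see sketch above)
    "whisper (assumed)": 1.5,
    "tts (assumed)": 1.0,
    "timeline + app (assumed)": 3.0,
}
print(f"total ~ {sum(budget.values()):.1f} GB of 16 GB")  # ~12.1 GB
```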
Try it, tear it apart, tell us what breaks!


