Abhishek Sira Chandrashekar

OpenCut-AI now supports Google Gemma 4 locally, powered by the TurboQuant KV-cache compression engine.

Hey Hunters 👋

We just shipped Google Gemma 4 support, paired with our TurboQuant KV-cache compression engine. That means you can now run Google's any-to-any multimodal models directly inside your editor: no API keys, no cloud, no data leaving your machine.

What's new in this drop:

Full Gemma 4 family wired into the hardware-aware model registry (registry sketch after this list):
- Gemma 4 E2B (5B) – fits in ~3.5 GB, runs on 8 GB laptops
- Gemma 4 E4B (8B) – ~5.5 GB, the new sweet spot for the Pro tier
- Gemma 4 26B MoE (4B active) – big-model quality, efficient inference
- Gemma 4 31B Dense – top-tier quality for 24 GB+ GPUs
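
For the curious, the registry is conceptually just a list of specs keyed by memory footprint. A minimal sketch (model tags and the MoE memory figure are illustrative placeholders, not our exact schema):

```python
# Simplified sketch of the model registry idea; tags and the MoE memory
# figure are illustrative, not the shipped schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    tag: str            # registry key (also used as the serving tag)
    params: str         # parameter count, for display
    min_mem_gb: float   # memory to run comfortably, KV cache included

GEMMA4_REGISTRY = [
    ModelSpec("gemma4-e2b", "5B", 3.5),
    ModelSpec("gemma4-e4b", "8B", 5.5),
    ModelSpec("gemma4-26b-moe", "26B MoE (4B active)", 16.0),  # placeholder figure
    ModelSpec("gemma4-31b", "31B dense", 24.0),  # needs a 24 GB+ GPU
]
```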

TurboQuant KV-cache compression on every model (back-of-the-envelope math after this list):
- 3.8× compression at 4-bit (cosine similarity 0.9986, effectively lossless)
- 5.0× compression at 3-bit
- 7.3× compression at 2-bit for extreme memory savings
- Unlocks long-context editing sessions (32K–131K tokens) on consumer hardware
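
Those ratios translate directly into context length. Here's a rough back-of-the-envelope, using illustrative architecture numbers (not Gemma 4's actual layer/head config):

```python
# Back-of-the-envelope KV-cache math. Layer/head/dim values are
# illustrative assumptions, not Gemma 4's real config.
def kv_cache_gb(seq_len, n_layers=32, n_kv_heads=8, head_dim=256,
                bytes_per_elem=2.0):
    # 2 tensors (K and V) per layer, per token, at fp16 by default
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return elems * bytes_per_elem / 1024**3

for tokens in (32_768, 131_072):
    fp16 = kv_cache_gb(tokens)
    print(f"{tokens:>7} tokens: fp16 {fp16:5.2f} GB -> "
          f"4-bit {fp16/3.8:4.2f} GB, 3-bit {fp16/5.0:4.2f} GB, "
          f"2-bit {fp16/7.3:4.2f} GB")
```

With these assumed dimensions, a 131K-token cache that would need ~32 GB at fp16 drops to ~8.4 GB at 4-bit, which is why long timelines suddenly fit on consumer hardware.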

Hardware-aware auto-selection – OpenCut-AI detects your RAM/VRAM and picks the largest Gemma model that'll actually run smoothly. No guesswork.
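
The gist of the selection logic, continuing the registry sketch above (the shipped heuristic also weighs VRAM, quantization level, and what else the editor has loaded):

```python
import psutil  # cross-platform memory introspection

def pick_model(registry, headroom=0.75):
    """Return the largest registry entry that fits in a fraction of free RAM."""
    budget_gb = psutil.virtual_memory().available / 1024**3 * headroom
    fits = [m for m in registry if m.min_mem_gb <= budget_gb]
    return max(fits, key=lambda m: m.min_mem_gb) if fits else None

print(pick_model(GEMMA4_REGISTRY))  # e.g. the E4B spec on a 16 GB machine
```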

Served through both Ollama (for simple local use) and our TurboQuant service.
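
If you go the Ollama route, a quick smoke test is one POST to its local REST API (the model tag below is a placeholder; use whatever tag your pull created):

```python
# Smoke test against Ollama's local REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma4-e4b",  # placeholder tag
        "prompt": "Summarize this edit timeline in one sentence.",
        "stream": False,        # return a single JSON object, not a stream
    },
    timeout=120,
)
print(resp.json()["response"])
```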

Why this matters:
Local video AI has always been a RAM problem. An 8B multimodal model + a long edit timeline + Whisper + TTS used to blow past 16 GB easily. With TurboQuant compressing the KV cache, you can now run Gemma 4 E4B end-to-end on a MacBook with room to spare.
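
To make that concrete, an illustrative 16 GB budget. Only the 5.5 GB weights figure comes from the model list above; everything else is a rough assumption:

```python
# Illustrative memory budget on a 16 GB machine; all figures except the
# 5.5 GB E4B footprint (quoted above) are rough assumptions.
budget = {
    "Gemma 4 E4B weights (4-bit)": 5.5,          # from the model list above
    "KV cache, 32K ctx, TurboQuant 4-bit": 2.1,  # see the KV-cache sketch
    "Whisper (speech-to-text)": 1.5,             # assumption
    "TTS model": 0.7,                            # assumption
    "Editor + timeline + OS headroom": 4.0,      # assumption
}
total = sum(budget.values())
print(f"total ~{total:.1f} GB of 16 GB")  # ~13.8 GB: fits, with room to spare
```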

Try it, tear it apart, tell us what breaks 🙏
