Sokuji — Privacy-first AI voice translator that runs right in your browser, no app install needed
Hi Product Hunt 👋
I'm building Sokuji, an open-source real-time AI voice translation tool. Speak your language in a video call, and your words are translated and spoken in the other person's language — live.
Why Sokuji?
🔒 Privacy-first, local-first.
Sokuji can run the entire speech recognition → translation → text-to-speech pipeline 100% locally in your browser using WebAssembly and WebGPU. Local mode needs no cloud and no API keys, and no data ever leaves your device. Your conversations stay yours. Cloud providers (OpenAI, Gemini, etc.) are available as options, not requirements.
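For the curious, here is a rough sketch of how a three-stage local pipeline like this can be wired together. The interface names, `translateUtterance`, and the stub stages are all hypothetical and only illustrate the flow; they are not Sokuji's actual code.

```typescript
// Hypothetical stage interfaces; Sokuji's real types are not shown in this post.
interface SpeechRecognizer {
  transcribe(audio: Float32Array): Promise<string>;
}
interface Translator {
  translate(text: string, from: string, to: string): Promise<string>;
}
interface SpeechSynthesizer {
  synthesize(text: string, lang: string): Promise<Float32Array>;
}

interface LocalPipeline {
  asr: SpeechRecognizer;
  mt: Translator;
  tts: SpeechSynthesizer;
}

// Run one utterance through ASR -> translation -> TTS.
// Nothing in this function touches the network; every stage is injected.
async function translateUtterance(
  p: LocalPipeline,
  audio: Float32Array,
  from: string,
  to: string,
): Promise<{ sourceText: string; targetText: string; speech: Float32Array }> {
  const sourceText = await p.asr.transcribe(audio);
  const targetText = await p.mt.translate(sourceText, from, to);
  const speech = await p.tts.synthesize(targetText, to);
  return { sourceText, targetText, speech };
}

// Stub stages standing in for real WebAssembly/WebGPU models (assumption).
const stubPipeline: LocalPipeline = {
  asr: { transcribe: async () => "hello" },
  mt: { translate: async (text, _from, to) => `[${to}] ${text}` },
  tts: { synthesize: async (text) => new Float32Array(text.length) },
};
```

Because each stage sits behind an interface, a local model and a cloud provider could in principle be swapped without changing the calling code.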
🧩 Just install a browser extension. That's it.
No desktop app download, no signup, no setup wizard. Install the Chrome/Edge extension, open your Google Meet / Teams / Zoom call, and start translating. It works in the browser sidebar — right where your meeting already is. (A desktop app with system audio capture is also available for power users, but most people won't need it.)
---
What it can do
🧠 Offline local inference
- 12+ ASR models (SenseVoice, Whisper WebGPU, Paraformer, Parakeet TDT...)
- 49 Opus-MT translation pairs + Qwen 3.5 for multilingual translation
- Multiple TTS engines (Piper, Matcha, MeloTTS)
- Smart model variant selection based on your GPU capabilities
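As an illustration of what capability-based variant selection can look like, here is a minimal sketch. The `GpuCaps` shape, the variant names, and `pickVariant` are assumptions made for this example, not Sokuji's real selection logic. In a browser, the two flags could be derived from `navigator.gpu.requestAdapter()` and `adapter.features.has("shader-f16")`.

```typescript
// Hypothetical capability summary, filled in by the caller.
interface GpuCaps {
  hasWebGpu: boolean;   // a WebGPU adapter was obtained
  supportsF16: boolean; // the adapter reports the "shader-f16" feature
}

// Illustrative variant names only.
type Variant = "webgpu-fp16" | "webgpu-fp32" | "wasm-int8";

// Prefer the lightest GPU variant the device supports; fall back to WASM on CPU.
function pickVariant(caps: GpuCaps): Variant {
  if (!caps.hasWebGpu) return "wasm-int8";
  return caps.supportsF16 ? "webgpu-fp16" : "webgpu-fp32";
}
```

Keeping the decision in a pure function like this makes it easy to test without a real GPU.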
🎙️ 7+ cloud providers (optional)
OpenAI Realtime (GA, WebRTC), Google Gemini, Volcengine (ByteDance), Palabra AI, any OpenAI-compatible endpoint, or our hosted Kizuna AI option. Bring your own API key if you want lower latency or higher quality.
🔊 Modern audio pipeline
AudioWorklet-based low-latency processing, built-in noise suppression, dynamic device switching mid-session, cross-platform system audio capture, and virtual audio devices on macOS/Windows.
🌐 35+ UI languages — the app itself speaks your language too.
Other highlights: text input translation, karaoke-style TTS highlighting, Push-to-Talk, auto-updater (desktop), Stripe token wallet for hosted API.
---
What's next
The local-first approach is working, but we want to push it much further:
- TranslateGemma 4B: Google's any-to-any translation model covering 55+ languages, run via WebGPU, to remove the current English-pivot limitation of Opus-MT ([#123](https://github.com/kizuna-ai-lab...))
- Voxtral Mini 4B: Mistral's real-time streaming ASR for 13 languages, with <500ms latency ([#125](https://github.com/kizuna-ai-lab...))
- Native inference in Electron: bypass WASM overhead and unlock GPU acceleration for significantly faster local performance ([#129](https://github.com/kizuna-ai-lab...))
- More meeting platforms: Webex, Jitsi, GoTo, RingCentral
- Windows & macOS code signing, Linux AppImage + Flatpak
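To make the English-pivot limitation in the TranslateGemma item concrete, here is a small sketch of pivot routing, assuming pairs without a direct model are chained through English. The `planRoute` helper and the pair set are hypothetical and do not reflect Sokuji's actual model list.

```typescript
// Language pairs are written as "src-tgt" strings.
// Returns the models to chain, [] when no translation is needed,
// or null when no direct or pivoted route exists.
function planRoute(
  pairs: Set<string>,
  from: string,
  to: string,
  pivot: string = "en",
): string[] | null {
  if (from === to) return [];
  const direct = `${from}-${to}`;
  if (pairs.has(direct)) return [direct];
  // No direct model: try source -> pivot -> target.
  const first = `${from}-${pivot}`;
  const second = `${pivot}-${to}`;
  if (pairs.has(first) && pairs.has(second)) return [first, second];
  return null;
}

// Illustrative pair set (assumption).
const examplePairs = new Set(["ja-en", "en-de", "en-ja"]);
const jaToDe = planRoute(examplePairs, "ja", "de"); // two hops through English
```

Each extra hop adds latency and a second chance for translation errors, which is why an any-to-any model is attractive.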
---
Links
- 🔗 Website: https://sokuji.kizuna.ai
- 💻 GitHub: https://github.com/kizuna-ai-lab/sokuji
- 🧩 Chrome Web Store: https://chromewebstore.google.com/detail/ppmihnhelgfpjomhjhpecobloelicnak?utm_source=item-share-cb
- 📦 Desktop download (optional): https://github.com/kizuna-ai-lab/sokuji/releases
Built by Kizuna AI Lab. Feedback, issues, and PRs welcome. If you deal with multilingual meetings or care about keeping your conversations private, I'd love to hear from you.

