Jiang zhuo

Sokuji — Privacy-first AI voice translator that runs right in your browser, no app install needed

Hi Product Hunt 👋

I'm building Sokuji, an open-source real-time AI voice translation tool. Speak your language in a video call, and your words are translated and spoken in the other person's language — live.

Why Sokuji?

🔒 Privacy-first, local-first.
Sokuji can run the entire speech recognition → translation → text-to-speech pipeline 100% locally in your browser using WebAssembly and WebGPU. In local mode there is no cloud, no API key, and no audio or text leaving your device. Your conversations stay yours. Cloud providers (OpenAI, Gemini, etc.) are available as options, not requirements.
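
For the curious, here is roughly what one turn of that three-stage pipeline looks like. This is a minimal sketch, not Sokuji's actual code: the `Pipeline` interface, `runTurn`, and the stub stages are hypothetical names made up for illustration, and the stubs stand in for real local models.

```typescript
// Hypothetical stage signatures; illustrative only, not Sokuji's API.
type Recognize = (audio: Float32Array) => Promise<string>;
type Translate = (text: string, from: string, to: string) => Promise<string>;
type Synthesize = (text: string, lang: string) => Promise<Float32Array>;

interface Pipeline {
  recognize: Recognize;
  translate: Translate;
  synthesize: Synthesize;
}

interface TurnResult {
  transcript: string;
  translation: string;
  speech: Float32Array;
}

// Run one utterance through ASR, then translation, then TTS, in that order.
async function runTurn(
  p: Pipeline,
  audio: Float32Array,
  from: string,
  to: string
): Promise<TurnResult> {
  const transcript = await p.recognize(audio);
  const translation = await p.translate(transcript, from, to);
  const speech = await p.synthesize(translation, to);
  return { transcript, translation, speech };
}

// Stub stages standing in for real models, so the sketch runs anywhere.
const stubPipeline: Pipeline = {
  recognize: async () => "hello",
  translate: async (text, from, to) => `[${from}->${to}] ${text}`,
  synthesize: async (text) => new Float32Array(text.length),
};
```

Because every stage is just a function, a local model and a cloud provider can sit behind the same interface, which is one way the "cloud is optional" idea could be expressed in code.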

🧩 Just install a browser extension. That's it.
No desktop app download, no signup, no setup wizard. Install the Chrome/Edge extension, open your Google Meet / Teams / Zoom call, and start translating. It works in the browser sidebar — right where your meeting already is. (A desktop app with system audio capture is also available for power users, but most people won't need it.)

---

What it can do

🧠 Offline local inference
- 12+ ASR models (SenseVoice, Whisper WebGPU, Paraformer, Parakeet TDT...)
- 49 Opus-MT translation pairs + Qwen 3.5 for multilingual translation
- Multiple TTS engines (Piper, Matcha, MeloTTS)
- Smart model variant selection based on your GPU capabilities
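
To give a feel for what capability-based variant selection can look like, here is a small sketch. It is an assumption-heavy illustration rather than Sokuji's real logic: the `GpuCaps` shape, the variant names, and the size rule are all hypothetical. In a browser, the capability values could be probed with the standard WebGPU calls `navigator.gpu.requestAdapter()`, `adapter.features.has("shader-f16")`, and `adapter.limits.maxBufferSize`.

```typescript
// Hypothetical summary of what the device can do.
interface GpuCaps {
  webgpu: boolean;       // is a WebGPU adapter available at all?
  shaderF16: boolean;    // does the adapter expose the "shader-f16" feature?
  maxBufferSize: number; // largest single GPU buffer, in bytes
}

// Hypothetical variant names, for illustration only.
type Variant = "webgpu-fp16" | "webgpu-fp32" | "wasm-int8";

const MIB = 1024 * 1024;

// Simplifying assumption: a model fits if its weights fit in one buffer,
// and the fp16 variant is half the size of the fp32 one.
function pickVariant(caps: GpuCaps, modelBytesFp32: number): Variant {
  if (!caps.webgpu) return "wasm-int8";
  if (caps.shaderF16 && modelBytesFp32 / 2 <= caps.maxBufferSize) {
    return "webgpu-fp16";
  }
  if (modelBytesFp32 <= caps.maxBufferSize) return "webgpu-fp32";
  return "wasm-int8"; // too large for this GPU: fall back to CPU/WASM
}

// Example: a 400 MiB fp32 model on a GPU with f16 support and 256 MiB buffers.
const choice = pickVariant(
  { webgpu: true, shaderF16: true, maxBufferSize: 256 * MIB },
  400 * MIB
);
console.log(choice); // "webgpu-fp16"
```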

🎙️ 7+ cloud providers (optional)
OpenAI Realtime (GA, WebRTC), Google Gemini, Volcengine (ByteDance), Palabra AI, any OpenAI-compatible endpoint, or our hosted Kizuna AI option. Bring your own API key if you want lower latency or higher quality.

🔊 Modern audio pipeline
AudioWorklet-based low-latency processing, built-in noise suppression, dynamic device switching mid-session, cross-platform system audio capture, virtual audio devices on macOS/Windows.

🌐 35+ UI languages — the app itself speaks your language too.

Other highlights: text input translation, karaoke-style TTS highlighting, Push-to-Talk, auto-updater (desktop), Stripe token wallet for hosted API.

---

What's next

The local-first approach is working, but we want to push it much further:

- TranslateGemma 4B: Google's 55+ language any-to-any translation model running via WebGPU, removing the current limitation that Opus-MT has to pivot through English ([#123](https://github.com/kizuna-ai-lab...))
- Voxtral Mini 4B: Mistral's 13-language real-time streaming ASR with <500ms latency ([#125](https://github.com/kizuna-ai-lab...))
- Native inference in Electron: bypass WASM overhead and unlock GPU acceleration for significantly faster local performance ([#129](https://github.com/kizuna-ai-lab...))
- More meeting platforms: Webex, Jitsi, GoTo, RingCentral
- Windows & macOS code signing, Linux AppImage + Flatpak
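
A quick illustration of why pivoting through English is a limitation worth removing. The sketch below assumes translation pairs are identified by hypothetical `"src-tgt"` strings and that a pair without a direct model is routed through English as two hops. It is not Sokuji's actual routing code.

```typescript
// Plan the translation hops for src -> tgt given the installed model pairs.
// Returns [] when no translation is needed, one hop for a direct pair,
// two hops when pivoting through English, or null when no route exists.
function planRoute(
  src: string,
  tgt: string,
  pairs: ReadonlySet<string>
): string[][] | null {
  if (src === tgt) return [];
  if (pairs.has(`${src}-${tgt}`)) return [[src, tgt]];
  const canPivot =
    src !== "en" &&
    tgt !== "en" &&
    pairs.has(`${src}-en`) &&
    pairs.has(`en-${tgt}`);
  if (canPivot) return [[src, "en"], ["en", tgt]];
  return null;
}

// Hypothetical installed pairs.
const installed = new Set(["ja-en", "en-ja", "en-de"]);

console.log(planRoute("ja", "de", installed)); // [["ja","en"],["en","de"]]
console.log(planRoute("de", "ja", installed)); // null (no "de-en" model)
```

Every pivoted route runs two models back to back, so latency adds up and errors from the first hop carry into the second. An any-to-any model would do the same job in a single hop.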

---

Links
- 🔗 Website: https://sokuji.kizuna.ai
- 💻 GitHub: https://github.com/kizuna-ai-lab/sokuji
- 🧩 Chrome Web Store: https://chromewebstore.google.com/detail/ppmihnhelgfpjomhjhpecobloelicnak?utm_source=item-share-cb
- 📦 Desktop download (optional): https://github.com/kizuna-ai-lab/sokuji/releases

Built by Kizuna AI Lab. Feedback, issues, and PRs welcome. If you deal with multilingual meetings or care about keeping your conversations private, I'd love to hear from you.
