Building FluentCap: Real-time captions for any desktop audio
The problem I wanted to solve:
I work remotely with international teams. I found myself constantly missing context because people spoke too fast or had unfamiliar accents.
Existing caption apps? Most only work inside specific apps (Zoom, Teams) or browser tabs. I couldn't use them for meeting recordings, foreign films, or FaceTime calls.
What I built:
FluentCap captures any audio playing on your computer, system-wide, and shows real-time transcription with optional translation. It works with Netflix, Zoom recordings, YouTube, podcasts, anything.
The BYOK approach:
Instead of a subscription model, I went with Bring Your Own Key. You connect your API key from Deepgram, Gladia, or AssemblyAI. You pay only for what you use (~$0.25/hour), and providers offer $50-200 in free credits.
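For a rough sense of what those free credits buy, here is a back-of-the-envelope sketch in Python. It assumes a flat rate of about $0.25 per hour, which is only the ballpark quoted above; actual provider pricing varies by model and features, and the function name `hours_covered` is purely illustrative.

```python
def hours_covered(credit_usd: float, rate_usd_per_hour: float = 0.25) -> float:
    """Hours of captioning a credit balance covers at a flat hourly rate.

    The 0.25 default is the approximate figure from the post, not a
    quoted price from any specific provider.
    """
    if rate_usd_per_hour <= 0:
        raise ValueError("rate must be positive")
    return credit_usd / rate_usd_per_hour


# Low and high ends of the $50-200 free-credit range.
low = hours_covered(50)
high = hours_covered(200)
print(f"Free credits cover about {low:.0f} to {high:.0f} hours")
```

So at that assumed rate, the free credits alone stretch to somewhere between 200 and 800 hours of captions before you pay anything.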
Launching soon — looking for feedback:
What languages would be most useful to you?
Any specific use cases I should prioritize?
What features would make this more valuable?
Would love to hear your thoughts!
Check it out: https://fluentcap.live/

