gpt-realtime is OpenAI's new speech-to-speech model for production voice agents, delivering low latency and natural, expressive speech. The Realtime API is now GA, adding key features for developers like remote MCP support, image input, and SIP phone calling.
We at vomyra.com are using gpt-realtime, but the major challenge is with Hindi and other Indian regional languages.
@sumitgoel A bird and a buffalo meet near a pond. The buffalo says, "I am big, I know everything." The bird laughs, "But I can fly!" Suddenly a flood comes; the bird flies away and the buffalo gets stuck. Moral: knowledge and cleverness have nothing to do with size.
Flowtica Scribe
Hi everyone!
OpenAI's new gpt-realtime model is a big step forward for voice agents. The key isn't just a faster model, but a shift in how it understands speech.
For a true voice agent to work, it needs to understand the subtle cues in our speech: the tone, the pauses, the emotion. That's what carries the real meaning. gpt-realtime is built on a voice-in, voice-out approach. It processes audio directly, without first transcribing it to text. This is the breakthrough the field has been working toward.
It's also great to see that the Realtime API is now generally available, with practical new features for production like remote MCP server support and SIP integration.
YouMind
So cool! Now companion products can integrate with the Realtime API, which is a big step forward for improving user experience. I can't wait to try out real-time conversations! @OpenAI
DiffSense
Voice is definitely faster than typing. Is this the end of open-plan offices?
Fakeradar
We just have to wait a little bit more and we can talk to ChatGPT while driving, without looking at the iPhone screen...
Magiclight
This looks amazing — love how you’re empowering creators to scale AI experiences.
Triforce Todos
The real test will be whether it can pick up hesitation, sarcasm, or subtle emphasis. That's where most AI agents break down.
Pretty cool update
Vomyra AI – Voice AI Agent