Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
The native multimodal model from the Qwen3 series is here. My main focus has been on native voice capabilities, and this model is very impressive.
According to the official benchmarks, its performance in ASR, audio understanding, and voice conversation is on par with Google's Gemini 2.5 Pro. It also supports 119 languages.
You can experience the model's capabilities right now on Qwen Chat by enabling the voice (or video) mode.
Whether I’m using it on my phone or tablet, the app adapts perfectly to different screen sizes—no awkward formatting issues at all.