Zac Zuo

Qwen3-Omni - Native end-to-end multilingual omni-modal LLM

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It can understand text, audio, images, and video, and it can generate speech in real time.

Zac Zuo

Hi everyone!

The native multimodal model from the Qwen3 series is here. My main focus has been on native voice capabilities, and this model is very impressive.

According to the official benchmarks, its performance in ASR, audio understanding, and voice conversation is on par with Google's Gemini 2.5 Pro. It also supports 119 languages for text interaction.

You can experience the model's capabilities right now on Qwen Chat by enabling the voice (or video) mode.

sally wang

Whether I’m using it on my phone or tablet, the app adapts perfectly to different screen sizes—no awkward formatting issues at all.