Zac Zuo

Qwen3.5-Omni - A native omni model for voice, video, and tools

Qwen3.5-Omni is Qwen's new native omni model for text, images, audio, and video, with stronger multilingual speech, realtime voice interaction, web search, function calling, voice cloning, and long-context audio/video understanding.

Zac Zuo

Hi everyone!

Qwen3.5-Omni is the latest native omni model from the Qwen family. It handles text, images, audio, and video in one system, pushes hard on multilingual speech, and adds a lot of the interaction stuff that actually matters in practice: semantic interruption, realtime voice control, WebSearch, Function Calling, and voice cloning. The audio/video captioning and "audio-visual vibe coding" angle is especially wild.

It is not open-sourced yet. Right now, the way to try it is through the offline or online demos on Hugging Face, or through the official API.

Would love to see this land in the Coding Plan soon!