Qwen3.5-Omni is Qwen"s new native omni model for text, images, audio, and video, with stronger multilingual speech, realtime voice interaction, web search, function calling, voice cloning, and long-context audio/video understanding.
Qwen just released the Qwen3.5 Small Model Series: 0.8B, 2B, 4B, and 9B. All are natively multimodal, with an improved architecture and scaled-up RL. 0.8B and 2B are tiny and fast for edge devices, 4B makes a strong lightweight agent base, and 9B is already closing the gap with much larger models. Base versions are released too.
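For the smaller checkpoints, a plain transformers load-and-generate loop is the natural starting point on modest hardware; the repo id below is a guess at the naming scheme, and the rest is a generic sketch rather than an official snippet.

```python
# Standard transformers text generation; the repo id is an assumed name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-2B"  # hypothetical repo name for the 2B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize MoE routing in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```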
An open-weight, native vision-language model built for long-horizon agentic tasks. Its hybrid architecture (linear attention + MoE) delivers the capabilities of a 397B giant with the inference speed of a 17B model.
Qwen-Image-2512 is the new open-source SOTA for text-to-image generation. It delivers drastically improved photorealism, finer natural details, and superior text rendering.
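A generation call would presumably follow the usual diffusers pattern; `DiffusionPipeline.from_pretrained` is real diffusers API, but the repo id and settings here are assumptions.

```python
# Standard diffusers text-to-image sketch; the repo id is an assumed name.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-2512",  # hypothetical repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt='A neon street sign reading "OPEN 24 HOURS", rainy night, photoreal',
    num_inference_steps=50,
).images[0]
image.save("sign.png")
```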
Qwen-Image-Layered decomposes images into transparent RGBA layers, unlocking inherent editability. You can move, resize, or delete objects without artifacts. Supports recursive decomposition and variable layer counts.
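To make the "delete an object without artifacts" point concrete, the sketch below recomposites a stack of RGBA layers after dropping one. It uses only Pillow and assumes the decomposed layers are exported bottom-to-top as same-size, canvas-aligned RGBA PNGs (an assumption about the output format).

```python
# Recomposite decomposed RGBA layers, skipping one to "delete" its object.
# Assumes layers were exported bottom-to-top as same-size RGBA PNGs.
from PIL import Image

layer_paths = ["layer0_bg.png", "layer1_table.png", "layer2_cup.png"]  # hypothetical files
layers = [Image.open(p).convert("RGBA") for p in layer_paths]

canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
for i, layer in enumerate(layers):
    if i == 2:          # drop the "cup" layer: the object vanishes cleanly,
        continue        # since the pixels behind it live in lower layers
    canvas = Image.alpha_composite(canvas, layer)

canvas.save("edited.png")
```

Because every layer carries its own alpha, deleting or moving one never leaves a hole: the occluded pixels already exist in the layers beneath it.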
A family of SOTA speech models (0.6B & 1.7B) supporting 10 languages. Features prompt-based Voice Design, 3s zero-shot cloning, and ultra-low-latency streaming.
Qwen3-Coder is a new 480B MoE open model (35B active) by the Qwen team, built for agentic coding. It achieves SOTA results on benchmarks like SWE-bench, supports up to 1M context, and comes with an open-source CLI tool, Qwen Code.
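Besides the Qwen Code CLI, the model is reachable through OpenAI-compatible endpoints; a streamed coding request might look like the sketch below, where the base URL and model id are assumptions.

```python
# Streamed chat completion for a coding task; endpoint and model id are assumed.
from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="qwen3-coder-plus",  # assumed model id
    messages=[{"role": "user", "content": "Write a Python LRU cache without functools."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```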
Qwen3-235B-A22B-Thinking-2507 is a powerful open-source MoE model (22B active) built for deep reasoning. It achieves SOTA results on agentic tasks, supports a 256K context, and is available on Hugging Face and via API.
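Since the weights are on Hugging Face, a standard transformers run works. Thinking variants emit their reasoning before a closing </think> tag, so the sketch below splits on it; the repo id matches the announced name, the rest is a generic sketch, and a 235B MoE model of course needs multi-GPU hardware.

```python
# Generic transformers generation for the thinking model; needs multi-GPU hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-235B-A22B-Thinking-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Is 2^61 - 1 prime? Think it through."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=4096)
text = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# Reasoning comes first, then a closing </think> tag, then the final answer.
reasoning, _, answer = text.partition("</think>")
print(answer.strip())
```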
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
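For the understanding side, multimodal inputs typically travel as OpenAI-style content parts; the sketch below sends an image plus text (audio and video would presumably use analogous parts), with the endpoint, model id, and URL as placeholder assumptions.

```python
# Multimodal chat request with OpenAI-style content parts; endpoint/model id assumed.
from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="qwen3-omni-flash",  # assumed model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},  # placeholder URL
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }],
)
print(resp.choices[0].message.content)
```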