Zac Zuo

Qwen3.5 Small - 0.8B-9B native multimodal w/ more intelligence, less compute

Qwen just released the Qwen3.5 Small Model Series: 0.8B, 2B, 4B, and 9B. Native multimodal with improved architecture and scaled RL. 0.8B and 2B are tiny and fast for edge devices, 4B makes a strong lightweight agent base, and 9B is already closing the gap with much larger models. Base versions released too.

Zac Zuo

Hi everyone!

The Qwen team just dropped the Qwen3.5 Small Model Series: 0.8B, 2B, 4B, and 9B, along with their Base versions.

This release fills in the missing piece for on-device deployment and completes the full Qwen3.5 matrix, from 0.8B all the way to 397B. Now you have clear choices:

  • 0.8B/2B for embedded/IoT/Mobile

  • 4B for lightweight multimodal agents

  • 9B for edge servers

  • Plus the bigger MoE models for heavier workloads.

The 9B is the real shocker: matching or beating GPT-OSS-120B on several key benchmarks while being 13x smaller.

Even Elon chimed in:

Edge AI is heating up fast. This opens up exciting new opportunities for AI hardware and local innovation.

Play with these models on @Ollama!
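
If you want to try that from code, here is a minimal sketch using the Ollama Python client; the qwen3.5:4b model tag is an assumption on my part, so check the Ollama model library for the actual tags once they land.

```python
# Minimal sketch via the Ollama Python client (pip install ollama).
# NOTE: the model tag below is hypothetical -- verify the real tag
# in the Ollama model library before running.
import ollama

response = ollama.chat(
    model="qwen3.5:4b",  # assumed tag for the 4B variant
    messages=[{"role": "user", "content": "Give me a one-line summary of MoE."}],
)
print(response["message"]["content"])
```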

Vadim Ermolin

@zaczuo Impressive release! Already played with all the small series both locally (MLX) and in the cloud. Now that's something that can be reliably and constantly used in agentic workflows!
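
For the MLX route Vadim mentions, a minimal sketch with the mlx-lm package; the mlx-community repo id below is hypothetical, so substitute whatever quantized conversion actually gets published for this series.

```python
# Minimal MLX sketch for Apple Silicon (pip install mlx-lm).
# NOTE: the Hugging Face repo id below is hypothetical.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.5-2B-4bit")  # assumed repo id
text = generate(
    model,
    tokenizer,
    prompt="Write one sentence about running LLMs on a laptop.",
    max_tokens=64,
)
print(text)
```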

Kimberly Ross

@binyuan_hui @chen_cheng1 @junyang_lin Hi guys. Can non-technical developers use these models easily? What tooling or platforms do they support?

Zac Zuo

@kimberly_ross Try them on Locally AI :)

Letian Wang

The 9B punching above its weight class is the real story here. Running capable models locally without needing a data center changes what's possible for privacy-conscious apps and edge deployments. Been waiting for small open-source models to close this gap.

Jeongki Park

Native multimodal at 0.8B is genuinely impressive - most teams trade off size for capability, but 262K context windows + text/image/video in under 1B parameters changes the edge deployment math.

The 9B beating GPT-OSS-20B on GPQA Diamond is interesting. Curious about structured output reliability at 0.8B though - small models tend to drop JSON schema adherence under complex instructions. Is there a differentiated training approach for structured data tasks?
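
One way to probe that yourself: Ollama's structured outputs accept a JSON schema through the format field, which constrains decoding to the schema. A minimal sketch, again with an assumed model tag:

```python
# Probe JSON-schema adherence on a tiny model via Ollama structured outputs.
# NOTE: the model tag is hypothetical.
import json
import ollama

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name", "year"],
}

response = ollama.chat(
    model="qwen3.5:0.8b",  # assumed tag for the 0.8B variant
    messages=[{"role": "user", "content": "Extract name and year: 'Qwen debuted in 2023.'"}],
    format=schema,  # decoding is constrained to this schema
)
print(json.loads(response["message"]["content"]))
```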

MTP for faster inference on constrained hardware is a smart addition. Real-world throughput numbers on consumer GPU vs Apple Silicon would help developers size their deployment targets.
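
For rough per-machine numbers, Ollama already returns eval_count (generated tokens) and eval_duration (nanoseconds) with every response, so a back-of-the-envelope tokens/sec check is only a few lines; model tag assumed as before.

```python
# Rough decode-throughput check from Ollama's built-in response stats.
# NOTE: the model tag is hypothetical.
import ollama

response = ollama.generate(
    model="qwen3.5:9b",  # assumed tag for the 9B variant
    prompt="Explain speculative decoding in two sentences.",
)
tokens = response["eval_count"]            # tokens generated
seconds = response["eval_duration"] / 1e9  # ns -> s
print(f"{tokens} tokens in {seconds:.2f}s -> {tokens / seconds:.1f} tok/s")
```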

Sam Chen

Hey, congratulations on the launch!

ERHAN GÜNEY

What kinds of tasks can be performed with these models?

Bhavin Sheth

The 9B matching much larger models is wild. Smaller models getting this good makes edge + local AI way more practical than most people realize.

Wisnu

May I ask which model is best for Unity game engine problem solving?

Qi Wang

The benchmarks on the 9B model are seriously wild - matching models 13x its size is no small feat, and having the full range from 0.8B to 9B gives developers real flexibility for edge deployment. Curious though: how does the multimodal performance hold up on the smaller 0.8B and 2B variants compared to the 9B?

Phon

Really cool to see Qwen3.5 pushing open-source AI forward; the hybrid reasoning + fast response approach is super interesting. Curious to know: what kinds of real-world applications or agents are you most excited to see people build with it?
