Qwen3.5 Small - 0.8B-9B native multimodal w/ more intelligence, less compute
Qwen just released the Qwen3.5 Small Model Series — 0.8B, 2B, 4B and 9B. Native multimodal with improved architecture and scaled RL. 0.8B and 2B are tiny and fast for edge devices, 4B makes a strong lightweight agent base, and 9B is already closing the gap with much larger models. Base versions released too.



Replies
Flowtica Scribe
Hi everyone!
The Qwen team just dropped the Qwen3.5 Small Model Series: 0.8B, 2B, 4B, and 9B, along with their Base versions.
This release fills the missing piece for on-device deployment and completes the full Qwen3.5 matrix from 0.8B all the way to 397B. Now you have clear choices:
0.8B/2B for embedded/IoT/Mobile
4B for lightweight multimodal agents
9B for edge servers
Plus the bigger MoE models for heavier workloads.
The 9B is the real shocker: matching or beating GPT-OSS-120B on several key benchmarks while being 13x smaller.
Even Elon chimed in:
Edge AI is heating up fast. This opens up exciting new opportunities for AI hardware and local innovation.
Play with these models on @Ollama!
Fluent
@zaczuo Impressive release! Already played with the whole small series, both locally (MLX) and in the cloud. Now that's something that can be reliably and consistently used in agentic workflows!
@binyuan_hui @chen_cheng1 @junyang_lin Hi guys. Can non-technical developers use these models easily? What tooling or platforms do they support?
Flowtica Scribe
@kimberly_ross Try them on Locally AI :)
The 9B punching above its weight class is the real story here. Running capable models locally without needing a data center changes what's possible for privacy-conscious apps and edge deployments. Been waiting for small open-source models to close this gap.
Native multimodal at 0.8B is genuinely impressive - most teams trade off size for capability, but 262K context windows + text/image/video in under 1B parameters changes the edge deployment math.
The 9B beating GPT-OSS-20B on GPQA Diamond is interesting. Curious about structured output reliability at 0.8B though - small models tend to drop JSON schema adherence under complex instructions. Is there a differentiated training approach for structured data tasks?
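One lightweight way to probe that concern locally is a spot-check: parse each completion and verify the required keys and types are present. A minimal Python sketch — the schema and the sample outputs below are made up for illustration, not from the release:

```python
import json

# Hypothetical required schema: key name -> expected Python type.
REQUIRED = {"name": str, "score": float}

def adheres(raw: str, required=REQUIRED) -> bool:
    """Return True if `raw` parses as JSON and matches the required keys/types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(obj, dict)
            and all(k in obj and isinstance(obj[k], t) for k, t in required.items()))

# Made-up stand-ins for real model completions.
outputs = [
    '{"name": "qwen", "score": 0.91}',    # valid
    '{"name": "qwen", "score": "high"}',  # wrong type
    'Sure! Here is the JSON: {...}',      # not JSON at all
]
rate = sum(adheres(o) for o in outputs) / len(outputs)
print(f"schema adherence: {rate:.0%}")  # → schema adherence: 33%
```

For anything beyond a smoke test, a full validator such as the `jsonschema` package would be more robust than key/type checks.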
MTP for faster inference on constrained hardware is a smart addition. Real-world throughput numbers on consumer GPU vs Apple Silicon would help developers size their deployment targets.
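Measuring that yourself is straightforward: time a fixed-length generation and divide tokens by wall-clock seconds. A hypothetical harness, where `generate` stands in for whatever backend you use (Ollama, MLX, llama.cpp, etc.) and the dummy backend exists only so the sketch runs standalone:

```python
import time

def tokens_per_second(generate, prompt: str, n_tokens: int) -> float:
    """Time a single generation call and return decode throughput."""
    t0 = time.perf_counter()
    generate(prompt, n_tokens)  # placeholder for a real backend call
    elapsed = time.perf_counter() - t0
    return n_tokens / elapsed

# Dummy backend: pretend each token takes 2 ms to decode.
def fake_generate(prompt: str, n_tokens: int) -> None:
    time.sleep(0.002 * n_tokens)

print(f"{tokens_per_second(fake_generate, 'hello', 128):.0f} tok/s")
```

Running the same harness against each real backend on your target hardware gives directly comparable numbers for sizing a deployment.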
What kinds of tasks can be performed with these models?
The 9B matching models that size is wild — smaller models getting this good makes edge + local AI way more practical than most people realize.
May I ask which model is best for Unity game engine problem solving?
The benchmarks on the 9B model are seriously wild - matching models 13x its size is no small feat, and having the full range from 0.8B to 9B gives developers real flexibility for edge deployment. Curious though: how does the multimodal performance hold up on the smaller 0.8B and 2B variants compared to the 9B?
Really cool to see Qwen3 pushing open-source AI forward; the hybrid reasoning + fast response approach is super interesting. Curious to know, what kinds of real-world applications or agents are you most excited to see people build with it?