Qwen3.5 Small - 0.8B-9B native multimodal w/ more intelligence, less compute
Qwen just released the Qwen3.5 Small Model Series — 0.8B, 2B, 4B and 9B. Native multimodal with improved architecture and scaled RL. 0.8B and 2B are tiny and fast for edge devices, 4B makes a strong lightweight agent base, and 9B is already closing the gap with much larger models. Base versions released too.



Replies
Flowtica Scribe
Hi everyone!
The Qwen team just dropped the Qwen3.5 Small Model Series: 0.8B, 2B, 4B, and 9B, along with their Base versions.
This release fills the missing piece for on-device deployment and completes the full Qwen3.5 matrix from 0.8B all the way to 397B. Now you have clear choices:
0.8B/2B for embedded/IoT/Mobile
4B for lightweight multimodal agents
9B for edge servers
Plus the bigger MoE models for heavier workloads.
The 9B is the real shocker: matching or beating GPT-OSS-120B on several key benchmarks while being 13x smaller.
Even Elon chimed in:
Edge AI is heating up fast. This opens up exciting new opportunities for AI hardware and local innovation.
Play with these models on @Ollama!
Fluent
@zaczuo Impressive release! Already played with the whole small series, both locally (MLX) and in the cloud. Now that's something that can be reliably and consistently used in agentic workflows!
@binyuan_hui @chen_cheng1 @junyang_lin Hi guys. Can non-technical developers use these models easily? What tooling or platforms do they support?
Flowtica Scribe
@kimberly_ross Try them on Locally AI :)
The 9B punching above its weight class is the real story here. Running capable models locally without needing a data center changes what's possible for privacy-conscious apps and edge deployments. Been waiting for small open-source models to close this gap.
Native multimodal at 0.8B is genuinely impressive - most teams trade off size for capability, but 262K context windows + text/image/video in under 1B parameters changes the edge deployment math.
The 9B beating GPT-OSS-20B on GPQA Diamond is interesting. Curious about structured output reliability at 0.8B though - small models tend to drop JSON schema adherence under complex instructions. Is there a differentiated training approach for structured data tasks?
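One lightweight way to probe that concern locally is a spot-check: parse each completion and verify the required keys and types are present. A minimal Python sketch — the schema and the sample outputs below are made up for illustration, not from the release:

```python
import json

# Hypothetical required schema: key name -> expected Python type.
REQUIRED = {"name": str, "score": float}

def adheres(raw: str, required=REQUIRED) -> bool:
    """Return True if `raw` parses as JSON and matches the required keys/types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(obj, dict)
            and all(k in obj and isinstance(obj[k], t) for k, t in required.items()))

# Made-up stand-ins for real model completions.
outputs = [
    '{"name": "qwen", "score": 0.91}',    # valid
    '{"name": "qwen", "score": "high"}',  # wrong type
    'Sure! Here is the JSON: {...}',      # not JSON at all
]
rate = sum(adheres(o) for o in outputs) / len(outputs)
print(f"schema adherence: {rate:.0%}")  # → schema adherence: 33%
```

For anything beyond a smoke test, a full validator such as the `jsonschema` package would be more robust than key/type checks.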
MTP for faster inference on constrained hardware is a smart addition. Real-world throughput numbers on consumer GPU vs Apple Silicon would help developers size their deployment targets.
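Measuring that yourself is straightforward: time a fixed-length generation and divide tokens by wall-clock seconds. A hypothetical harness, where `generate` stands in for whatever backend you use (Ollama, MLX, llama.cpp, etc.) and the dummy backend exists only so the sketch runs standalone:

```python
import time

def tokens_per_second(generate, prompt: str, n_tokens: int) -> float:
    """Time a single generation call and return decode throughput."""
    t0 = time.perf_counter()
    generate(prompt, n_tokens)  # placeholder for a real backend call
    elapsed = time.perf_counter() - t0
    return n_tokens / elapsed

# Dummy backend: pretend each token takes 2 ms to decode.
def fake_generate(prompt: str, n_tokens: int) -> None:
    time.sleep(0.002 * n_tokens)

print(f"{tokens_per_second(fake_generate, 'hello', 128):.0f} tok/s")
```

Running the same harness against each real backend on your target hardware gives directly comparable numbers for sizing a deployment.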
What kinds of tasks can be performed with these models?
The 9B matching models that size is wild — smaller models getting this good makes edge + local AI way more practical than most people realize.
May I ask which model is best for Unity game engine problem solving?
The benchmarks on the 9B model are seriously wild - matching models 13x its size is no small feat, and having the full range from 0.8B to 9B gives developers real flexibility for edge deployment. Curious though: how does the multimodal performance hold up on the smaller 0.8B and 2B variants compared to the 9B?
Really cool to see Qwen3 pushing open-source AI forward; the hybrid reasoning + fast response approach is super interesting. Curious to know, what kinds of real-world applications or agents are you most excited to see people build with it?