Qwen3-ASR - High-accuracy ASR with flexible contextual biasing

Qwen3-ASR is a new high-accuracy speech recognition model. It supports 11 languages, excels at transcribing songs with background music, and features a unique contextual biasing system that accepts any text format to improve accuracy on specific terms.

Hi everyone! The Qwen team has released their new speech recognition model, Qwen3-ASR. It isn't open source (yet), but one really interesting thing I found during testing is that it can recognize and transcribe songs, even with background music. Most ASR systems treat music as noise to be filtered out, so this is a fascinating capability. It points to a shift I've been noticing in ASR models: they're evolving beyond just transcribing words and are starting to perceive the acoustic environment and the emotions within the speech itself. I'm really excited about this because it's a clear sign that new product possibilities are on the horizon.
