Check out Kimi-Audio from Moonshot AI, an open-source project aiming for a universal audio foundation model.
This single 7B model is designed to handle many different audio tasks, such as speech recognition (ASR), audio Q&A, generation, sound classification, and even full speech-to-speech conversations. It shows strong performance across various benchmarks.
Importantly, they've released the model weights (Base and Instruct versions), the code, and a full evaluation toolkit called Kimi-Audio-Evalkit (that's the interesting part), all openly available for the community to use and build upon.
Replies
Flowtica Scribe
Hi everyone!