---
language:
- en
tags:
- speech-to-text
- speech-translation
- conversational-AI
- speech-understanding
- whisper
license: apache-2.0
datasets:
- custom
metrics:
- wer
- bleu
- AIR-Bench
---

# Soundwave: Less is More for Speech-Text Alignment in LLMs

🐈‍⬛ GitHub ｜ 📃 Paper ｜ 📼 Online Demo

## Model Description

Soundwave is a Speech-to-Text model that bridges the gap between speech and text. It is trained on just 10k hours of data and delivers exceptional performance in speech translation and AIR-Bench speech tasks.

### Key Features

- A Speech-to-Text Model Bridging the Gap Between Speech and Text
- Utilizes Data-Efficient Strategy and Unique Architecture, Trained on Only 10k Hours of Data
- Exceptional Performance in Speech Translation and AIR-Bench Speech Tasks
- Retains Intelligence During Conversations, Ideal for Interactive Tasks

## Usage

Load the Soundwave model and run inference with your audio files as shown in the GitHub repository.

## 📖 Citation

```
@article{zhang2025soundwave,
  title={Soundwave: Less is More for Speech-Text Alignment in LLMs},
  author={Zhang, Yuhao and Liu, Zhiheng and Bu, Fan and Zhang, Ruiyu and Wang, Benyou and Li, Haizhou},
  journal={arXiv preprint arXiv:2502.12900},
  year={2025}
}
```
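
### Checking audio files before inference (sketch)

The Usage section defers to the GitHub repository for loading the model and running inference, so the snippet below covers only a step that can be done without the model: inspecting a WAV file before passing it in. It is a minimal sketch using Python's standard `wave` module. The 16 kHz mono target is an assumption based on what Whisper-style encoders commonly expect, not something this card states; the helper names `describe_wav` and `needs_conversion` are hypothetical and not part of Soundwave. Confirm the actual input requirements in the repository.

```python
import wave

# Assumption: 16 kHz mono, as commonly expected by Whisper-style encoders.
# This card does not state the required format; check the GitHub repository.
EXPECTED_SAMPLE_RATE = 16_000
EXPECTED_CHANNELS = 1


def describe_wav(path):
    """Return basic properties of a PCM WAV file (standard library only)."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_seconds": frames / sample_rate,
    }


def needs_conversion(info):
    """Return True if the file differs from the assumed 16 kHz mono format."""
    return (
        info["sample_rate"] != EXPECTED_SAMPLE_RATE
        or info["channels"] != EXPECTED_CHANNELS
    )
```

Calling `needs_conversion(describe_wav(path))` on each input file flags recordings that may need resampling or downmixing with a tool of your choice before inference. The `wave` module reads uncompressed PCM WAV only, so other formats would need a different reader.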