---
language:
- en
tags:
- speech-to-text
- speech-translation
- conversational-AI
- speech-understanding
- whisper
license: apache-2.0
datasets:
- custom
metrics:
- wer
- bleu
- AIR-Bench
---
# Soundwave: Less is More for Speech-Text Alignment in LLMs
🐈‍⬛ GitHub | 📃 Paper | 📼 Online Demo
## Model Description
Soundwave is a speech-to-text model that targets the alignment gap between speech and text representations in large language models (LLMs). It is trained on only 10k hours of data and achieves strong results on speech translation and on the speech tasks of AIR-Bench.
### Key Features
- A speech-to-text model that bridges the gap between speech and text
- A data-efficient training strategy and a distinctive architecture, requiring only 10k hours of training data
- Strong performance on speech translation and AIR-Bench speech tasks
- Retains its conversational intelligence, making it well suited to interactive tasks
## Usage
Load the Soundwave model and run inference on your audio files as shown in the GitHub repository.
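Before inference, audio usually has to be brought into the format the speech encoder expects. The sketch below assumes Soundwave follows the Whisper convention of 16 kHz mono input (suggested by the `whisper` tag, but not stated in this card). It reads a 16-bit PCM WAV file with only the standard library, downmixes it to mono, and resamples it. The names `load_wav_mono`, `resample_linear`, and `TARGET_SR` are hypothetical helpers for illustration, not part of the Soundwave code; the actual loading and inference code lives in the GitHub repository.

```python
import array
import sys
import wave

TARGET_SR = 16_000  # assumption: Whisper-style encoders expect 16 kHz mono audio


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (mono samples in [-1, 1], sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only supports 16-bit PCM WAV")
        n_channels = wf.getnchannels()
        sr = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are little-endian on disk
    # Average interleaved channels down to mono and scale int16 to [-1, 1].
    mono = [
        sum(pcm[i:i + n_channels]) / (n_channels * 32768.0)
        for i in range(0, len(pcm), n_channels)
    ]
    return mono, sr


def resample_linear(samples, src_sr, dst_sr=TARGET_SR):
    """Resample by linear interpolation (simple; applies no anti-aliasing filter)."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = int(round(len(samples) * dst_sr / src_sr))
    last = len(samples) - 1
    out = []
    for j in range(n_out):
        pos = j * src_sr / dst_sr  # fractional position in the source signal
        i = int(pos)
        frac = pos - i
        a = samples[min(i, last)]
        b = samples[min(i + 1, last)]
        out.append(a + (b - a) * frac)
    return out
```

For example, `samples, sr = load_wav_mono("clip.wav")` followed by `resample_linear(samples, sr)` yields a 16 kHz mono list of floats. Linear interpolation keeps the sketch dependency-free; for real use, a library resampler with proper low-pass filtering, such as `librosa.load(path, sr=16000)` or `torchaudio.functional.resample`, is the safer choice.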
## 📖 Citation
```bibtex
@article{zhang2025soundwave,
title={Soundwave: Less is More for Speech-Text Alignment in LLMs},
author={Zhang, Yuhao and Liu, Zhiheng and Bu, Fan and Zhang, Ruiyu and Wang, Benyou and Li, Haizhou},
journal={arXiv preprint arXiv:2502.12900},
year={2025}
}
```