---
language:
- en
tags:
- speech-to-text
- speech-translation
- conversational-AI
- speech-understanding
- whisper
license: apache-2.0
datasets:
- custom
metrics:
- wer
- bleu
- AIR-Bench
---

# Soundwave: Less is More for Speech-Text Alignment in LLMs

<p align="center">
<font size="3"><a href="https://github.com/FreedomIntelligence/Soundwave">GitHub</a> | <a href="https://arxiv.org/abs/2502.12900">Paper</a> | <a href="https://huggingface.co/spaces/FreedomIntelligence/SoundwaveDemo">Online Demo</a></font>
</p>

## Model Description

Soundwave is a speech-to-text model built for speech-text alignment in large language models (LLMs), bridging the gap between speech and text. It is trained on only 10k hours of data and performs strongly on speech translation and on the AIR-Bench speech tasks.

### Key Features

<div>
<ul>
<li><font size="3">A speech-to-text model that bridges the gap between speech and text</font></li>
<li><font size="3">Uses a data-efficient training strategy and a unique architecture, trained on only 10k hours of data</font></li>
<li><font size="3">Strong performance on speech translation and AIR-Bench speech tasks</font></li>
<li><font size="3">Retains its intelligence during conversations, making it well suited to interactive tasks</font></li>
</ul>
</div>

## Usage

Load the Soundwave model and run inference on your own audio files by following the instructions in the <a href="https://github.com/FreedomIntelligence/Soundwave">GitHub repository</a>.
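
The model loading and inference code lives in the GitHub repository and is not reproduced here. As a hedged illustration of one preparatory step, the sketch below reads a 16-bit PCM WAV file, downmixes it to mono, and resamples it to 16 kHz using only the Python standard library. It assumes, based on the `whisper` tag rather than on the repository itself, that the audio encoder expects 16 kHz mono input as Whisper does; `load_wav_mono_16k` and `TARGET_SR` are hypothetical names, not part of the Soundwave codebase.

```python
import array
import io
import sys
import wave

# Assumption: a Whisper-style audio encoder that expects 16 kHz mono input.
TARGET_SR = 16_000


def load_wav_mono_16k(src) -> list[float]:
    """Hypothetical helper (not part of Soundwave): read a 16-bit PCM WAV
    file and return mono float samples in [-1.0, 1.0] at 16 kHz.

    `src` may be a file path, a binary file object, or raw WAV bytes.
    """
    if isinstance(src, (bytes, bytearray)):
        src = io.BytesIO(src)
    with wave.open(src, "rb") as wf:
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        sr = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")

    pcm = array.array("h")
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are stored little-endian

    # Downmix interleaved channels to mono and scale int16 to [-1.0, 1.0].
    mono = [
        sum(pcm[i:i + n_channels]) / n_channels / 32768.0
        for i in range(0, len(pcm), n_channels)
    ]
    if sr == TARGET_SR:
        return mono

    # Resample by linear interpolation between neighbouring samples.
    ratio = sr / TARGET_SR
    n_out = int(len(mono) * TARGET_SR / sr)
    out = []
    for k in range(n_out):
        pos = k * ratio
        j = int(pos)
        frac = pos - j
        nxt = mono[j + 1] if j + 1 < len(mono) else mono[j]
        out.append(mono[j] * (1.0 - frac) + nxt * frac)
    return out
```

Linear interpolation keeps the sketch dependency-free, but it applies no anti-aliasing filter when downsampling; for real recordings, a dedicated resampler such as `torchaudio.functional.resample` or `librosa.resample` is the safer choice.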

## Citation

```bibtex
@article{zhang2025soundwave,
  title={Soundwave: Less is More for Speech-Text Alignment in LLMs},
  author={Zhang, Yuhao and Liu, Zhiheng and Bu, Fan and Zhang, Ruiyu and Wang, Benyou and Li, Haizhou},
  journal={arXiv preprint arXiv:2502.12900},
  year={2025}
}
```