---
language:
- en
tags:
- speech-to-text
- speech-translation
- conversational-AI
- speech-understanding
- whisper
license: apache-2.0
datasets:
- custom
metrics:
- wer
- bleu
- AIR-Bench
---

# Soundwave: Less is More for Speech-Text Alignment in LLMs
<p align="center">
<font size="3"><a href="https://github.com/FreedomIntelligence/Soundwave">🐈‍⬛ Github</a>&nbsp;｜&nbsp;<a href="https://arxiv.org/abs/2502.12900">📃 Paper</a>&nbsp;｜&nbsp;<a href="https://huggingface.co/spaces/FreedomIntelligence/SoundwaveDemo">📼 Online Demo</a>&nbsp;</font>
</p>

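## Model Description
Soundwave is a speech large language model (LLM) that takes speech as input and produces text. It targets two mismatches between speech and text (the gap between their representation spaces and the difference in their sequence lengths) through a data-efficient training strategy and a dedicated architecture. Trained on only 10k hours of data, it performs strongly on speech translation and on the speech tasks of AIR-Bench.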

### Key Features
<div>
<ul>
<li><font size="3">A Speech-to-Text Model Bridging the Gap Between Speech and Text</font></li>
<li><font size="3">Uses a Data-Efficient Training Strategy and a Dedicated Architecture, Trained on Only 10k Hours of Data</font></li>
<li><font size="3">Strong Performance in Speech Translation and AIR-Bench Speech Tasks</font></li>
<li><font size="3">Retains Its Intelligence During Conversations, Making It Suitable for Interactive Tasks</font></li>
</ul>
</div>

## Usage
The inference code is provided in the <a href="https://github.com/FreedomIntelligence/Soundwave">GitHub repository</a>. Follow its instructions to load this checkpoint and run the model on your own audio files.
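The loading and inference API lives in the GitHub repository and is not reproduced here. As a minimal, hypothetical sketch, the snippet below fetches this checkpoint with `huggingface_hub.snapshot_download` and prepares a waveform for it. The helper names `download_checkpoint` and `to_mono_16k` are illustrative only, and the 16 kHz mono input format is an assumption based on the `whisper` tag (Whisper encoders operate on 16 kHz audio), not something this card states.

```python
# Hypothetical helpers for using the Soundwave checkpoint (not part of the
# official repository). Assumption: the model, like Whisper-based encoders,
# expects 16 kHz mono float32 audio.
import numpy as np

REPO_ID = "FreedomIntelligence/Soundwave"  # Hub id of this model repository
TARGET_SR = 16_000  # assumed input sample rate (Whisper convention)


def download_checkpoint(local_dir: str = "Soundwave") -> str:
    """Download all files of the model repository and return the local path."""
    # Imported lazily so the audio helper below works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


def to_mono_16k(audio, sr: int) -> np.ndarray:
    """Downmix to mono and resample to TARGET_SR by linear interpolation.

    `audio` is either 1-D (samples,) or 2-D (samples, channels), the layout
    returned by e.g. soundfile.read for multichannel files.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels
    if sr == TARGET_SR or audio.size == 0:
        return audio
    n_out = int(round(audio.shape[0] * TARGET_SR / sr))
    src_t = np.arange(audio.shape[0]) / sr  # timestamps of input samples (s)
    dst_t = np.arange(n_out) / TARGET_SR  # timestamps of output samples (s)
    return np.interp(dst_t, src_t, audio).astype(np.float32)
```

For example, `to_mono_16k(samples, 48000)` turns a 48 kHz stereo clip into a 16 kHz mono float32 array, which can then be passed, together with the downloaded checkpoint path, to the inference code as described in the repository. Linear interpolation applies no anti-aliasing filter, so a dedicated resampler such as `librosa.resample` or `torchaudio.functional.resample` is the better choice for real recordings.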

## 📖 Citation
```bibtex
@article{zhang2025soundwave,
  title={Soundwave: Less is More for Speech-Text Alignment in LLMs},
  author={Zhang, Yuhao and Liu, Zhiheng and Bu, Fan and Zhang, Ruiyu and Wang, Benyou and Li, Haizhou},
  journal={arXiv preprint arXiv:2502.12900},
  year={2025}
}
```