metadata
language:
  - en
tags:
  - audio-text-to-text
  - speech-translation
  - speech-understanding
  - audio
  - chat
license: apache-2.0
datasets:
  - custom
metrics:
  - wer
  - bleu
  - AIR-Bench

Soundwave: Less is More for Speech-Text Alignment in LLMs

🐈‍⬛ GitHub | 📃 Paper | 📼 Online Demo

Model Description

Soundwave is a speech-to-text model that bridges the gap between speech and text in LLMs. It is trained on only 10k hours of data and performs strongly on speech translation and on the AIR-Bench speech tasks.

Key Features

  • A Speech-to-Text Model Bridging the Gap Between Speech and Text
  • Utilizes Data-Efficient Strategy and Unique Architecture, Trained on Only 10k Hours of Data
  • Exceptional Performance in Speech Translation and AIR-Bench Speech Tasks
  • Retains Intelligence During Conversations, Ideal for Interactive Tasks

Usage

Load the Soundwave model and run inference with your audio files as shown in the GitHub repository.
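
Once you have transcripts from inference, you can score them with WER, one of the metrics listed for this model. The snippet below is a minimal sketch of word error rate as word-level edit distance divided by the reference length. It is not the evaluation script from the GitHub repository. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; real evaluations
    # often apply stricter text normalization (punctuation, numerals).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair, not model output.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so treat it as an error rate rather than a bounded percentage.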

📖 Citation

@article{zhang2025soundwave,
  title={Soundwave: Less is More for Speech-Text Alignment in LLMs},
  author={Zhang, Yuhao and Liu, Zhiheng and Bu, Fan and Zhang, Ruiyu and Wang, Benyou and Li, Haizhou},
  journal={arXiv preprint arXiv:2502.12900},
  year={2025}
}