---
tags:
- audio-to-audio
- text-to-speech
- speech-to-text
license: cc-by-nc-sa-4.0
language:
- zh
- en
- de
- ja
- fr
- es
- ko
- ar
pipeline_tag: audio-to-audio
inference: false
extra_gated_prompt: >-
  You agree to not use the model to generate contents that violate DMCA or local
  laws.
extra_gated_fields:
  Country: country
  Specific date: date_picker
  I agree to use this model for non-commercial use ONLY: checkbox
---

# Fish Agent V0.1 3B

**Fish Agent V0.1 3B** is a voice-to-voice model that can capture and generate environmental audio information. Its architecture is semantic-token-free, so it does not need a separate semantic encoder or decoder such as Whisper or CosyVoice.

It also works as a text-to-speech (TTS) model, trained on about 700,000 hours of multilingual audio.

The model was obtained by continued pretraining of Qwen-2.5-3B-Instruct on 200B voice and text tokens.

## Supported Languages
The model supports the following languages; the approximate amount of training data is listed for each:
- English (en): ~300,000 hours
- Chinese (zh): ~300,000 hours
- German (de): ~20,000 hours
- Japanese (ja): ~20,000 hours
- French (fr): ~20,000 hours
- Spanish (es): ~20,000 hours
- Korean (ko): ~20,000 hours
- Arabic (ar): ~20,000 hours
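
As a quick sanity check on the figures above, the per-language hours can be summed and turned into shares of the training mix. This is a minimal sketch: the `TRAINING_HOURS` dictionary and the `language_shares` helper are illustrative names, not part of the Fish Speech codebase, and the values are the approximate numbers from the list.

```python
# Approximate training hours per language, copied from the list above.
# These are rounded figures, so the totals below are approximate too.
TRAINING_HOURS = {
    "en": 300_000,
    "zh": 300_000,
    "de": 20_000,
    "ja": 20_000,
    "fr": 20_000,
    "es": 20_000,
    "ko": 20_000,
    "ar": 20_000,
}


def language_shares(hours):
    """Return each language's fraction of the total training hours."""
    total = sum(hours.values())
    return {lang: h / total for lang, h in hours.items()}


total_hours = sum(TRAINING_HOURS.values())
shares = language_shares(TRAINING_HOURS)

print(f"total: {total_hours:,} hours")
for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {share:.1%}")
```

The listed figures add up to 720,000 hours, which agrees with the rounded total of about 700,000 hours. English and Chinese together account for roughly five sixths of the data, so quality in the other six languages may differ from quality in those two.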

For detailed information and implementation guidelines, please visit our [Fish Speech GitHub repository](https://github.com/fishaudio/fish-speech).

## Citation
If you find this repository helpful in your work, please consider citing:

```bibtex
@misc{fish-agent-0.1,
  author = {Shijia Liao and Tianyu Li and Rcell and others},
  title = {Fish Agent V0.1 3B},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/fishaudio/fish-speech}}
}
```

## License
This model and its associated code are released under the CC BY-NC-SA 4.0 license, which permits non-commercial use with appropriate attribution and requires derivative works to be shared under the same terms.