metadata
tags:
  - audio-to-audio
  - text-to-speech
  - speech-to-text
license: cc-by-nc-sa-4.0
language:
  - zh
  - en
  - de
  - ja
  - fr
  - es
  - ko
  - ar
pipeline_tag: audio-to-audio
inference: false
extra_gated_prompt: >-
  You agree not to use the model to generate content that violates the DMCA or
  local laws.
extra_gated_fields:
  Country: country
  Specific date: date_picker
  I agree to use this model for non-commercial use ONLY: checkbox

Fish Agent V0.1 3B

Fish Agent V0.1 3B is a voice-to-voice model designed to capture and generate environmental audio information accurately. Its architecture is semantic-token-free, which removes the need for separate semantic encoders/decoders such as Whisper and CosyVoice.

It is also a state-of-the-art text-to-speech (TTS) model, trained on roughly 700,000 hours of multilingual audio.

This model was continually pretrained from Qwen-2.5-3B-Instruct on 200B voice and text tokens.

Supported Languages

The model supports the following languages with their respective training data sizes:

  • English (en): ~300,000 hours
  • Chinese (zh): ~300,000 hours
  • German (de): ~20,000 hours
  • Japanese (ja): ~20,000 hours
  • French (fr): ~20,000 hours
  • Spanish (es): ~20,000 hours
  • Korean (ko): ~20,000 hours
  • Arabic (ar): ~20,000 hours
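
As a rough sanity check on the figures above, the per-language estimates can be summed and converted to shares of the total. The sketch below only restates the approximate numbers from the list; the names `HOURS` and `language_shares` are illustrative and not part of any Fish Speech API. The estimates add up to about 720,000 hours, which is in line with the overall figure of roughly 700,000 hours given that every entry is approximate.

```python
# Approximate training hours per language, copied from the list above.
# These are the README's rounded estimates, not exact dataset statistics.
HOURS = {
    "en": 300_000,
    "zh": 300_000,
    "de": 20_000,
    "ja": 20_000,
    "fr": 20_000,
    "es": 20_000,
    "ko": 20_000,
    "ar": 20_000,
}


def language_shares(hours):
    """Return each language's fraction of the total training hours."""
    total = sum(hours.values())
    return {lang: h / total for lang, h in hours.items()}


total_hours = sum(HOURS.values())
shares = language_shares(HOURS)

if __name__ == "__main__":
    print(f"total: ~{total_hours:,} hours")
    for lang, share in shares.items():
        print(f"{lang}: {share:.1%}")
```

English and Chinese together account for most of the data, so quality in the other six languages may differ from those two.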

For detailed information and implementation guidelines, please visit the Fish Speech GitHub repository (https://github.com/fishaudio/fish-speech).

Citation

If you find this repository helpful in your work, please consider citing:

@misc{fish-agent-0.1,
    author = {Shijia Liao and Tianyu Li and Rcell and others},
    title = {Fish Agent V0.1 3B},
    year = {2024},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/fishaudio/fish-speech}}
}

License

This model and its associated code are released under the CC-BY-NC-SA-4.0 license, which permits non-commercial use with appropriate attribution and requires derivative works to be shared under the same terms.