
	---
	tags:
	- audio-to-audio
	- text-to-speech
	- speech-to-text
	license: cc-by-nc-sa-4.0
	language:
	- zh
	- en
	- de
	- ja
	- fr
	- es
	- ko
	- ar
	pipeline_tag: audio-to-audio
	inference: false
	extra_gated_prompt: >-
	  You agree to not use the model to generate content that violates DMCA or local
	  laws.
	extra_gated_fields:
	  Country: country
	  Specific date: date_picker
	  I agree to use this model for non-commercial use ONLY: checkbox
	---

	# Fish Agent V0.1 3B

	Fish Agent V0.1 3B is a voice-to-voice model that captures and generates environmental audio information with high accuracy. What sets it apart is its semantic-token-free architecture, which removes the need for separate semantic encoders and decoders such as Whisper and CosyVoice.

	It also serves as a state-of-the-art text-to-speech (TTS) model, trained on roughly 700,000 hours of multilingual audio.

	The model was produced by continued pretraining of Qwen-2.5-3B-Instruct on 200B voice and text tokens.

	## Supported Languages
	The model supports the following languages with their respective training data sizes:
	- English (en): ~300,000 hours
	- Chinese (zh): ~300,000 hours
	- German (de): ~20,000 hours
	- Japanese (ja): ~20,000 hours
	- French (fr): ~20,000 hours
	- Spanish (es): ~20,000 hours
	- Korean (ko): ~20,000 hours
	- Arabic (ar): ~20,000 hours
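As a rough sanity check, the per-language figures can be tallied to see how the training audio is distributed. The snippet below is only an illustrative sketch: `TRAINING_HOURS` and `language_shares` are hypothetical names, and the numbers are the approximate values from the list, so the results are approximate as well.

```python
# Approximate training hours per language, copied from the list above.
TRAINING_HOURS = {
    "en": 300_000,
    "zh": 300_000,
    "de": 20_000,
    "ja": 20_000,
    "fr": 20_000,
    "es": 20_000,
    "ko": 20_000,
    "ar": 20_000,
}


def language_shares(hours):
    """Return each language's share of the total as a fraction between 0 and 1."""
    total = sum(hours.values())
    return {lang: h / total for lang, h in hours.items()}


total_hours = sum(TRAINING_HOURS.values())
shares = language_shares(TRAINING_HOURS)

if __name__ == "__main__":
    print(f"total: ~{total_hours:,} hours")
    for lang, share in shares.items():
        print(f"{lang}: {share:.1%}")
```

The approximate figures sum to about 720,000 hours, consistent with the roughly 700,000 hours quoted above. English and Chinese together account for about 83% of the audio.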

	For detailed information and implementation guidelines, please visit our [Fish Speech GitHub repository](https://github.com/fishaudio/fish-speech).
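Because access to this repository is gated, the weights must be fetched with an authenticated Hugging Face account. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and the gating terms have already been accepted on the model page. The helper name `download_checkpoint` and the target directory `checkpoints/fish-agent-v0.1-3b` are illustrative choices, not something this card prescribes.

```python
from pathlib import Path

REPO_ID = "fishaudio/fish-agent-v0.1-3b"


def download_checkpoint(local_dir="checkpoints/fish-agent-v0.1-3b", token=None):
    """Download all files of the gated model repository into local_dir.

    token: a Hugging Face access token. If None, huggingface_hub falls back
    to the token stored by `huggingface-cli login`.
    """
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)  # hypothetical location, adjust as needed
    target.mkdir(parents=True, exist_ok=True)
    # snapshot_download returns the path of the folder holding the files.
    return snapshot_download(repo_id=REPO_ID, local_dir=str(target), token=token)


if __name__ == "__main__":
    print(download_checkpoint())
```

Running and serving the model is covered in the Fish Speech repository linked above. This sketch only retrieves the files.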

	## Citation
	If you find this repository helpful in your work, please consider citing:

	```bibtex
	@misc{fish-agent-0.1,
	author = {Shijia Liao and Tianyu Li and Rcell and others},
	title = {Fish Agent V0.1 3B},
	year = {2024},
	publisher = {GitHub},
	journal = {GitHub repository},
	howpublished = {\url{https://github.com/fishaudio/fish-speech}}
	}
	```

	## License
	This model and its associated code are released under the CC-BY-NC-SA-4.0 license, which permits non-commercial use with appropriate attribution.