	---
	license: apache-2.0
	---
	# Step-Audio-Tokenizer


	Step-Audio LLM is the industry’s first 130-billion-parameter humanlike unified end-to-end model that integrates multimodal speech understanding and generation capabilities, including singing voice synthesis, tool utilization, role-play, and multilingual/dialectal comprehension and synthesis.

	This repository provides the speech tokenizer component of Step-Audio LLM. For linguistic tokenization, we utilize the output from the Paraformer encoder, which is quantized into discrete representations at a token rate of 16.7 Hz. For semantic tokenization, we employ CosyVoice’s tokenizer, specifically designed to efficiently encode features essential for generating natural and expressive speech outputs, operating at a token rate of 25 Hz.
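
To give a feel for what the two token rates mean in practice, here is a minimal sketch that estimates how many tokens each stream produces for a clip of a given length. It uses only the nominal rates quoted above (16.7 Hz and 25 Hz, roughly a 2:3 ratio). The function name `estimate_token_counts` and the round-to-nearest behavior are illustrative assumptions, not part of the Step-Audio code; the actual tokenizers may pad or truncate differently at frame boundaries.

```python
# Nominal token rates quoted in this README (tokens per second of audio).
LINGUISTIC_RATE_HZ = 16.7  # quantized Paraformer encoder output
SEMANTIC_RATE_HZ = 25.0    # CosyVoice tokenizer


def estimate_token_counts(duration_s: float) -> dict:
    """Approximate per-stream token counts for a clip of `duration_s` seconds.

    Hypothetical helper: rounding to the nearest integer is an assumption;
    the real tokenizers may handle partial frames differently.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    return {
        "linguistic": round(LINGUISTIC_RATE_HZ * duration_s),
        "semantic": round(SEMANTIC_RATE_HZ * duration_s),
    }


if __name__ == "__main__":
    # Estimate the token budget for a 10-second utterance.
    print(estimate_token_counts(10.0))
```

An estimate like this is mainly useful for budgeting sequence length: under these nominal rates, each second of audio contributes about 42 tokens across both streams.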

	## More information
	For more information, please refer to our repository: [Step-Audio](https://github.com/stepfun-ai/Step-Audio).