---
license: mit
language:
- en
base_model: Qwen/Qwen2-0.5B
---
# Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
🤗 Hugging Face | 📖 Github | 📑 Technical report
This is a safetensors conversion of gpt-omni/mini-omni.
Mini-Omni is an open-source multimodal large language model that can hear and talk while it thinks, featuring real-time end-to-end speech input and streaming audio output for conversational use.
## Features
✅ Real-time speech-to-speech conversation; no extra ASR or TTS models required.

✅ Talking while thinking: generates text and audio at the same time.

✅ Streaming audio output.

✅ "Audio-to-Text" and "Audio-to-Audio" batch inference to further boost performance.
NOTE: please refer to https://github.com/gpt-omni/mini-omni for more details.