title: Speaches
colorFrom: yellow
colorTo: pink
sdk: docker
app_port: 8000
suggested_hardware: t4-small
preload_from_hub:
- Systran/faster-distil-whisper-large-v3
- Systran/faster-distil-whisper-small.en
- Systran/faster-whisper-large-v3
- Systran/faster-whisper-medium.en
- Systran/faster-whisper-small
- Systran/faster-whisper-small.en
- Systran/faster-whisper-tiny.en
- rhasspy/piper-voices
- hexgrad/Kokoro-82M
git remote add huggingface-space https://huggingface.co/spaces/speaches-ai/speaches
git push --force huggingface-space huggingface-space:main
TODO: Configure environment variables. See this.
This project was previously named faster-whisper-server. I decided to rename it because the project has evolved to support more than just transcription.
Speaches
speaches is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech by piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.
Try it out on the HuggingFace Space
See the documentation for installation instructions and usage: https://speaches-ai.github.io/speaches/
Features:
GPU and CPU support.
OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with speaches.
Streaming support (transcription is sent via SSE as the audio is transcribed, so you don't need to wait for the whole audio to be transcribed before receiving it).
- LocalAgreement2 (paper | original implementation) algorithm is used for live transcription.
Live transcription support (audio is sent via WebSocket as it's generated).
Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
Text-to-Speech via kokoro (ranked #1 in the TTS Arena) and piper models.
Coming soon: Audio generation (chat completions endpoint) | OpenAI Documentation
- Generate a spoken audio summary of a body of text (text in, audio out)
- Perform sentiment analysis on a recording (audio in, text out)
- Async speech to speech interactions with a model (audio in, audio out)
Coming soon: Realtime API | OpenAI Documentation
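For the streaming support in the features list, the client receives the transcription as server-sent events (SSE) while the audio is still being processed. The following Python sketch shows one way a client might split such a stream into event payloads. It implements only the basic SSE framing, and the JSON payload with a "text" field is an assumption, not speaches' documented event format; check the project documentation for the actual request parameters and event shape. The function name iter_sse_data is hypothetical.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a line stream.

    Minimal framing only: consecutive 'data:' lines are joined with newlines,
    and a blank line dispatches the event. Other fields (event, id, retry)
    and comment lines starting with ':' are ignored in this sketch.
    """
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded, as in the SSE spec.


# Hypothetical stream: the {"text": ...} payload shape is an assumption.
sample = [
    'data: {"text": "Hello"}\n',
    "\n",
    'data: {"text": "world"}\n',
    "\n",
]
for payload in iter_sse_data(sample):
    print(json.loads(payload)["text"])
# Hello
# world
```

In a real client the lines would come from the HTTP response body as it is read, so each segment can be displayed as soon as its event arrives.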
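LocalAgreement2, named in the features list, commits a word only after two consecutive transcription hypotheses agree on it, which keeps transient mis-hearings out of the confirmed output. The Python sketch below illustrates that policy under simplifying assumptions: each hypothesis is a plain word list covering all audio received so far, word positions before the committed point stay aligned, and there are no timestamps or audio-buffer trimming. It is neither speaches' code nor the original implementation; common_prefix and LocalAgreement2 are hypothetical names.

```python
from typing import List, Optional


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Longest run of words shared by a and b, starting from the first word."""
    prefix: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement2:
    """Commit words only once two consecutive hypotheses agree on them."""

    def __init__(self) -> None:
        self.committed: List[str] = []
        self.previous: Optional[List[str]] = None

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed the latest hypothesis and return the words it newly commits."""
        newly_committed: List[str] = []
        if self.previous is not None:
            n = len(self.committed)
            # Compare only the uncommitted tails; committed words are never revised.
            newly_committed = common_prefix(self.previous[n:], hypothesis[n:])
            self.committed.extend(newly_committed)
        self.previous = hypothesis
        return newly_committed


agreement = LocalAgreement2()
for hypothesis in [
    ["the"],
    ["the", "cat"],
    ["the", "cat", "sad"],        # "sad" is a transient mis-hearing...
    ["the", "cat", "sat", "on"],  # ...so it is never committed
    ["the", "cat", "sat", "on", "the"],
]:
    print(agreement.update(hypothesis))
# []
# ['the']
# ['cat']
# []
# ['sat', 'on']
```

The trade-off of requiring agreement is latency: a word is confirmed one update after it first appears, in exchange for stable output that does not flicker.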
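Dynamic model loading / offloading can be pictured as a cache keyed by model ID whose entries expire after an idle period. The Python sketch below is hypothetical: IdleModelCache, loader, and ttl_seconds are illustrative names, not speaches configuration. It evicts lazily whenever the cache is accessed rather than on a background timer, and it has no locking, so it is not thread-safe.

```python
import time
from typing import Callable, Dict, Generic, List, TypeVar

M = TypeVar("M")


class IdleModelCache(Generic[M]):
    """Load models on first use and drop them after ttl_seconds without use."""

    def __init__(
        self,
        loader: Callable[[str], M],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so behaviour can be tested without sleeping
        self._models: Dict[str, M] = {}
        self._last_used: Dict[str, float] = {}

    def get(self, model_id: str) -> M:
        """Return the model, loading it if it is not currently in memory."""
        self.evict_idle()
        if model_id not in self._models:
            self._models[model_id] = self._loader(model_id)
        self._last_used[model_id] = self._clock()
        return self._models[model_id]

    def evict_idle(self) -> None:
        """Unload every model that has been idle for at least ttl_seconds."""
        now = self._clock()
        idle = [m for m, t in self._last_used.items() if now - t >= self._ttl]
        for model_id in idle:
            del self._models[model_id]
            del self._last_used[model_id]

    def loaded(self) -> List[str]:
        """IDs of the models currently held in memory."""
        return sorted(self._models)


# Usage with a fake clock and a stand-in loader (both hypothetical).
now = 0.0
loads: List[str] = []


def fake_loader(model_id: str) -> str:
    loads.append(model_id)
    return f"<model {model_id}>"


cache = IdleModelCache(fake_loader, ttl_seconds=300, clock=lambda: now)
cache.get("Systran/faster-whisper-small")  # first request: loaded
now = 100.0
cache.get("Systran/faster-whisper-small")  # reused, not reloaded
now = 500.0
cache.evict_idle()                         # idle for 400 s >= 300 s: unloaded
print(cache.loaded())  # []
print(loads)           # ['Systran/faster-whisper-small']
```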
Please create an issue if you find a bug, have a question, or a feature suggestion.
Demo
Streaming Transcription
TODO
Speech Generation
https://github.com/user-attachments/assets/0021acd9-f480-4bc3-904d-831f54c4d45b
Live Transcription (using WebSockets)
https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f