---
title: Speaches
colorFrom: yellow
colorTo: pink
sdk: docker
app_port: 8000
suggested_hardware: t4-small
preload_from_hub:
  - Systran/faster-distil-whisper-large-v3
  - Systran/faster-distil-whisper-small.en
  - Systran/faster-whisper-large-v3
  - Systran/faster-whisper-medium.en
  - Systran/faster-whisper-small
  - Systran/faster-whisper-small.en
  - Systran/faster-whisper-tiny.en
  - rhasspy/piper-voices
  - hexgrad/Kokoro-82M
---
To deploy to the HuggingFace Space, push the `huggingface-space` branch to the Space's `main` branch:

```bash
git remote add huggingface-space https://huggingface.co./spaces/speaches-ai/speaches
git push --force huggingface-space huggingface-space:main
```

TODO: Configure environment variables. See this.

This project was previously named `faster-whisper-server`. I've decided to change the name, as the project has evolved to support more than just transcription.

# Speaches

`speaches` is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech by piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.
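
Because the API is OpenAI-compatible, the official OpenAI Python SDK works against it unchanged. A minimal sketch, assuming the server is running locally on port 8000; the model name comes from the preload list above, and the API key is a placeholder since `speaches` doesn't require one:

```python
from openai import OpenAI

# Point the official OpenAI SDK at the local speaches server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="does-not-matter")

# The model named in the request is loaded on demand and unloaded after a
# period of inactivity (see "Dynamic model loading" under Features).
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-whisper-small",
        file=audio_file,
    )
print(transcript.text)
```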

Try it out on the HuggingFace Space

See the documentation for installation instructions and usage: https://speaches-ai.github.io/speaches/

## Features:

- GPU and CPU support.
- Deployable via Docker Compose / Docker.
- Highly configurable.
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.
- Streaming support (transcription is sent via SSE as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it). See the sketch after this list.
- Live transcription support (audio is sent via WebSocket as it's generated). See the sketch under Live Transcription below.
- Dynamic model loading / offloading. Just specify which model you want to use in the request, and it will be loaded automatically. It will then be unloaded after a period of inactivity.
- Text-to-Speech via Kokoro (ranked #1 in the TTS Arena) and piper models. See the sketch under Speech Generation below.
- Coming soon: Audio generation (chat completions endpoint) | OpenAI Documentation
  - Generate a spoken audio summary of a body of text (text in, audio out)
  - Perform sentiment analysis on a recording (audio in, text out)
  - Async speech-to-speech interactions with a model (audio in, audio out)
- Coming soon: Realtime API | OpenAI Documentation
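
A minimal sketch of consuming the streaming (SSE) transcription mentioned above, assuming the server runs on `localhost:8000`; the `stream` form field is an assumption here, so check the documentation for the exact parameter:

```python
import requests

with open("audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": audio_file},
        data={
            "model": "Systran/faster-whisper-small",
            "stream": "true",  # assumed flag enabling SSE streaming
        },
        stream=True,  # read the HTTP response incrementally
    )
    response.raise_for_status()
    # Each SSE "data:" line carries a chunk of the transcript as it's produced.
    for line in response.iter_lines(decode_unicode=True):
        if line.startswith("data: "):
            print(line.removeprefix("data: "), flush=True)
```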

Please create an issue if you find a bug, have a question, or have a feature suggestion.

## Demo

### Streaming Transcription

TODO

### Speech Generation

https://github.com/user-attachments/assets/0021acd9-f480-4bc3-904d-831f54c4d45b
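
For reference, a minimal sketch of what the demo above shows, using the OpenAI SDK's speech endpoint; the voice id is an assumption, so query the server or the docs for the available Kokoro voices:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="does-not-matter")

# "af" is an assumed Kokoro voice id; the model id comes from the preload list.
with client.audio.speech.with_streaming_response.create(
    model="hexgrad/Kokoro-82M",
    voice="af",
    input="Hello from speaches!",
) as response:
    response.stream_to_file("output.mp3")
```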

### Live Transcription (using WebSockets)

https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
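
A rough sketch of a live-transcription client. Both the WebSocket path and the message format are assumptions here, so treat this as the shape of the interaction rather than the actual protocol; see the documentation:

```python
import asyncio
import websockets  # pip install websockets

async def live_transcribe(path: str) -> None:
    uri = "ws://localhost:8000/v1/audio/transcriptions"  # assumed endpoint
    async with websockets.connect(uri) as ws:

        async def send_audio() -> None:
            with open(path, "rb") as f:
                while chunk := f.read(4096):
                    await ws.send(chunk)      # raw audio bytes
                    await asyncio.sleep(0.1)  # pace roughly like a live source

        async def receive_text() -> None:
            async for message in ws:          # incremental transcription updates
                print(message)

        # Runs until the server closes the connection.
        await asyncio.gather(send_audio(), receive_text())

asyncio.run(live_transcribe("audio.pcm"))
```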