Audio Transcription App
This application leverages Hugging Face's facebook/wav2vec2-large-xlsr-53-spanish model to transcribe audio files. It provides a simple web interface where users can upload audio recordings, such as meeting recordings, and receive a full transcription.
Features
- Automatic Speech Recognition (ASR): Uses a pre-trained Hugging Face wav2vec2 model, fine-tuned for Spanish, to transcribe speech.
- Long Audio Support: Automatically splits long audio files into smaller chunks (30 seconds by default) for processing.
- Web Interface: Provides a user-friendly interface using Gradio.
- Flexible Audio Upload: Accepts common audio formats like WAV and MP3.
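The following is a minimal sketch of how the ASR and long-audio features could fit together in Python. It is not the code in app.py. It assumes that each chunk is transcribed with the transformers classes Wav2Vec2Processor and Wav2Vec2ForCTC, that pydub resamples audio to 16 kHz mono (the rate wav2vec2 XLSR models are trained on), and that chunk transcripts are joined with spaces. load_transcriber and transcribe_chunks are hypothetical helper names.

```python
"""Sketch: transcribe audio chunks with a wav2vec2 CTC model.

Assumptions (not taken from app.py): pydub loads and resamples the audio,
and transformers' Wav2Vec2Processor / Wav2Vec2ForCTC do greedy CTC decoding.
load_transcriber and transcribe_chunks are hypothetical helper names.
"""
from typing import Callable, Iterable

MODEL_ID = "facebook/wav2vec2-large-xlsr-53-spanish"
SAMPLE_RATE = 16000  # wav2vec2 XLSR models expect 16 kHz input


def load_transcriber(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Load the model once and return a function that transcribes one file."""
    # Heavy imports are deferred so this module can be imported without them.
    import numpy as np
    import torch
    from pydub import AudioSegment
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    def transcribe(path: str) -> str:
        # Convert to 16 kHz, mono, 16-bit PCM, then scale samples to [-1, 1).
        audio = (
            AudioSegment.from_file(path)
            .set_frame_rate(SAMPLE_RATE)
            .set_channels(1)
            .set_sample_width(2)
        )
        speech = np.asarray(audio.get_array_of_samples(), dtype=np.float32) / 32768.0
        inputs = processor(speech, sampling_rate=SAMPLE_RATE, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        # Greedy CTC decoding: pick the most likely token per frame;
        # batch_decode collapses repeats and drops the blank/padding token.
        predicted_ids = torch.argmax(logits, dim=-1)
        return processor.batch_decode(predicted_ids)[0]

    return transcribe


def transcribe_chunks(paths: Iterable[str], transcribe: Callable[[str], str]) -> str:
    """Transcribe chunks in order and join the non-empty results with spaces."""
    pieces = (transcribe(path).strip() for path in paths)
    return " ".join(piece for piece in pieces if piece)
```

For example, transcribe_chunks(chunk_paths, load_transcriber()) would return the full text for a list of chunk file paths. Loading the model once and reusing the returned function avoids reloading the weights for every chunk.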
Installation
1. Clone the Repository
git clone <repository_url>
cd <repository_folder>
2. Install Dependencies
Ensure you have Python 3.7 or higher installed. Then, run:
pip install -r requirements.txt
3. Install FFmpeg
This application uses pydub, which requires FFmpeg. Follow these steps to install it:
- Windows: Download FFmpeg from https://ffmpeg.org/download.html and add its bin directory to your PATH.
- macOS: Use Homebrew:
brew install ffmpeg
- Linux: Install it with your package manager, e.g.:
sudo apt install ffmpeg
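To confirm that FFmpeg ended up on your PATH, a quick check like the one below can help. It is a sketch, not part of the repository: find_ffmpeg is a hypothetical helper that relies only on the standard library's shutil.which. Running ffmpeg -version in a terminal gives the same answer.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the path of the ffmpeg executable found on PATH, or None."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    location = find_ffmpeg()
    if location is None:
        print("FFmpeg was not found on PATH; pydub cannot decode formats such as MP3 without it.")
    else:
        print(f"FFmpeg found at {location}")
```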
Usage
1. Run the Application
Start the app by running:
python app.py
2. Open the Web Interface
Once the app starts, it will provide a local URL (e.g., http://127.0.0.1:7860/). Open this URL in your web browser.
3. Upload an Audio File
- Click on the upload button to select an audio file.
- Supported formats: WAV, MP3, etc.
4. Get the Transcription
- Once the audio is processed, the transcription will appear in the text box.
- You can copy the transcription for further use.
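Behind these steps, a Gradio app typically wires an audio upload component to a transcription function and shows the result in a text box. The sketch below assumes that layout; it is not taken from app.py. make_handler and build_interface are hypothetical names, and transcribe stands for whatever function turns an audio file path into text. With gr.Audio(type="filepath"), Gradio passes the uploaded file's path to the handler, or None if nothing was uploaded.

```python
from typing import Callable, Optional


def make_handler(transcribe: Callable[[str], str]) -> Callable[[Optional[str]], str]:
    """Wrap a transcription function so an empty upload returns a message."""

    def handle(audio_path: Optional[str]) -> str:
        if not audio_path:  # Gradio passes None when no file was provided
            return "Please upload an audio file."
        return transcribe(audio_path)

    return handle


def build_interface(transcribe: Callable[[str], str]):
    """Build a Gradio Interface: audio upload in, transcription text out."""
    import gradio as gr  # deferred so make_handler can be used without Gradio

    return gr.Interface(
        fn=make_handler(transcribe),
        inputs=gr.Audio(type="filepath", label="Audio file"),
        outputs=gr.Textbox(label="Transcription"),
        title="Audio Transcription App",
    )
```

Calling build_interface(transcribe).launch() starts the server, by default at http://127.0.0.1:7860. Keeping the upload check in a plain function makes it testable without starting Gradio.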
File Structure
- app.py: Main application script.
- requirements.txt: List of dependencies.
- chunks/: Temporary folder where audio chunks are stored during processing.
- transcripcion.txt: File where the full transcription is saved after processing.
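The two generated paths could be handled as in the sketch below, which saves the transcript to transcripcion.txt and removes the temporary chunks/ folder afterwards. Deleting chunks/ is an assumption based on the folder being described as temporary; save_transcription and clear_chunks are hypothetical names, not functions from app.py.

```python
import shutil
from pathlib import Path

CHUNKS_DIR = Path("chunks")
OUTPUT_FILE = Path("transcripcion.txt")


def save_transcription(text: str, output_file: Path = OUTPUT_FILE) -> Path:
    """Write the full transcription as UTF-8 (Spanish text contains accents and ñ)."""
    output_file.write_text(text, encoding="utf-8")
    return output_file


def clear_chunks(chunks_dir: Path = CHUNKS_DIR) -> None:
    """Delete the temporary chunk folder and its contents, if it exists."""
    shutil.rmtree(chunks_dir, ignore_errors=True)
```

Writing with an explicit encoding="utf-8" keeps the output file readable regardless of the operating system's default encoding.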
Customization
Adjust Chunk Length
The default chunk length is 30 seconds. You can adjust it by modifying the chunk_length_ms parameter in app.py:
chunk_length_ms = 30000  # Change to the desired length in milliseconds
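Splitting by chunk_length_ms amounts to slicing the audio at fixed millisecond offsets. The sketch below shows one way to do it with pydub, where len() of an AudioSegment is its duration in milliseconds and slicing is also in milliseconds. chunk_bounds and split_audio are hypothetical names, and the chunk_<n>.wav naming inside chunks/ is an assumption; the last chunk is simply shorter when the duration is not an exact multiple of the chunk length.

```python
from pathlib import Path
from typing import List, Tuple


def chunk_bounds(duration_ms: int, chunk_length_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond pairs that cover [0, duration_ms)."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        (start, min(start + chunk_length_ms, duration_ms))
        for start in range(0, duration_ms, chunk_length_ms)
    ]


def split_audio(path: str, out_dir: str = "chunks", chunk_length_ms: int = 30000) -> List[str]:
    """Export fixed-length WAV chunks of an audio file and return their paths."""
    from pydub import AudioSegment  # deferred so chunk_bounds works without pydub

    audio = AudioSegment.from_file(path)  # len(audio) is the duration in ms
    Path(out_dir).mkdir(exist_ok=True)
    chunk_paths = []
    for i, (start, end) in enumerate(chunk_bounds(len(audio), chunk_length_ms)):
        chunk_path = str(Path(out_dir) / f"chunk_{i}.wav")
        # export() returns an open file handle; close it to release the file.
        audio[start:end].export(chunk_path, format="wav").close()
        chunk_paths.append(chunk_path)
    return chunk_paths
```

With the default 30000 ms, a 65-second recording yields three chunks: 0 to 30 s, 30 to 60 s, and 60 to 65 s.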
Limitations
- Language: The model is optimized for Spanish audio. Performance may vary with other languages.
- Audio Quality: Poor-quality audio may result in less accurate transcriptions.
- Performance: Processing very large files may take some time, depending on your system.
Dependencies
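- transformers: For the ASR model.
- torch: Backend for model computations.
- pydub: For audio splitting.
- ffmpeg: Required by pydub for audio processing. It is a system tool rather than a Python package; install it as described in step 3 of Installation.
- gradio: To create the web interface.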
Install them using:
pip install -r requirements.txt
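After installing, a short script can confirm that the Python packages listed above are importable. This is a sketch, not part of the repository: it checks the module names transformers, torch, pydub and gradio with importlib.util.find_spec, and the Python version with sys.version_info. FFmpeg is not a Python module, so it is not covered here; check it separately, for example with ffmpeg -version.

```python
import importlib.util
import sys
from typing import Iterable, List

REQUIRED_MODULES = ("transformers", "torch", "pydub", "gradio")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names of top-level modules that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7 or higher is required.")
    missing = missing_modules()
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Install them with: pip install -r requirements.txt")
    else:
        print("All Python dependencies are installed.")
```

find_spec only locates a module without importing it, so the check stays fast even for large packages such as torch.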
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- Hugging Face for providing pre-trained models.
- Gradio for the simple interface framework.
- Pydub and FFmpeg for audio processing tools.