# Audio Transcription App
This application leverages Hugging Face's `facebook/wav2vec2-large-xlsr-53-spanish` model to transcribe audio files. It provides a simple web interface where users can upload audio recordings, such as meeting recordings, and receive a full transcription.

---
## Features
- **Automatic Speech Recognition (ASR):** Utilizes Hugging Face's pre-trained model for high-quality Spanish transcriptions.
- **Supports Long Audios:** Automatically splits long audio files into smaller chunks for processing.
- **Web Interface:** Provides a user-friendly interface using Gradio.
- **Flexible Audio Upload:** Accepts common audio formats like WAV and MP3.
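
As a rough illustration of the ASR and long-audio features, the sketch below loads the model named above through the `transformers` `pipeline` API and merges per-chunk text into one transcript. It is not the code from `app.py`: the names `load_asr`, `join_transcripts`, and `transcribe_chunks` are hypothetical, and `transformers` is imported lazily so the helpers can be tried without downloading the model. Passing file paths to the pipeline assumes FFmpeg is available for decoding.

```python
from typing import Callable, Iterable, List

MODEL_ID = "facebook/wav2vec2-large-xlsr-53-spanish"


def load_asr() -> Callable[[str], dict]:
    """Build a speech-recognition pipeline (downloads the model on first use)."""
    # Imported lazily: transformers and torch are large dependencies.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def join_transcripts(parts: Iterable[str]) -> str:
    """Join per-chunk texts with single spaces, skipping empty chunks."""
    cleaned = (part.strip() for part in parts)
    return " ".join(part for part in cleaned if part)


def transcribe_chunks(asr: Callable[[str], dict], chunk_paths: List[str]) -> str:
    """Run the recognizer on each chunk file and merge the results in order."""
    # The pipeline returns a dict with a "text" key for each input file.
    return join_transcripts(asr(path)["text"] for path in chunk_paths)
```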
---
## Installation
### 1. Clone the Repository
```bash
git clone <repository_url>
cd <repository_folder>
```
### 2. Install Dependencies
Ensure you have Python 3.7 or higher installed. Then, run:
```bash
pip install -r requirements.txt
```
### 3. Install FFmpeg
This application uses `pydub`, which requires FFmpeg. Follow these steps to install it:
- **Windows:** Download FFmpeg from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html). Add the `bin` directory to your PATH.
- **macOS:** Use Homebrew:
```bash
brew install ffmpeg
```
- **Linux:** Install via your package manager, e.g.,
```bash
sudo apt install ffmpeg
```
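
To confirm that FFmpeg ended up on your PATH after any of the steps above, a small check along these lines can help. It uses only the standard library; the function names `find_ffmpeg` and `ffmpeg_status` are illustrative and are not part of `app.py`.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_status() -> str:
    """Describe whether ffmpeg can be found, in a form suitable for printing."""
    location = find_ffmpeg()
    if location is None:
        return "ffmpeg not found on PATH; install it before running the app"
    return f"ffmpeg found at {location}"
```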
---
## Usage
### 1. Run the Application
Start the app by running:
```bash
python app.py
```
### 2. Open the Web Interface
Once the app starts, it prints a local URL (e.g., `http://127.0.0.1:7860/`). Open this URL in your web browser.
### 3. Upload an Audio File
- Click on the upload button to select an audio file.
- Supported formats include WAV and MP3.
### 4. Get the Transcription
- Once the audio is processed, the transcription will appear in the text box.
- You can copy the transcription for further use.
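
For readers curious how an upload-and-transcribe page like this can be wired up, here is a minimal Gradio sketch. It is an assumption about the general shape, not the contents of `app.py`: `transcribe` is a placeholder handler that only echoes the uploaded path, and `build_interface` is a hypothetical helper. Calling `build_interface().launch()` would start the server and print a local URL like the one shown in step 2.

```python
from typing import Optional


def transcribe(audio_path: Optional[str]) -> str:
    """Placeholder handler: a real app would run the ASR model on audio_path."""
    if audio_path is None:
        return "No audio file was uploaded."
    return f"(transcription of {audio_path} would appear here)"


def build_interface():
    """Create a page with an audio upload input and a text output."""
    # Imported lazily so the handler can be exercised without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Audio(type="filepath"),  # handler receives a path on disk
        outputs="text",
        title="Audio Transcription App",
    )
```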
---
## File Structure
- `app.py`: Main application script.
- `requirements.txt`: List of dependencies.
- `chunks/`: Temporary folder where audio chunks are stored during processing.
- `transcripcion.txt`: File where the full transcription is saved after processing.
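
The two generated paths above can be managed with a few lines of standard-library code. The sketch below is only one way to do it: the helper names are hypothetical, and clearing `chunks/` before each run and writing the transcript as UTF-8 (useful for accented Spanish characters) are assumptions rather than documented behavior of `app.py`. The `demo` function exercises both helpers in a throwaway directory.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_chunk_dir(base_dir: Path) -> Path:
    """Create an empty chunks/ folder, removing leftovers from earlier runs."""
    chunk_dir = base_dir / "chunks"
    shutil.rmtree(chunk_dir, ignore_errors=True)
    chunk_dir.mkdir(parents=True)
    return chunk_dir


def save_transcription(base_dir: Path, text: str) -> Path:
    """Write the full transcription to transcripcion.txt as UTF-8."""
    out_path = base_dir / "transcripcion.txt"
    out_path.write_text(text, encoding="utf-8")
    return out_path


def demo() -> str:
    """Round-trip a transcription through a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        prepare_chunk_dir(base)
        saved = save_transcription(base, "hola mundo")
        return saved.read_text(encoding="utf-8")
```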
---
## Customization
### Adjust Chunk Length
The default chunk length is set to 30 seconds. You can adjust this by modifying the `chunk_length_ms` parameter in the `app.py` file:
```python
chunk_length_ms = 30000  # Change to desired length in milliseconds
```
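
To see what a given `chunk_length_ms` means in practice, the sketch below computes the millisecond ranges a recording would be cut into. The helper `chunk_bounds` is hypothetical and may differ from how `app.py` splits audio. Since `pydub`'s `AudioSegment` reports its length and accepts slices in milliseconds, each `(start, end)` pair could be applied as `audio[start:end]`. For example, `chunk_bounds(70000)` returns `[(0, 30000), (30000, 60000), (60000, 70000)]`: a 70-second file yields two full chunks and one 10-second remainder.

```python
from typing import List, Tuple


def chunk_bounds(total_ms: int, chunk_length_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond offsets that cover the whole recording."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    # The last chunk is clipped so it never runs past the end of the audio.
    return [
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]
```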
---
## Limitations
- **Language:** The model is optimized for Spanish audio. Accuracy will likely drop for other languages.
- **Audio Quality:** Poor-quality audio may result in less accurate transcriptions.
- **Performance:** Processing very large files may take some time, depending on your hardware.
---
## Dependencies
- `transformers`: For ASR model.
- `torch`: Backend for model computations.
- `pydub`: For audio splitting.
- `ffmpeg`: Required by `pydub` for audio processing. This is a system tool rather than a Python package; install it as described under "Install FFmpeg".
- `gradio`: To create the web interface.

Install the Python packages using:
```bash
pip install -r requirements.txt
```
---
## License
This project is licensed under the MIT License. See the LICENSE file for details.

---
## Acknowledgments
- Hugging Face for providing pre-trained models.
- Gradio for the simple interface framework.
- Pydub and FFmpeg for audio processing tools.