Audio Transcription App

This application uses Hugging Face's facebook/wav2vec2-large-xlsr-53-spanish model to transcribe Spanish audio files. It provides a simple web interface where users can upload recordings, such as meetings, and receive a full transcription.

Features

Automatic Speech Recognition (ASR): Uses a pre-trained wav2vec2 model from Hugging Face, fine-tuned for Spanish, to transcribe speech.
Long Audio Support: Automatically splits long audio files into 30-second chunks (configurable) for processing.
Web Interface: Provides a user-friendly interface using Gradio.
Flexible Audio Upload: Accepts common audio formats like WAV and MP3.
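The ASR step can be sketched with the transformers `pipeline` API. This is a minimal sketch, not the code in app.py: the helper names `load_asr` and `transcribe_chunks` are hypothetical, and passing the ASR callable in as a parameter is an assumption made so a stub can stand in for the real model.

```python
from typing import Callable, Iterable, List, Optional

MODEL_ID = "facebook/wav2vec2-large-xlsr-53-spanish"


def load_asr(model_id: str = MODEL_ID) -> Callable:
    """Build a Hugging Face ASR pipeline (the model is downloaded on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe_chunks(chunk_paths: Iterable[str], asr: Optional[Callable] = None) -> List[str]:
    """Run the ASR callable on each chunk file and collect the text of each result."""
    if asr is None:
        asr = load_asr()
    texts = []
    for path in chunk_paths:
        # The transformers ASR pipeline accepts a file path and returns a dict like {"text": "..."}.
        result = asr(path)
        texts.append(result["text"].strip())
    return texts
```

Loading the pipeline once and reusing it for every chunk avoids reloading the model weights per chunk.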

Installation

1. Clone the Repository

```
git clone <repository_url>
cd <repository_folder>
```

2. Install Dependencies

Ensure you have Python 3.7 or higher installed. Then, run:

```
pip install -r requirements.txt
```
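To confirm the installation succeeded, a small check like the following can be run from the project folder. It is a hypothetical helper, not part of the repository, and it assumes the import names match the packages listed under Dependencies.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Assumed import names of the Python packages listed under Dependencies.
REQUIRED = ["transformers", "torch", "pydub", "gradio"]


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(minimum: Tuple[int, int] = (3, 7)) -> bool:
    """Check that the running interpreter meets the minimum Python version."""
    return sys.version_info[:2] >= minimum
```

For example, `print(missing_packages())` prints an empty list when every package was found.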

3. Install FFmpeg

This application uses pydub, which requires FFmpeg. Follow these steps to install it:

Windows: Download an FFmpeg build from https://ffmpeg.org/download.html, extract it, and add its bin directory to your PATH.
macOS: Use Homebrew:
```
brew install ffmpeg
```
Linux: Install it with your package manager, e.g., on Debian or Ubuntu:
```
sudo apt install ffmpeg
```
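Because pydub only finds FFmpeg through your PATH, it can help to verify the installation from Python. This is a minimal sketch; `ffmpeg_version` is a hypothetical helper, not part of app.py.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

If `ffmpeg_version()` returns None, reopen your terminal after editing PATH and try again.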

Usage

1. Run the Application

Start the app by running:

```
python app.py
```

2. Open the Web Interface

Once the app starts, it will provide a local URL (e.g., http://127.0.0.1:7860/). Open this URL in your web browser.
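The local URL comes from Gradio's `launch()`, which serves on port 7860 by default. The sketch below shows one way such an interface can be wired up; app.py's actual interface may differ, and `build_interface` and `my_transcribe` are hypothetical names.

```python
from typing import Callable


def build_interface(transcribe_fn: Callable[[str], str]):
    """Wrap a file-path -> text function in a Gradio upload interface."""
    # Imported lazily so the helper can be defined without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=transcribe_fn,
        inputs=gr.Audio(type="filepath"),  # Gradio passes the uploaded file's path to fn
        outputs=gr.Textbox(label="Transcription"),
        title="Audio Transcription App",
    )


# Usage (requires gradio): build_interface(my_transcribe).launch()
# launch() serves on http://127.0.0.1:7860/ unless that port is taken.
```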

3. Upload an Audio File

Click the upload button and select an audio file.
Supported formats: WAV, MP3, and other common audio formats.

4. Get the Transcription

Once processing finishes, the transcription appears in the text box and is also saved to transcripcion.txt.
You can copy the transcription for further use.

File Structure

app.py: Main application script.
requirements.txt: List of dependencies.
chunks/: Temporary folder where audio chunks are stored during processing.
transcripcion.txt: File where the full transcription is saved after processing.
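The last two entries can be illustrated with a short sketch of how per-chunk results might be joined and written out, and how the temporary folder might be reset between runs. Both helpers are hypothetical; joining with single spaces and clearing chunks/ before each run are assumptions, not confirmed behavior of app.py.

```python
import os
import shutil
from typing import Iterable


def save_transcription(texts: Iterable[str], path: str = "transcripcion.txt") -> str:
    """Join non-empty per-chunk transcripts with spaces, write them to `path`, and return the text."""
    full_text = " ".join(t.strip() for t in texts if t.strip())
    with open(path, "w", encoding="utf-8") as f:
        f.write(full_text)
    return full_text


def reset_chunks_dir(folder: str = "chunks") -> None:
    """Delete and recreate the temporary chunk folder so old chunks are not reused."""
    shutil.rmtree(folder, ignore_errors=True)
    os.makedirs(folder, exist_ok=True)
```

Writing with `encoding="utf-8"` keeps Spanish characters such as "ñ" and accented vowels intact on every platform.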

Customization

Adjust Chunk Length

The default chunk length is 30 seconds. To change it, edit the chunk_length_ms value in app.py:

```
chunk_length_ms = 30000  # Change to desired length in milliseconds
```
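The splitting step can be sketched as follows, assuming app.py slices a pydub `AudioSegment` into fixed-length windows. `chunk_bounds`, `split_audio`, and the `chunk_{i}.wav` file naming are hypothetical; only the pydub calls (`AudioSegment.from_file`, millisecond slicing, `export`) are the library's own.

```python
import os
from typing import List, Tuple


def chunk_bounds(duration_ms: int, chunk_length_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond ranges covering the audio; the last one may be shorter."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        (start, min(start + chunk_length_ms, duration_ms))
        for start in range(0, duration_ms, chunk_length_ms)
    ]


def split_audio(path: str, chunk_length_ms: int = 30000, out_dir: str = "chunks") -> List[str]:
    """Split an audio file into WAV chunks with pydub and return the chunk file paths."""
    from pydub import AudioSegment  # needs FFmpeg to decode MP3 and other non-WAV input

    os.makedirs(out_dir, exist_ok=True)
    audio = AudioSegment.from_file(path)
    paths = []
    # len(audio) is the duration in milliseconds, and slicing is also in milliseconds.
    for i, (start, end) in enumerate(chunk_bounds(len(audio), chunk_length_ms)):
        chunk_path = os.path.join(out_dir, f"chunk_{i}.wav")
        audio[start:end].export(chunk_path, format="wav")
        paths.append(chunk_path)
    return paths
```

For a 65-second file, `chunk_bounds(65000)` yields three chunks: two of 30 s and a final one of 5 s. Shorter chunks use less memory per model call but cut words at more boundaries.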

Limitations

Language: The model is fine-tuned for Spanish, so transcriptions of other languages will be much less accurate.
Audio Quality: Poor-quality audio may result in less accurate transcriptions.
Performance: Processing very large files may take some time, depending on your system.

Dependencies

transformers: Loads and runs the ASR model.
torch: Backend for model computations.
pydub: Splits audio into chunks.
ffmpeg: Required by pydub to decode and export audio.
gradio: Builds the web interface.

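Install the Python packages with the command below. FFmpeg is not a Python package and must be installed separately, as described in Installation step 3.

```
pip install -r requirements.txt
```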

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Hugging Face for providing pre-trained models.
Gradio for the simple interface framework.
Pydub and FFmpeg for audio processing tools.