# Audio Transcription App

This application uses the Hugging Face model `facebook/wav2vec2-large-xlsr-53-spanish` to transcribe Spanish audio files. It provides a simple web interface where you can upload a recording, such as a meeting, and receive the full transcription.

---

## Features
- **Automatic Speech Recognition (ASR):** Transcribes Spanish speech with a pre-trained wav2vec2 model from Hugging Face.
- **Supports Long Audio Files:** Automatically splits long recordings into smaller chunks (30 seconds by default) before transcribing them.
- **Web Interface:** Provides a browser-based interface built with Gradio.
- **Flexible Audio Upload:** Accepts common audio formats like WAV and MP3.
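
A minimal sketch of the transcription step, assuming the app uses the `transformers` `pipeline` API and transcribes chunk files in order. `load_asr` and `transcribe_chunks` are hypothetical names; the real `app.py` may load and call the model differently.

```python
from typing import Callable, Iterable

# Model ID as named in this README.
MODEL_ID = "facebook/wav2vec2-large-xlsr-53-spanish"


def load_asr():
    """Build a Hugging Face ASR pipeline (downloads the model on first use)."""
    # Imported here so the rest of this module works without transformers installed.
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def transcribe_chunks(asr: Callable[[str], dict], chunk_paths: Iterable[str]) -> str:
    """Transcribe each chunk file in order and join the texts with spaces."""
    texts = []
    for path in chunk_paths:
        # The ASR pipeline returns a dict such as {"text": "..."}.
        text = asr(path)["text"].strip()
        if text:  # skip chunks that produced no text (e.g. silence)
            texts.append(text)
    return " ".join(texts)
```

Passing the pipeline in as an argument, e.g. `transcribe_chunks(load_asr(), paths)`, keeps the joining logic testable without downloading the model. When given a file path, the pipeline relies on FFmpeg to decode the audio.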

---

## Installation

### 1. Clone the Repository
```bash
git clone <repository_url>
cd <repository_folder>
```

### 2. Install Dependencies
Ensure you have Python 3.7 or higher installed. Then, run:
```bash
pip install -r requirements.txt
```
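
To confirm the install, the following hypothetical helper (not part of `app.py`) checks that the Python packages listed under Dependencies can be found, without importing them. FFmpeg is a system program and is not covered by this check.

```python
import importlib.util
from typing import Dict, Iterable, List

# Import names of the Python packages this README lists as dependencies.
REQUIRED_MODULES = ("transformers", "torch", "pydub", "gradio")


def check_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to True if it is installed, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def missing_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that could not be found."""
    return [name for name, found in check_dependencies(modules).items() if not found]
```

`print(missing_dependencies())` prints an empty list when everything is installed.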

### 3. Install FFmpeg
This application uses `pydub`, which requires FFmpeg. Follow these steps to install it:
- **Windows:** Download a build from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html), extract it, and add its `bin` directory to your `PATH`.
- **macOS:** Use Homebrew:
  ```bash
  brew install ffmpeg
  ```
- **Linux:** Install it with your distribution's package manager, e.g., on Debian or Ubuntu:
  ```bash
  sudo apt install ffmpeg
  ```
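
pydub looks for the `ffmpeg` executable on your `PATH`. A hypothetical helper (not part of `app.py`) that performs the same check as running `ffmpeg -version` in a terminal:

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_path() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = ffmpeg_path()
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

If `ffmpeg_path()` returns `None`, the `bin` directory is not on your `PATH` yet; open a new terminal after changing it.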

---

## Usage

### 1. Run the Application
Start the app by running:
```bash
python app.py
```

### 2. Open the Web Interface
When the app starts, it prints a local URL in the terminal (by default `http://127.0.0.1:7860/`). Open this URL in your web browser.
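
The address comes from Gradio, which serves on port 7860 by default. A minimal sketch of how such an interface can be wired up, assuming a transcription function that takes a file path and returns text. `build_interface` is a hypothetical name, and `app.py` may be structured differently.

```python
from typing import Callable


def build_interface(transcribe_fn: Callable[[str], str]):
    """Wrap a path -> text function in a Gradio upload-and-transcribe UI."""
    # Imported here so defining this function does not require gradio.
    import gradio as gr

    return gr.Interface(
        fn=transcribe_fn,
        # type="filepath" hands the function the path of the uploaded file.
        inputs=gr.Audio(type="filepath", label="Audio file"),
        outputs=gr.Textbox(label="Transcription"),
        title="Audio Transcription App",
    )
```

Calling `build_interface(my_transcribe).launch()` starts the server; `launch(server_port=...)` picks a different port if 7860 is already in use.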

### 3. Upload an Audio File
- Click the upload button and select an audio file.
- Supported formats: WAV, MP3, and other common formats that FFmpeg can decode.
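
Whatever format you upload, wav2vec2 models work on 16 kHz audio, so other sample rates have to be resampled first (pydub's `set_frame_rate(16000)` and `set_channels(1)` can do this). For WAV files, a hypothetical standard-library check (not part of `app.py`) that reads the header:

```python
import wave
from typing import Tuple

EXPECTED_RATE = 16_000  # wav2vec2 models expect 16 kHz input


def wav_info(path: str) -> Tuple[int, int, float]:
    """Return (sample_rate, channels, duration_in_seconds) of a WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration_s = wf.getnframes() / rate
    return rate, channels, duration_s


def needs_conversion(path: str) -> bool:
    """True if the file is not already 16 kHz mono."""
    rate, channels, _ = wav_info(path)
    return rate != EXPECTED_RATE or channels != 1
```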

### 4. Get the Transcription
- When processing finishes, the transcription appears in the text box.
- You can copy it from there; the full transcription is also saved to `transcripcion.txt`.

---

## File Structure
- `app.py`: Main application script.
- `requirements.txt`: List of dependencies.
- `chunks/`: Temporary folder where audio chunks are stored during processing.
- `transcripcion.txt`: File where the full transcription is saved after processing.
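
A sketch of the two housekeeping steps implied by this layout: writing `transcripcion.txt` and clearing the temporary `chunks/` folder. Both helpers are hypothetical; only the file and folder names come from the list above.

```python
import os
import shutil


def save_transcription(text: str, path: str = "transcripcion.txt") -> str:
    """Write the transcription and return the absolute path of the file."""
    # UTF-8 keeps Spanish characters such as á, ñ and ¿ intact on every OS.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return os.path.abspath(path)


def cleanup_chunks(chunks_dir: str = "chunks") -> None:
    """Delete the temporary chunk folder; does nothing if it does not exist."""
    shutil.rmtree(chunks_dir, ignore_errors=True)
```

Writing with an explicit encoding matters on Windows, where the default encoding is often not UTF-8.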

---

## Customization

### Adjust Chunk Length
The default chunk length is 30 seconds (30,000 ms). To change it, edit the `chunk_length_ms` value in `app.py`:
```python
chunk_length_ms = 30000  # Change to desired length in milliseconds
```
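
How `chunk_length_ms` maps onto the audio can be sketched as follows. This is an illustration, not the code in `app.py`: `chunk_ranges`, `split_audio`, and the `chunk_0000.wav` naming are hypothetical. It relies on two pydub facts: `len(audio)` is the duration in milliseconds, and slicing an `AudioSegment` is also in milliseconds.

```python
import os
from typing import List, Tuple


def chunk_ranges(duration_ms: int, chunk_length_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) pairs covering the audio; the last chunk may be shorter."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        (start, min(start + chunk_length_ms, duration_ms))
        for start in range(0, duration_ms, chunk_length_ms)
    ]


def split_audio(path: str, out_dir: str = "chunks", chunk_length_ms: int = 30000) -> List[str]:
    """Cut an audio file into WAV chunks with pydub and return their paths in order."""
    from pydub import AudioSegment  # needs FFmpeg for non-WAV input

    os.makedirs(out_dir, exist_ok=True)
    audio = AudioSegment.from_file(path)
    paths = []
    for i, (start, end) in enumerate(chunk_ranges(len(audio), chunk_length_ms)):
        chunk_path = os.path.join(out_dir, f"chunk_{i:04d}.wav")
        audio[start:end].export(chunk_path, format="wav")
        paths.append(chunk_path)
    return paths
```

With the default of 30 s, a 75-second recording becomes three chunks: 0 to 30 s, 30 to 60 s, and 60 to 75 s. Shorter chunks need less memory per model call but create more boundaries where a word can be cut in half.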

---

## Limitations
- **Language:** The model is fine-tuned for Spanish; audio in other languages will likely be transcribed inaccurately.
- **Audio Quality:** Noisy or low-quality recordings reduce transcription accuracy.
- **Performance:** Processing time grows with file length and depends on your hardware, so very long recordings can take a while.

---

## Dependencies
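- `transformers`: Loads and runs the ASR model.
- `torch`: Backend for model computations.
- `pydub`: Loads and splits audio files.
- `ffmpeg`: Required by `pydub` for audio processing. This is a system program, not a Python package; install it as described in step 3 of Installation.
- `gradio`: Builds the web interface.

Install the Python packages with: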
```bash
pip install -r requirements.txt
```

---

## License
This project is licensed under the MIT License. See the LICENSE file for details.

---

## Acknowledgments
- Hugging Face for providing pre-trained models.
- Gradio for the simple interface framework.
- Pydub and FFmpeg for audio processing tools.