# Audio Transcription App

This application uses Hugging Face's `facebook/wav2vec2-large-xlsr-53-spanish` model to transcribe audio files. It provides a simple web interface where users can upload audio recordings, such as meeting recordings, and receive a full transcription.

---

## Features

- **Automatic Speech Recognition (ASR):** Uses a pre-trained Hugging Face model to produce high-quality Spanish transcriptions.
- **Support for Long Audio:** Automatically splits long audio files into smaller chunks for processing.
- **Web Interface:** Provides a user-friendly interface built with Gradio.
- **Flexible Audio Upload:** Accepts common audio formats such as WAV and MP3.

---

## Installation

### 1. Clone the Repository

```bash
git clone
cd
```

### 2. Install Dependencies

Ensure you have Python 3.7 or higher installed. Then run:

```bash
pip install -r requirements.txt
```

### 3. Install FFmpeg

This application uses `pydub`, which requires FFmpeg. Install it for your platform:

- **Windows:** Download FFmpeg from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html) and add its `bin` directory to your `PATH`.
- **macOS:** Use Homebrew:

  ```bash
  brew install ffmpeg
  ```

- **Linux:** Install it with your package manager, e.g.:

  ```bash
  sudo apt install ffmpeg
  ```

---

## Usage

### 1. Run the Application

Start the app by running:

```bash
python app.py
```

### 2. Open the Web Interface

Once the app starts, it prints a local URL (e.g., `http://127.0.0.1:7860/`). Open this URL in your web browser.

### 3. Upload an Audio File

- Click the upload button to select an audio file.
- Supported formats: WAV, MP3, etc.

### 4. Get the Transcription

- Once the audio is processed, the transcription appears in the text box.
- You can copy the transcription for further use.

---

## File Structure

- `app.py`: Main application script.
- `requirements.txt`: List of dependencies.
- `chunks/`: Temporary folder where audio chunks are stored during processing.
- `transcripcion.txt`: File where the full transcription is saved after processing.

---

## Customization

### Adjust Chunk Length

The default chunk length is 30 seconds. You can change it by modifying the `chunk_length_ms` value in `app.py`:

```python
chunk_length_ms = 30000  # Chunk length in milliseconds (30000 ms = 30 s)
```

---

## Limitations

- **Language:** The model is fine-tuned for Spanish. Accuracy on other languages may be significantly lower.
- **Audio Quality:** Poor-quality audio may result in less accurate transcriptions.
- **Performance:** Processing very large files may take a while, depending on your hardware.

---

## Dependencies

- `transformers`: Loads and runs the ASR model.
- `torch`: Backend for model computations.
- `pydub`: Splits audio into chunks.
- `ffmpeg`: Required by `pydub` to decode audio.
- `gradio`: Creates the web interface.

Install the Python packages with:

```bash
pip install -r requirements.txt
```

FFmpeg is a system tool, not a Python package, so `pip` does not install it. Install it separately as described in the Installation section.

---

## License

This project is licensed under the MIT License. See the LICENSE file for details.

---

## Acknowledgments

- Hugging Face for providing pre-trained models.
- Gradio for the simple interface framework.
- Pydub and FFmpeg for audio processing tools.
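
---

## Example: Splitting Audio into Chunks

The chunking described under Features and Customization can be sketched as follows. This is a minimal illustration, not the actual code in `app.py`. The function names `chunk_bounds` and `split_audio` and the `chunk_000.wav` naming scheme are hypothetical. The sketch relies on pydub's millisecond-based slicing (`audio[start:end]`) and converts each chunk to 16 kHz mono, the input format wav2vec2 models expect.

```python
import os
from typing import List, Tuple


def chunk_bounds(duration_ms: int, chunk_length_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) pairs covering [0, duration_ms) in fixed-size steps.

    The last chunk is shorter when duration_ms is not a multiple of chunk_length_ms.
    """
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        (start, min(start + chunk_length_ms, duration_ms))
        for start in range(0, duration_ms, chunk_length_ms)
    ]


def split_audio(path: str, out_dir: str = "chunks", chunk_length_ms: int = 30000) -> List[str]:
    """Hypothetical helper: write WAV chunks of `path` into `out_dir` and return their paths."""
    from pydub import AudioSegment  # imported lazily; needs pydub plus FFmpeg on PATH

    audio = AudioSegment.from_file(path)  # decoded via FFmpeg
    audio = audio.set_frame_rate(16000).set_channels(1)  # 16 kHz mono for wav2vec2
    os.makedirs(out_dir, exist_ok=True)

    paths = []
    # len(audio) is the duration in milliseconds
    for i, (start, end) in enumerate(chunk_bounds(len(audio), chunk_length_ms)):
        chunk_path = os.path.join(out_dir, f"chunk_{i:03d}.wav")
        audio[start:end].export(chunk_path, format="wav")  # slice indices are milliseconds
        paths.append(chunk_path)
    return paths
```

For example, `chunk_bounds(75000)` returns `[(0, 30000), (30000, 60000), (60000, 75000)]`: a 75-second file produces two full 30-second chunks and one 15-second remainder. Fixed-length cuts like these can split a word at a chunk boundary. A shorter `chunk_length_ms` lowers memory use per chunk but creates more boundaries.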