Spaces: Blandskron / Audio-to-Text


Blandskron committed on Jan 15

Commit 0123ce7 (verified) · 1 parent: 4971271

Update README.md

Files changed (1)

README.md +110 -11

@@ -1,14 +1,113 @@
 ---
-title: Audio To Text
-emoji: 🌖
-colorFrom: blue
-colorTo: gray
-sdk: gradio
-sdk_version: 5.12.0
-app_file: app.py
-pinned: false
-license: mit
-short_description: Educativo
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Audio Transcription App
+This application uses the `facebook/wav2vec2-large-xlsr-53-spanish` model, hosted on Hugging Face, to transcribe Spanish audio files. It provides a simple web interface where users can upload recordings, such as meeting audio, and receive a full transcription.
+---
+## Features
+- **Automatic Speech Recognition (ASR):** Uses a pre-trained wav2vec2 model from Hugging Face for Spanish transcription.
+- **Supports Long Audio:** Automatically splits long audio files into smaller chunks (30 seconds by default) for processing.
+- **Web Interface:** Provides a user-friendly interface using Gradio.
+- **Flexible Audio Upload:** Accepts common audio formats like WAV and MP3.
+---
+## Installation
+### 1. Clone the Repository
+```bash
+git clone <repository_url>
+cd <repository_folder>
+```
+### 2. Install Dependencies
+Ensure you have Python 3.7 or higher installed. Then, run:
+```bash
+pip install -r requirements.txt
+```
+### 3. Install FFmpeg
+This application uses `pydub`, which requires FFmpeg. Follow these steps to install it:
+- **Windows:** Download FFmpeg from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html). Add the `bin` directory to your PATH.
+- **macOS:** Use Homebrew:
+  ```bash
+  brew install ffmpeg
+  ```
+- **Linux:** Install via your package manager, e.g., on Debian/Ubuntu:
+  ```bash
+  sudo apt install ffmpeg
+  ```
+---
+## Usage
+### 1. Run the Application
+Start the app by running:
+```bash
+python app.py
+```
+### 2. Open the Web Interface
+Once the app starts, it prints a local URL (e.g., `http://127.0.0.1:7860/`). Open this URL in your web browser.
+### 3. Upload an Audio File
+- Click on the upload button to select an audio file.
+- Supported formats include WAV and MP3.
+### 4. Get the Transcription
+- Once the audio is processed, the transcription will appear in the text box.
+- You can copy the transcription for further use.
 ---
+## File Structure
+- `app.py`: Main application script.
+- `requirements.txt`: List of dependencies.
+- `chunks/`: Temporary folder where audio chunks are stored during processing.
+- `transcripcion.txt`: File where the full transcription is saved after processing.
+---
+## Customization
+### Adjust Chunk Length
+The default chunk length is 30 seconds (30,000 ms). To change it, edit the `chunk_length_ms` parameter in `app.py`:
+```python
+chunk_length_ms = 30000  # Change to desired length in milliseconds
+```
+---
+## Limitations
+- **Language:** The model is optimized for Spanish audio. Performance may vary with other languages.
+- **Audio Quality:** Poor-quality audio may result in less accurate transcriptions.
+- **Performance:** Processing very large files may take some time, depending on your system.
 ---
+## Dependencies
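+- `transformers`: Loads and runs the ASR model.
+- `torch`: Backend for model computations.
+- `pydub`: For audio splitting.
+- `ffmpeg`: Required by `pydub` for audio processing. This is a system tool, not a Python package; install it separately as described under Installation.
+- `gradio`: To create the web interface.
+Install the Python packages using: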
+```bash
+pip install -r requirements.txt
+```
+---
+## License
+This project is licensed under the MIT License. See the LICENSE file for details.
+---
+## Acknowledgments
+- Hugging Face for providing pre-trained models.
+- Gradio for the simple interface framework.
+- Pydub and FFmpeg for audio processing tools.
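
The `chunk_length_ms` setting described under "Adjust Chunk Length" in the README above determines how a long recording is cut into pieces. The diff does not show `app.py`, so the following is only a standard-library sketch of the boundary arithmetic, not the app's actual code; `chunk_bounds` and its parameter names are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def chunk_bounds(total_ms: int, chunk_length_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) pairs that cover [0, total_ms) in fixed-size chunks.

    Hypothetical helper: it only computes boundaries and does not touch audio data.
    """
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    # The final chunk is shorter when total_ms is not a multiple of chunk_length_ms.
    return [
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


if __name__ == "__main__":
    # A 70-second recording with the default 30-second chunk length.
    for start, end in chunk_bounds(70_000):
        print(f"{start} ms to {end} ms")
```

With `pydub`, an `AudioSegment` can be sliced by milliseconds, so each pair would plausibly map to `audio[start:end]`. A smaller `chunk_length_ms` yields more, shorter chunks.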