Audio-to-Text / README.md

Blandskron

Update README.md

0123ce7 verified about 2 months ago

	# Audio Transcription App

	This application uses the `facebook/wav2vec2-large-xlsr-53-spanish` model, hosted on the Hugging Face Hub, to transcribe Spanish audio files. It provides a simple web interface where users can upload audio, such as meeting recordings, and receive a full transcription.

	---

	## Features
	- Automatic Speech Recognition (ASR): Uses a pre-trained wav2vec 2.0 model from the Hugging Face Hub to transcribe Spanish speech.
	- Long Audio Support: Automatically splits long audio files into smaller chunks for processing.
	- Web Interface: Provides a user-friendly interface using Gradio.
	- Flexible Audio Upload: Accepts common audio formats like WAV and MP3.
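
As a rough illustration of the ASR feature, the model can be loaded through the `pipeline` helper from `transformers`. This is a minimal sketch, not necessarily how `app.py` does it: the names `MODEL_ID`, `load_asr`, and `transcribe_file` are hypothetical, and the import is deferred so the heavy dependency loads only when the model is actually needed.

```python
MODEL_ID = "facebook/wav2vec2-large-xlsr-53-spanish"


def load_asr(model_id: str = MODEL_ID):
    """Return a speech-recognition pipeline for the given model.

    Hypothetical helper: app.py may load the model differently.
    """
    # Imported lazily because transformers and torch are slow to import.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe_file(asr, audio_path: str) -> str:
    """Run an ASR callable on one audio file and return the cleaned text.

    `asr` is expected to return a dict with a "text" key, as the
    transformers speech-recognition pipeline does.
    """
    return asr(audio_path)["text"].strip()
```

With the dependencies installed, `transcribe_file(load_asr(), "sample.wav")` would return the transcription of a local file; `sample.wav` is a placeholder name. The first call downloads the model weights, so it can take a while.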

	---

	## Installation

	### 1. Clone the Repository
	```bash
	git clone <repository_url>
	cd <repository_folder>
	```

	### 2. Install Dependencies
	Ensure you have Python 3.7 or higher installed. Then, run:
	```bash
	pip install -r requirements.txt
	```

	### 3. Install FFmpeg
	This application uses `pydub`, which requires FFmpeg. Follow these steps to install it:
	- Windows: Download a build from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html), extract it, and add its `bin` directory to your PATH.
	- macOS: Use Homebrew:
	```bash
	brew install ffmpeg
	```
	- Linux: Install via your package manager, e.g., on Debian or Ubuntu:
	```bash
	sudo apt install ffmpeg
	```
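
To confirm that the installation worked, you can check whether the `ffmpeg` executable is visible on your PATH. The sketch below uses only the Python standard library; the helper names `find_ffmpeg` and `ffmpeg_status` are illustrative and are not part of this project.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_status() -> str:
    """Return a short human-readable message about FFmpeg availability."""
    path = find_ffmpeg()
    if path is None:
        return "FFmpeg not found on PATH; pydub will not be able to decode MP3 files."
    return f"FFmpeg found at {path}"


print(ffmpeg_status())
```

If the message says FFmpeg was not found, open a new terminal after changing PATH so the updated value is picked up.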

	---

	## Usage

	### 1. Run the Application
	Start the app by running:
	```bash
	python app.py
	```

	### 2. Open the Web Interface
	Once the app starts, it will provide a local URL (e.g., `http://127.0.0.1:7860/`). Open this URL in your web browser.

	### 3. Upload an Audio File
	- Click on the upload button to select an audio file.
	- Supported formats: WAV and MP3, among other common audio formats.

	### 4. Get the Transcription
	- Once the audio is processed, the transcription will appear in the text box.
	- You can copy the transcription for further use.
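
The upload-and-transcribe flow described above maps naturally onto a Gradio `Interface`. The following is a minimal sketch under stated assumptions, not the actual contents of `app.py`: `transcribe` is a placeholder callback that does not run any model, and `build_demo` is a hypothetical helper name.

```python
def transcribe(audio_path):
    """Placeholder callback: a real app would run the ASR model here."""
    if audio_path is None:
        return "No audio file was uploaded."
    return f"(transcription of {audio_path} would appear here)"


def build_demo():
    """Build a Gradio interface with an audio upload and a text output."""
    # Imported lazily; requires `pip install gradio`.
    import gradio as gr

    return gr.Interface(
        fn=transcribe,
        # type="filepath" hands the callback a path to the uploaded file.
        inputs=gr.Audio(type="filepath", label="Audio file"),
        outputs=gr.Textbox(label="Transcription"),
        title="Audio Transcription App",
    )
```

Calling `build_demo().launch()` starts the local server and prints the URL to open in the browser.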

	---

	## File Structure
	- `app.py`: Main application script.
	- `requirements.txt`: List of dependencies.
	- `chunks/`: Temporary folder where audio chunks are stored during processing.
	- `transcripcion.txt`: File where the full transcription is saved after processing.
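
As an illustration of the last item, per-chunk transcripts could be joined and written to `transcripcion.txt` roughly as follows. This is a sketch only: `save_transcription` is a hypothetical helper, and `app.py` may assemble the file differently. Writing with an explicit UTF-8 encoding keeps Spanish accented characters intact on every platform.

```python
from pathlib import Path
from typing import Iterable


def save_transcription(parts: Iterable[str], output_path: str = "transcripcion.txt") -> str:
    """Join per-chunk transcripts with spaces and write them to a UTF-8 file."""
    # Drop empty results (for example, silent chunks) and surrounding whitespace.
    cleaned = [p.strip() for p in parts if p and p.strip()]
    full_text = " ".join(cleaned)
    Path(output_path).write_text(full_text, encoding="utf-8")
    return full_text
```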

	---

	## Customization

	### Adjust Chunk Length
	The default chunk length is 30 seconds (30,000 ms). You can adjust it by changing the `chunk_length_ms` value in `app.py`:
	```python
	chunk_length_ms = 30000 # Change to desired length in milliseconds
	```
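
To see how `chunk_length_ms` affects the split, the sketch below computes the start and end offsets of each chunk for a recording of a given duration. It is an illustration rather than the code in `app.py`, and `chunk_bounds` is a hypothetical helper. With `pydub`, an `AudioSegment` can be sliced by milliseconds, so each pair could be applied as `audio[start:end]`.

```python
from typing import List, Tuple


def chunk_bounds(total_ms: int, chunk_length_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in milliseconds covering the whole recording."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        # The last chunk is clipped so it never runs past the end of the audio.
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


# A 75-second recording with the default 30-second chunks:
print(chunk_bounds(75000))
# → [(0, 30000), (30000, 60000), (60000, 75000)]
```

Shorter chunks reduce the memory needed per model call but increase the chance of cutting a word in half at a chunk boundary.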

	---

	## Limitations
	- Language: The model is optimized for Spanish audio. Performance may vary with other languages.
	- Audio Quality: Poor-quality audio may result in less accurate transcriptions.
	- Performance: Processing very large files may take some time, depending on your system.

	---

	## Dependencies
	- `transformers`: For ASR model.
	- `torch`: Backend for model computations.
	- `pydub`: For audio splitting.
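	- `ffmpeg`: System tool required by `pydub` for audio processing. It is not a Python package, so install it separately as described under Installation.
	- `gradio`: To create the web interface.

	Install the Python packages using: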
	```bash
	pip install -r requirements.txt
	```

	---

	## License
	This project is licensed under the MIT License. See the LICENSE file for details.

	---

	## Acknowledgments
	- Hugging Face for providing pre-trained models.
	- Gradio for the simple interface framework.
	- Pydub and FFmpeg for audio processing tools.