Update README.md
README.md
CHANGED
@@ -1,14 +1,113 @@
# Audio Transcription App

This application uses the `facebook/wav2vec2-large-xlsr-53-spanish` model from the Hugging Face Hub to transcribe Spanish audio files. It provides a simple web interface where users can upload audio, such as meeting recordings, and receive a full transcription.

---

## Features
- **Automatic Speech Recognition (ASR):** Uses a pre-trained model from the Hugging Face Hub to transcribe Spanish speech.
- **Long Audio Support:** Automatically splits long audio files into smaller chunks for processing.
- **Web Interface:** Provides a simple upload-and-transcribe interface built with Gradio.
- **Flexible Audio Upload:** Accepts common audio formats such as WAV and MP3.
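
As a rough illustration of the speech-recognition feature, the sketch below loads the model through the `transformers` `pipeline` API. The helper name `transcribe_file` and the 30-second default are assumptions for illustration and are not taken from `app.py`. Note that this sketch relies on the pipeline's built-in chunking (`chunk_length_s`) rather than the `pydub`-based splitting this README describes, so the real app is likely organized differently. The import is deferred so the snippet loads without the heavy dependencies installed.

```python
MODEL_ID = "facebook/wav2vec2-large-xlsr-53-spanish"


def transcribe_file(audio_path: str, chunk_length_s: int = 30) -> str:
    """Transcribe one audio file and return the recognized text.

    Hypothetical helper for illustration; not taken from app.py.
    """
    # Imported lazily because loading transformers (and torch) is slow.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=chunk_length_s,  # let the pipeline chunk long audio
    )
    result = asr(audio_path)  # decoding from a file path requires FFmpeg
    return result["text"]
```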

---

## Installation

### 1. Clone the Repository
```bash
git clone <repository_url>
cd <repository_folder>
```

### 2. Install Dependencies
Ensure you have Python 3.7 or higher installed. Then, run:
```bash
pip install -r requirements.txt
```

### 3. Install FFmpeg
This application uses `pydub`, which requires FFmpeg. Install it as follows:
- **Windows:** Download FFmpeg from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html) and add its `bin` directory to your PATH.
- **macOS:** Use Homebrew:
  ```bash
  brew install ffmpeg
  ```
- **Linux:** Install via your package manager, e.g.:
  ```bash
  sudo apt install ffmpeg
  ```
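
To confirm that FFmpeg is actually reachable after installation, a small check like the following can help. It is a minimal sketch using only the standard library; the function name `find_ffmpeg` is illustrative and not part of the app.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg not found on PATH; install it before running the app.")
    else:
        print(f"FFmpeg found at {path}")
```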

---

## Usage

### 1. Run the Application
Start the app by running:
```bash
python app.py
```

### 2. Open the Web Interface
Once the app starts, it prints a local URL (e.g., `http://127.0.0.1:7860/`). Open this URL in your web browser.

### 3. Upload an Audio File
- Click the upload button and select an audio file.
- Supported formats include WAV and MP3.

### 4. Get the Transcription
- Once the audio is processed, the transcription appears in the text box.
- You can copy the transcription for further use.
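
The upload-and-transcribe flow described above could be wired up with Gradio roughly as follows. This is a sketch under assumptions: `transcribe` is a placeholder standing in for the app's real transcription function, and `build_demo` is a hypothetical name; the actual layout in `app.py` may differ. The Gradio import is deferred so the snippet loads even where Gradio is not installed. Calling `launch()` starts the local server and prints the URL to open.

```python
def transcribe(audio_path):
    """Placeholder for the app's real transcription function."""
    if audio_path is None:
        return "No audio file was uploaded."
    return f"(transcription of {audio_path} would appear here)"


def build_demo():
    """Build a Gradio interface with an audio upload and a text output."""
    import gradio as gr  # deferred so the sketch loads without Gradio installed

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Audio(type="filepath", label="Audio file"),
        outputs=gr.Textbox(label="Transcription"),
        title="Audio Transcription App",
    )


if __name__ == "__main__":
    build_demo().launch()  # starts the local server and prints its URL
```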

---

## File Structure
- `app.py`: Main application script.
- `requirements.txt`: List of dependencies.
- `chunks/`: Temporary folder where audio chunks are stored during processing.
- `transcripcion.txt`: File where the full transcription is saved after processing.
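
To make the role of `transcripcion.txt` concrete, here is a minimal sketch of how per-chunk transcriptions might be joined and saved. The function `save_transcription` and its joining rule (single spaces, empty chunks dropped) are assumptions for illustration; the app may format its output differently.

```python
from pathlib import Path


def save_transcription(chunk_texts, output_path="transcripcion.txt"):
    """Join per-chunk transcriptions in order and write them to a text file."""
    # Drop empty chunks (e.g. silence) and trim surrounding whitespace.
    parts = [text.strip() for text in chunk_texts if text and text.strip()]
    full_text = " ".join(parts)
    Path(output_path).write_text(full_text + "\n", encoding="utf-8")
    return full_text
```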

---

## Customization

### Adjust Chunk Length
The default chunk length is 30 seconds (30,000 ms). To change it, edit the `chunk_length_ms` value in `app.py`:
```python
chunk_length_ms = 30000  # Chunk length in milliseconds
```
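
To see how `chunk_length_ms` affects splitting, the sketch below computes the start and end offsets of each chunk for a clip of a given duration. The helper `chunk_bounds` is hypothetical and not taken from `app.py`; with `pydub`, each pair could then be used to slice an `AudioSegment`, since its slices are indexed in milliseconds.

```python
def chunk_bounds(total_ms, chunk_length_ms=30000):
    """Return (start_ms, end_ms) pairs covering an audio clip of total_ms."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


# A 70-second clip with the default 30-second chunk length:
print(chunk_bounds(70_000))
# [(0, 30000), (30000, 60000), (60000, 70000)]
```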

---

## Limitations
- **Language:** The model is fine-tuned for Spanish. Audio in other languages is unlikely to be transcribed accurately.
- **Audio Quality:** Noisy or low-quality audio may result in less accurate transcriptions.
- **Performance:** Processing long files may take a while, depending on your hardware.

---

## Dependencies
- `transformers`: Loads and runs the ASR model.
- `torch`: Backend for model computations.
- `pydub`: Splits audio into chunks.
- `ffmpeg`: Required by `pydub` for audio processing. This is a system tool; see the Install FFmpeg step above.
- `gradio`: Creates the web interface.

Install the Python packages using:
```bash
pip install -r requirements.txt
```

---

## License
This project is licensed under the MIT License. See the LICENSE file for details.

---

## Acknowledgments
- Hugging Face for providing pre-trained models.
- Gradio for the simple interface framework.
- Pydub and FFmpeg for audio processing tools.