Blandskron committed `0123ce7` (verified) · 1 parent: `4971271`

Update README.md

Files changed (1): `README.md` (+110 -11)
Hunk `@@ -1,14 +1,113 @@` removes these lines (the Hugging Face Spaces front matter and its reference link):
- `title: Audio To Text`
- `emoji: 🌖`
- `colorFrom: blue`
- `colorTo: gray`
- `sdk: gradio`
- `sdk_version: 5.12.0`
- `app_file: app.py`
- `pinned: false`
- `license: mit`
- `short_description: Educativo`
- `Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference`

Added lines (the new `README.md`):
# Audio Transcription App

This application uses Hugging Face's `facebook/wav2vec2-large-xlsr-53-spanish` model to transcribe audio files. It provides a simple web interface where users can upload audio recordings, such as meeting recordings, and receive a full transcription.

---

## Features
- **Automatic Speech Recognition (ASR):** Transcribes Spanish speech with a pre-trained Hugging Face model.
- **Supports Long Audio Files:** Automatically splits long recordings into smaller chunks for processing.
- **Web Interface:** Provides a user-friendly interface built with Gradio.
- **Flexible Audio Upload:** Accepts common audio formats such as WAV and MP3.
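
The long-audio feature follows a "split, transcribe each chunk, join" flow. The sketch below illustrates that idea and is not the code in `app.py`. Both `split_into_chunks` and `transcribe_long_audio` are hypothetical names. The chunk transcriber is passed in as a function, so the sketch runs without loading the model. It also assumes the audio object behaves like a pydub `AudioSegment`, where `len()` gives the duration in milliseconds and slices are taken in milliseconds.

```python
from typing import Callable, List, Sequence

def split_into_chunks(audio: Sequence, chunk_length_ms: int = 30000) -> List:
    """Cut `audio` into consecutive slices of `chunk_length_ms` each.

    With a pydub AudioSegment, len() and slicing both use milliseconds,
    so each slice is chunk_length_ms long (the last one may be shorter).
    Any sliceable sequence works, which keeps this sketch testable.
    """
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return [
        audio[start:start + chunk_length_ms]
        for start in range(0, len(audio), chunk_length_ms)
    ]

def transcribe_long_audio(
    audio: Sequence,
    transcribe_chunk: Callable[[Sequence], str],
    chunk_length_ms: int = 30000,
) -> str:
    """Transcribe each chunk independently and join the non-empty results."""
    texts = [transcribe_chunk(chunk) for chunk in split_into_chunks(audio, chunk_length_ms)]
    return " ".join(text.strip() for text in texts if text.strip())

# Toy run: a 10-element "recording" split into chunks of 4 units,
# with a stand-in transcriber that simply joins the characters.
print(transcribe_long_audio(list("abcdefghij"), lambda chunk: "".join(chunk), 4))
# abcd efgh ij
```

In a real app, `transcribe_chunk` would wrap the ASR model, for example a `transformers` speech-recognition pipeline. Transcribing chunks independently keeps memory use per call bounded. The trade-off is that a word can be split at a chunk boundary.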

---

## Installation

### 1. Clone the Repository
```bash
git clone <repository_url>
cd <repository_folder>
```

### 2. Install Dependencies
Ensure you have Python 3.7 or higher installed. Recent dependency releases may need a newer interpreter; Gradio 5.x, for example, requires Python 3.10 or later. Then run:
```bash
pip install -r requirements.txt
```
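
You can confirm from Python that the running interpreter meets the minimum version. This is a small sketch, and `meets_min_python` is a hypothetical helper that is not part of this repository. Raise `minimum` if your pinned dependencies require a newer Python.

```python
import sys
from typing import Tuple

def meets_min_python(minimum: Tuple[int, int] = (3, 7)) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_min_python() else "too old"
    print(f"Python {current}: {status}")
```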

### 3. Install FFmpeg
This application uses `pydub`, which relies on FFmpeg to read and write audio formats other than WAV, such as MP3. Install FFmpeg as follows:
- **Windows:** Download FFmpeg from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html) and add its `bin` directory to your PATH.
- **macOS:** Use Homebrew:
  ```bash
  brew install ffmpeg
  ```
- **Linux:** Install it with your package manager, e.g., on Debian or Ubuntu:
  ```bash
  sudo apt install ffmpeg
  ```
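
After installing, you can check from Python that an `ffmpeg` executable is reachable on your PATH. The sketch below uses only the standard library. `find_ffmpeg` and `ffmpeg_version` are hypothetical helpers, not part of this project. A successful PATH lookup is only a quick sanity check and does not guarantee that `pydub` is configured correctly.

```python
import shutil
import subprocess
from typing import Optional

def find_ffmpeg() -> Optional[str]:
    """Return the full path of the `ffmpeg` executable found on PATH, or None."""
    return shutil.which("ffmpeg")

def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = find_ffmpeg()
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None

if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg was not found on PATH")
```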

---

## Usage

### 1. Run the Application
Start the app by running:
```bash
python app.py
```

### 2. Open the Web Interface
Once the app starts, it prints a local URL in the terminal (e.g., `http://127.0.0.1:7860/`). Open this URL in your web browser.

### 3. Upload an Audio File
- Click the upload button to select an audio file.
- Supported formats: WAV, MP3, etc.

### 4. Get the Transcription
- Once the audio is processed, the transcription appears in the text box.
- You can copy the transcription for further use.

---

## File Structure
- `app.py`: Main application script.
- `requirements.txt`: List of dependencies.
- `chunks/`: Temporary folder where audio chunks are stored during processing.
- `transcripcion.txt`: File where the full transcription is saved after processing.
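
The two processing artifacts above could be handled roughly as follows. This is a sketch under assumptions, not the actual `app.py`:
- The helper names and the `chunk_<i>.wav` naming scheme are hypothetical.
- Each chunk is assumed to offer a pydub-style `export(path, format=...)` method.
- Deleting `chunks/` afterwards is just one possible cleanup policy for a temporary folder.

```python
import shutil
from pathlib import Path
from typing import List, Sequence

def export_chunks(chunks: Sequence, folder: str = "chunks") -> List[Path]:
    """Write each chunk to <folder>/chunk_<i>.wav and return the file paths.

    Assumes each chunk has a pydub-style export(path, format=...) method.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunks):
        path = out_dir / f"chunk_{index}.wav"
        chunk.export(str(path), format="wav")
        paths.append(path)
    return paths

def save_transcription(text: str, path: str = "transcripcion.txt") -> Path:
    """Save the full transcription as UTF-8 text and return its path."""
    output = Path(path)
    output.write_text(text, encoding="utf-8")
    return output

def remove_chunks(folder: str = "chunks") -> None:
    """Delete the temporary chunk folder; a missing folder is not an error."""
    shutil.rmtree(folder, ignore_errors=True)
```

Writing the transcription with an explicit `encoding="utf-8"` keeps Spanish characters such as "ñ" and accented vowels intact, whatever the platform's default encoding.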

---

## Customization

### Adjust Chunk Length
The default chunk length is 30 seconds. To change it, edit the `chunk_length_ms` value in `app.py`:
```python
chunk_length_ms = 30000  # 30 seconds; change to the desired length in milliseconds
```
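
To see how a chunk length translates into work: the number of chunks is the recording length divided by `chunk_length_ms`, rounded up, and the last chunk holds the remainder. For example, a 95 s (95000 ms) recording gives ceil(95000 / 30000) = 4 chunks, three of 30 s and one of 5 s. The helpers below are hypothetical and only do this arithmetic.

```python
def count_chunks(duration_ms: int, chunk_length_ms: int = 30000) -> int:
    """Number of chunks needed to cover `duration_ms`; the last one may be shorter."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk_length_ms must be positive")
    return -(-duration_ms // chunk_length_ms)  # integer ceiling division

def last_chunk_ms(duration_ms: int, chunk_length_ms: int = 30000) -> int:
    """Length of the final chunk in milliseconds (0 for an empty recording)."""
    if duration_ms <= 0:
        return 0
    remainder = duration_ms % chunk_length_ms
    return remainder or chunk_length_ms

# A 95-second recording with the default 30-second chunks:
print(count_chunks(95_000), last_chunk_ms(95_000))  # 4 5000
# Doubling the chunk length to 60 seconds halves the number of model calls here:
print(count_chunks(95_000, 60_000), last_chunk_ms(95_000, 60_000))  # 2 35000
```

Longer chunks mean fewer model calls but more memory per call. Shorter chunks are lighter but create more boundaries where a word can be cut in half.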

---

## Limitations
- **Language:** The model is trained for Spanish speech; accuracy on other languages is likely to be much lower.
- **Audio Quality:** Noisy or low-quality recordings may produce less accurate transcriptions.
- **Performance:** Processing time grows with the length of the file and depends on your hardware.

---

## Dependencies
- `transformers`: Loads and runs the ASR model.
- `torch`: Backend for model computations.
- `pydub`: Audio splitting.
- `ffmpeg`: Required by `pydub` for audio decoding and encoding. It is a system program rather than a Python package, so install it separately as described in the Installation section.
- `gradio`: Builds the web interface.

Install the Python packages with:
```bash
pip install -r requirements.txt
```
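
After running `pip install -r requirements.txt`, you can confirm that the packages are importable without actually loading them (torch in particular is slow to import) by using `importlib.util.find_spec`. This is a sketch, and `check_dependencies` is a hypothetical helper. It assumes the import names match the package names above, which holds for these four.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the Python packages listed above; FFmpeg is a separate
# system program and is not importable.
REQUIRED_MODULES = ("transformers", "torch", "pydub", "gradio")

def check_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to whether it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    for name, installed in check_dependencies().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```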

---

## License
This project is licensed under the MIT License. See the LICENSE file for details.

---

## Acknowledgments
- Hugging Face for providing pre-trained models.
- Gradio for the simple interface framework.
- Pydub and FFmpeg for audio processing tools.