Commit bb5aebf, committed by Benjamin-png
1 Parent(s): fec23e8
Update README.md
README.md CHANGED
@@ -1,6 +1,3 @@
|
|
1 |
-
---
|
2 |
-
license: apache-2.0
|
3 |
-
---
|
4 |
|
5 |
|
6 |
# Swahili MMS TTS - Finetuned Model
|
@@ -22,7 +19,9 @@ You can check out the code and process used in the fine-tuning by visiting the [
|
|
22 |
|
23 |
## How to Use
|
24 |
|
25 |
-
You can load and use the model directly from the Hugging Face model hub
|
26 |
|
27 |
```python
|
28 |
from transformers import pipeline
|
@@ -34,6 +33,72 @@ tts = pipeline("text-to-speech", model="Benjamin-png/swahili-mms-tts-finetuned")
|
|
34 |
speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
|
35 |
```
|
36 |
|
37 |
## Example Notebook
|
38 |
|
39 |
If you're interested in reproducing the fine-tuning process or using the model for similar purposes, you can check out the Google Colab notebook that outlines the entire process:
|
@@ -48,5 +113,5 @@ For further exploration and code snippets, visit the [GitHub repository](https:/
|
|
48 |
|
49 |
## License
|
50 |
|
51 |
-
This project is licensed under the terms of the
|
52 |
|
1 |
|
2 |
|
3 |
# Swahili MMS TTS - Finetuned Model
|
|
|
19 |
|
20 |
## How to Use
|
21 |
|
22 |
+
You can load and use the model directly from the Hugging Face model hub using either the `pipeline` API or by manually downloading the model and tokenizer.
|
23 |
+
|
24 |
+
### 1. Using the `pipeline` API
|
25 |
|
26 |
```python
|
27 |
from transformers import pipeline
|
|
|
33 |
speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
|
34 |
```
|
35 |
|
36 |
+
### 2. Download and Run the Model Directly
|
37 |
+
|
38 |
+
You can also download the model and tokenizer manually and run the text-to-speech pipeline without the Hugging Face `pipeline` helper. Here's how:
|
39 |
+
|
40 |
+
```python
|
41 |
+
import torch
|
42 |
+
import numpy as np
|
43 |
+
import scipy.io.wavfile
|
44 |
+
from transformers import AutoTokenizer
|
45 |
+
from transformers import VitsModel  # VitsModel ships with transformers and is the architecture used by MMS-TTS checkpoints
|
46 |
+
|
47 |
+
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
48 |
+
model_name = "Benjamin-png/swahili-mms-tts-finetuned"
|
49 |
+
text = "Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili."
|
50 |
+
audio_file_path = "swahili_speech.wav"
|
51 |
+
|
52 |
+
# Load model and tokenizer dynamically based on the provided model name
|
53 |
+
model = VitsModel.from_pretrained(model_name).to(device)
|
54 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
55 |
+
|
56 |
+
# Step 1: Tokenize the input text
|
57 |
+
inputs = tokenizer(text, return_tensors="pt").to(device)
|
58 |
+
|
59 |
+
# Step 2: Generate waveform
|
60 |
+
with torch.no_grad():
|
61 |
+
output = model(**inputs).waveform
|
62 |
+
|
63 |
+
# Step 3: Convert PyTorch tensor to NumPy array
|
64 |
+
output_np = output.squeeze().cpu().numpy()
|
65 |
+
|
66 |
+
# Step 4: Write to WAV file
|
67 |
+
scipy.io.wavfile.write(audio_file_path, rate=model.config.sampling_rate, data=output_np)
|
68 |
+
```
|
69 |
+
|
70 |
+
### Saving and Playing the Audio
|
71 |
+
|
72 |
+
To save and play the audio generated above (the `output_np` array), you can use `soundfile` and `pydub`:
|
73 |
+
|
74 |
+
#### Saving the Audio
|
75 |
+
|
76 |
+
```python
|
77 |
+
import soundfile as sf
|
78 |
+
|
79 |
+
# Save the audio as a WAV file
|
80 |
+
sf.write("swahili_speech.wav", output_np, model.config.sampling_rate)
|
81 |
+
```
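
If neither `soundfile` nor `scipy` is available, the same save step can be approximated with Python's standard library alone. The following is a minimal sketch, not part of the original README: the helper name `write_wav_pcm16` is hypothetical, and it assumes the waveform is mono with float samples in the range [-1.0, 1.0], which it converts to 16-bit PCM.

```python
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples to a 16-bit PCM WAV file.

    Hypothetical helper: assumes `samples` is an iterable of floats
    nominally in [-1.0, 1.0]; values outside that range are clipped.
    Returns the number of samples written.
    """
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to the signed 16-bit range.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))

    return len(frames) // 2
```

With the variables from the direct-loading example, this could be called as `write_wav_pcm16("swahili_speech.wav", output_np.tolist(), model.config.sampling_rate)`. Converting to 16-bit PCM trades a little precision for a file format that most players accept.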
|
82 |
+
|
83 |
+
#### Playing the Audio
|
84 |
+
|
85 |
+
You can play the audio using `pydub`:
|
86 |
+
|
87 |
+
```python
|
88 |
+
from pydub import AudioSegment
|
89 |
+
from pydub.playback import play
|
90 |
+
|
91 |
+
# Load and play the generated audio
|
92 |
+
audio = AudioSegment.from_wav("swahili_speech.wav")
|
93 |
+
play(audio)
|
94 |
+
```
|
95 |
+
|
96 |
+
Make sure to install the required libraries:
|
97 |
+
|
98 |
+
```bash
|
99 |
+
pip install torch transformers numpy soundfile scipy pydub
|
100 |
+
```
|
101 |
+
|
102 |
## Example Notebook
|
103 |
|
104 |
If you're interested in reproducing the fine-tuning process or using the model for similar purposes, you can check out the Google Colab notebook that outlines the entire process:
|
|
|
113 |
|
114 |
## License
|
115 |
|
116 |
+
This project is licensed under the terms of the Apache License 2.0.
|
117 |
|