Benjamin-png committed on
Commit 15f0ff3 • 1 Parent(s): bb5aebf

Update README.md

README.md CHANGED
@@ -21,19 +21,8 @@ You can check out the code and process used in the fine-tuning by visiting the [
 
 You can load and use the model directly from the Hugging Face model hub using either the `pipeline` API or by manually downloading the model and tokenizer.
 
-### 1. Using the `pipeline` API
 
-```python
-from transformers import pipeline
-
-# Load the fine-tuned model
-tts = pipeline("text-to-speech", model="Benjamin-png/swahili-mms-tts-finetuned")
-
-# Generate speech from text
-speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
-```
-
-### 2. Download and Run the Model Directly
+### 1. Download and Run the Model Directly
 
 You can also download the model and tokenizer manually and run the text-to-speech pipeline without the Hugging Face `pipeline` helper. Here's how:
 
@@ -41,8 +30,8 @@ You can also download the model and tokenizer manually and run the text-to-speec
 import torch
 import numpy as np
 import scipy.io.wavfile
-from transformers import AutoTokenizer
-
+from transformers import VitsModel, AutoTokenizer
+
 
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 model_name = "Benjamin-png/swahili-mms-tts-finetuned"
@@ -67,6 +56,21 @@ output_np = output.squeeze().cpu().numpy()
 scipy.io.wavfile.write(audio_file_path, rate=model.config.sampling_rate, data=output_np)
 ```
 
+
+### 2. Using the `pipeline` API
+
+```python
+from transformers import pipeline
+
+# Load the fine-tuned model
+tts = pipeline("text-to-speech", model="Benjamin-png/swahili-mms-tts-finetuned")
+
+# Generate speech from text
+speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
+```
+
+
+
 ### Saving and Playing the Audio
 
 To save and play the audio, you can use the same methods mentioned above:
@@ -103,7 +107,7 @@ pip install torch transformers numpy soundfile scipy pydub
 
 If you're interested in reproducing the fine-tuning process or using the model for similar purposes, you can check out the Google Colab notebook that outlines the entire process:
 
-- [Google Colab Notebook](
+- [Google Colab Notebook](https://colab.research.google.com/drive/1dK1a814UqDnXnM5Rz6NBmk-vmhdN9M4f#scrollTo=iG6IrVva27uT)
 
 The notebook includes detailed steps on how to fine-tune the MMS model for Swahili TTS.
 
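Review note: the relocated `pipeline` snippet ends once `speech` is generated, while the direct-loading example goes on to write a WAV file with `scipy.io.wavfile.write`. A minimal sketch of doing the same for the pipeline result might look like the following. It assumes `speech` is a dict with an `"audio"` array and a `"sampling_rate"` integer, which is how the `transformers` text-to-speech pipeline normally returns its output. The helper name `save_speech` and the output path are hypothetical and not part of this commit.

```python
import numpy as np
import scipy.io.wavfile


def save_speech(speech: dict, path: str) -> None:
    """Write a text-to-speech pipeline result to a WAV file.

    Assumption: `speech` looks like {"audio": array, "sampling_rate": int}.
    `save_speech` is a hypothetical helper, not something the README defines.
    """
    # The audio may carry a leading batch axis, e.g. shape (1, N); drop it.
    audio = np.squeeze(np.asarray(speech["audio"], dtype=np.float32))
    # scipy writes float32 samples as a floating-point WAV file.
    scipy.io.wavfile.write(path, rate=int(speech["sampling_rate"]), data=audio)


# Usage with the README snippet (this step downloads the model):
# speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
# save_speech(speech, "output.wav")
```

Squeezing before writing keeps the file mono whether or not the pipeline includes a batch dimension. Some players handle floating-point WAV poorly, so converting to 16-bit integers is a possible alternative.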