Benjamin-png committed
Commit bb5aebf
1 Parent(s): fec23e8

Update README.md

Files changed (1): README.md (+70, -5)
README.md CHANGED

# Swahili MMS TTS - Finetuned Model

## How to Use

You can load and use the model directly from the Hugging Face model hub, either with the `pipeline` API or by downloading the model and tokenizer manually.

### 1. Using the `pipeline` API

```python
from transformers import pipeline

# Load the fine-tuned Swahili TTS model from the Hugging Face Hub
tts = pipeline("text-to-speech", model="Benjamin-png/swahili-mms-tts-finetuned")

# Synthesize speech from Swahili text
speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
```
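
The pipeline returns the generated waveform together with its sampling rate. To save that output to disk, here is a minimal sketch, assuming the `audio` and `sampling_rate` keys returned by the transformers text-to-speech pipeline:

```python
import numpy as np
import scipy.io.wavfile

# `speech` is assumed to be the dict returned by the pipeline above,
# holding the generated waveform and its sampling rate
scipy.io.wavfile.write(
    "swahili_speech.wav",
    rate=speech["sampling_rate"],
    data=np.squeeze(speech["audio"]),  # drop any extra channel dimension
)
```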

### 2. Download and Run the Model Directly

You can also download the model and tokenizer manually and run text-to-speech without the Hugging Face `pipeline` helper. Here's how:

```python
import torch
import scipy.io.wavfile
from transformers import AutoTokenizer, VitsModel  # VitsModel backs the MMS TTS checkpoints in transformers

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "Benjamin-png/swahili-mms-tts-finetuned"
text = "Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili."
audio_file_path = "swahili_speech.wav"

# Load the model and tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Step 1: Tokenize the input text
inputs = tokenizer(text, return_tensors="pt").to(device)

# Step 2: Generate the waveform
with torch.no_grad():
    output = model(**inputs).waveform

# Step 3: Convert the PyTorch tensor to a NumPy array
output_np = output.squeeze().cpu().numpy()

# Step 4: Write the waveform to a WAV file
scipy.io.wavfile.write(audio_file_path, rate=model.config.sampling_rate, data=output_np)
```
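
If you're working in a notebook (for example, the Colab linked below), you can audition the waveform inline before saving anything; a quick sketch using IPython's display helper:

```python
from IPython.display import Audio

# Render an inline audio player for the generated waveform
Audio(output_np, rate=model.config.sampling_rate)
```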

### Saving and Playing the Audio

Beyond `scipy`, you can also save the audio with `soundfile` and play it back with `pydub`:

#### Saving the Audio

```python
import soundfile as sf

# Save the audio as a WAV file
sf.write("swahili_speech.wav", output_np, model.config.sampling_rate)
```

#### Playing the Audio

You can play the audio using `pydub`:

```python
from pydub import AudioSegment
from pydub.playback import play

# Load and play the generated audio
audio = AudioSegment.from_wav("swahili_speech.wav")
play(audio)
```
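
Note that `pydub`'s `play` helper delegates to whichever audio backend it finds (such as `simpleaudio`, `pyaudio`, or ffmpeg's `ffplay`). If playback fails, one option is to call `simpleaudio` directly; a minimal sketch, assuming it is installed with `pip install simpleaudio`:

```python
import simpleaudio as sa

# Play the WAV file and block until playback finishes
wave_obj = sa.WaveObject.from_wave_file("swahili_speech.wav")
play_obj = wave_obj.play()
play_obj.wait_done()
```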

Make sure to install the required libraries:

```bash
pip install torch transformers numpy soundfile scipy pydub
```

## Example Notebook

If you're interested in reproducing the fine-tuning process or using the model for similar purposes, you can check out the Google Colab notebook that outlines the entire process:

## License

This project is licensed under the terms of the Apache License 2.0.