Upload README.md
README.md
CHANGED
@@ -25,10 +25,16 @@ pipeline_tag: text-to-speech
 
 ### Releases
 
-| Model | Published | Training Data |
-| ----- | --------- | ------------- |
-| … |
-| … |
+| Model | Published | Training Data | Langs & Voices | SHA256 |
+| ----- | --------- | ------------- | -------------- | ------ |
+| [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |
+| **v1.0** | **2025 Jan 27** | **Few hundred hrs** | [**8 & 54**](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
+
+| Training Costs | v0.19 | v1.0 | **Total** |
+| -------------- | ----- | ---- | ----- |
+| in A100 80GB GPU hours | 500 | 500 | **1000** |
+| average hourly rate | $0.80/h | $1.20/h | **$1/h** |
+| in USD | $400 | $600 | **$1000** |
 
 ### Usage
 
@@ -105,8 +111,6 @@ Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2
 
 ### Training Details
 
-**Compute:** About $1000 for 1000 hours of A100 80GB vRAM
-
 **Data:** Kokoro was trained exclusively on **permissive/non-copyrighted audio data** and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:
 - Public domain audio
 - Audio licensed under Apache, MIT, etc
@@ -116,6 +120,8 @@ Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2
 
 **Total Dataset Size:** A few hundred hours of audio
 
+**Total Training Cost:** About $1000 for 1000 hours of A100 80GB vRAM
+
 ### Creative Commons Attribution
 
 The following CC BY audio was part of the dataset used to train Kokoro v1.0.
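The Training Costs table added in this commit is plain arithmetic: GPU hours times hourly rate per release, then summed. As a quick sanity check, a minimal Python sketch follows. The numbers are copied from the table; the `runs` dictionary and the `summarize` helper are illustrative names, not part of the README or of any Kokoro tooling.

```python
# Figures copied from the "Training Costs" table added in this commit.
# The data layout and function name are illustrative assumptions.
runs = {
    "v0.19": {"gpu_hours": 500, "usd_per_hour": 0.80},
    "v1.0": {"gpu_hours": 500, "usd_per_hour": 1.20},
}


def summarize(runs):
    """Return (total GPU hours, total USD, blended USD per GPU hour)."""
    total_hours = sum(r["gpu_hours"] for r in runs.values())
    total_usd = sum(r["gpu_hours"] * r["usd_per_hour"] for r in runs.values())
    # Blended rate is weighted by hours, not a plain mean of the two rates.
    return total_hours, total_usd, total_usd / total_hours


total_hours, total_usd, blended_rate = summarize(runs)
print(f"{total_hours} h, ${total_usd:.0f}, ${blended_rate:.2f}/h")
```

The results agree with the table's **Total** column. Because both releases used the same number of GPU hours, the hour-weighted rate here coincides with the simple mean of $0.80/h and $1.20/h; with unequal hours the two would differ.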