Upload 2 files
README.md
CHANGED
@@ -24,7 +24,7 @@ pipeline_tag: text-to-speech
 
 | Model | Published | Training Data | Compute (A100 80GB) | Langs & Voices | SHA256 |
 | ----- | --------- | ------------- | ------------------- | -------------- | ------ |
-| **v1.0** | **2025 Jan 27** | **Few hundred hrs** | **$1000 for 1000 hrs** | [**6 &
+| **v1.0** | **2025 Jan 27** | **Few hundred hrs** | **$1000 for 1000 hrs** | [**6 & 47**](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
 | [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | $400 for 500 hrs | 1 & 10 | `3b0c392f` |
 
 ### Usage
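The SHA256 column in the table above lists eight hex digits per model. Assuming those are the leading digits of the file's SHA-256 digest (the diff does not state this), a downloaded checkpoint could be checked with a sketch like the following. The helper names `sha256_prefix` and `matches_listed_hash` are hypothetical and not part of any Kokoro tooling.

```python
import hashlib
from pathlib import Path


def sha256_prefix(path, length=8, chunk_size=1 << 20):
    """Return the first `length` hex digits of the file's SHA-256 digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large model files are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def matches_listed_hash(path, listed):
    """Compare a local file against a short hash copied from the table.

    Assumption: the table value is a prefix of the full SHA-256 hex digest.
    """
    return sha256_prefix(path, len(listed)) == listed.lower()
```

A call such as `matches_listed_hash(local_path, "496dba11")` would then tell whether a local copy agrees with the v1.0 row, where `local_path` is wherever the file was saved. A short prefix is a convenience check only; it is far weaker than comparing the full digest.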
VOICES.md
CHANGED
@@ -10,14 +10,15 @@ Subjectively, voices will sound better or worse to different people.
 
 **Training Duration**
 - How much audio was seen during training? Smaller durations result in a lower overall grade.
-- 10 hours <= HH hours < 100 hours
+- 10 hours <= **HH hours** < 100 hours
 - 1 hour <= H hours < 10 hours
 - 10 minutes <= MM minutes < 100 minutes
-- 1 minute <=
+- 1 minute <= _M minutes_ < 10 minutes 🤏
 
 ### American English 🇺🇸
 
-- [`misaki[en]`](https://github.com/hexgrad/misaki)
+- `lang_code='a'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+- espeak-ng `en-us` fallback
 
 | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
 | ---- | ------ | -------------- | ----------------- | ------------- | ------ |
@@ -30,7 +31,7 @@ Subjectively, voices will sound better or worse to different people.
 | af_nova | 🚺 | B | MM minutes | C | `e0233676` |
 | af_river | 🚺 | C | MM minutes | D | `e149459b` |
 | af_sarah | 🚺 | B | H hours | C+ | `49bd364e` |
-| af_sky |
+| af_sky | 🚺🤏 | B | _M minutes_ | C- | `c799548a` |
 | am_adam | 🚹 | D | H hours | F+ | `ced7e284` |
 | am_echo | 🚹 | C | MM minutes | D | `8bcfdc85` |
 | am_eric | 🚹 | C | MM minutes | D | `ada66f0e` |
@@ -39,10 +40,12 @@ Subjectively, voices will sound better or worse to different people.
 | am_michael | 🚹 | B | H hours | C+ | `9a443b79` |
 | am_onyx | 🚹 | C | MM minutes | D | `e8452be1` |
 | am_puck | 🚹 | B | H hours | C+ | `dd1d8973` |
+| am_santa | 🚹🤏 | C | _M minutes_ | D- | `7f2f7582` |
 
 ### British English 🇬🇧
 
-- [`misaki[en]`](https://github.com/hexgrad/misaki)
+- `lang_code='b'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+- espeak-ng `en-gb` fallback
 
 | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
 | ---- | ------ | -------------- | ----------------- | ------------- | ------ |
@@ -57,6 +60,7 @@ Subjectively, voices will sound better or worse to different people.
 
 ### French 🇫🇷
 
+- `lang_code='f'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
 - espeak-ng `fr-fr`
 - Total French training data: <11 hours
 
@@ -66,6 +70,7 @@ Subjectively, voices will sound better or worse to different people.
 
 ### Hindi 🇮🇳
 
+- `lang_code='h'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
 - espeak-ng `hi`
 - Total Hindi training data: H hours
 
@@ -78,6 +83,7 @@ Subjectively, voices will sound better or worse to different people.
 
 ### Italian 🇮🇳
 
+- `lang_code='i'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
 - espeak-ng `it`
 - Total Italian training data: H hours
 
@@ -88,20 +94,20 @@ Subjectively, voices will sound better or worse to different people.
 
 ### Japanese 🇯🇵
 
-- [`misaki[ja]`](https://github.com/hexgrad/misaki)
+- `lang_code='j'` in [`misaki[ja]`](https://github.com/hexgrad/misaki)
 - Total Japanese training data: H hours
 
 | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 | CC BY |
 | ---- | ------ | -------------- | ----------------- | ------------- | ------ | ----- |
 | jf_alpha | 🚺 | B | H hours | C+ | `1bf4c9dc` | |
 | jf_gongitsune | 🚺 | B | MM minutes | C | `1b171917` | [gongitsune](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__gongitsune.txt) |
-| jf_nezumi |
+| jf_nezumi | 🚺🤏 | B | _M minutes_ | C- | `d83f007a` | [nezuminoyomeiri](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__nezuminoyomeiri.txt) |
 | jf_tebukuro | 🚺 | B | MM minutes | C | `0d691790` | [tebukurowokaini](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__tebukurowokaini.txt) |
-| jm_kumo |
+| jm_kumo | 🚹🤏 | B | _M minutes_ | C- | `98340afd` | [kumonoito](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__kumonoito.txt) |
 
 ### Mandarin Chinese 🇨🇳
 
-- [`misaki[zh]`](https://github.com/hexgrad/misaki)
+- `lang_code='z'` in [`misaki[zh]`](https://github.com/hexgrad/misaki)
 - Total Mandarin Chinese training data: H hours
 
 | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
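The Training Duration legend in VOICES.md maps an amount of audio to one of four labels. A minimal sketch of that mapping follows, with `duration_label` as a hypothetical helper name. The legend's ranges overlap between 60 and 100 minutes (both "MM minutes" and "H hours" apply there), so this sketch assumes the hour bucket takes precedence; the source does not say which one the author uses.

```python
def duration_label(minutes):
    """Map a training duration in minutes to a label from the legend.

    Assumption: durations of 60 minutes or more are reported in hours,
    which resolves the 60 to 100 minute overlap in the legend.
    """
    if minutes < 1 or minutes >= 100 * 60:
        raise ValueError("duration is outside the ranges given in the legend")
    if minutes >= 10 * 60:
        return "HH hours"    # 10 hours <= duration < 100 hours
    if minutes >= 60:
        return "H hours"     # 1 hour <= duration < 10 hours
    if minutes >= 10:
        return "MM minutes"  # 10 minutes <= duration < 60 minutes (by assumption)
    return "M minutes"       # 1 minute <= duration < 10 minutes


if __name__ == "__main__":
    for sample in (5, 30, 90, 600):
        print(sample, "->", duration_label(sample))
```

Under that assumption, the four voices this commit marks with 🤏 (af_sky, am_santa, jf_nezumi, jm_kumo) all fall in the smallest bucket, under 10 minutes of audio.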