BramVanroy
committed on
Update README.md
README.md
CHANGED
@@ -44,13 +44,14 @@ Training data consists of older datasets that were translated to Dutch with Open
 
 The training set (`train_sft`) consists of 240,527,565 tokens (calculated prior to applying a chat template). The test sets (`test_sft` in the datasets) account for 26,397,086 tokens, which is around 10.97\% of the training set.
 
-Here is a break down of the training set
+Here is a break down of the training set (some data pages might not be available yet *but they definitely will be in the near future*).
+
+- [BramVanroy/ultrachat_200k_dutch](https://huggingface.co/datasets/BramVanroy/ultrachat_200k_dutch) (gpt-4-turbo; multi-turn; generated): 85.42%
+- [BramVanroy/no_robots_dutch](https://huggingface.co/datasets/BramVanroy/no_robots_dutch) (gpt-4-turbo; prompt translate, answer generated): 2.20%
+- [BramVanroy/stackoverflow-chat-dutch](https://huggingface.co/datasets/BramVanroy/stackoverflow-chat-dutch) (gpt-3.5-turbo; multi-turn; code; translated): 8.38%
+- [BramVanroy/alpaca-cleaned-dutch](https://huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) (gpt-3.5-turbo; translated): 2.62%
+- [BramVanroy/dolly-15k-dutch](https://huggingface.co/datasets/BramVanroy/dolly-15k-dutch) (gpt-3.5-turbo; translated): 1.39%
 
-BramVanroy/ultrachat_200k_dutch (gpt-4-turbo): 85.42%
-BramVanroy/stackoverflow-chat-dutch (code; gpt-3.5-turbo): 8.38%
-BramVanroy/alpaca-cleaned-dutch (gpt-3.5-turbo): 2.62%
-BramVanroy/dolly-15k-dutch (gpt-3.5-turbo): 1.39%
-BramVanroy/no_robots_dutch (gpt-4-turbo): 2.20%
 
 
 ## Training procedure