BramVanroy committed · Commit 6f8ff6c · verified · 1 Parent(s): 44137f8

Update README.md

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
@@ -44,13 +44,14 @@ Training data consists of older datasets that were translated to Dutch with Open
 
  The training set (`train_sft`) consists of 240,527,565 tokens (calculated prior to applying a chat template). The test sets (`test_sft` in the datasets) account for 26,397,086 tokens, which is around 10.97\% of the training set.
 
- Here is a break down of the training set:
+ Here is a break down of the training set (some data pages might not be available yet *but they definitely will be in the near future*).
+
+ - [BramVanroy/ultrachat_200k_dutch](https://huggingface.co/datasets/BramVanroy/ultrachat_200k_dutch) (gpt-4-turbo; multi-turn; generated): 85.42%
+ - [BramVanroy/no_robots_dutch](https://huggingface.co/datasets/BramVanroy/no_robots_dutch) (gpt-4-turbo; prompt translate, answer generated): 2.20%
+ - [BramVanroy/stackoverflow-chat-dutch](https://huggingface.co/datasets/BramVanroy/stackoverflow-chat-dutch) (gpt-3.5-turbo; multi-turn; code; translated): 8.38%
+ - [BramVanroy/alpaca-cleaned-dutch](https://huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) (gpt-3.5-turbo; translated): 2.62%
+ - [BramVanroy/dolly-15k-dutch](https://huggingface.co/datasets/BramVanroy/dolly-15k-dutch) (gpt-3.5-turbo; translated): 1.39%
 
- BramVanroy/ultrachat_200k_dutch (gpt-4-turbo): 85.42%
- BramVanroy/stackoverflow-chat-dutch (code; gpt-3.5-turbo): 8.38%
- BramVanroy/alpaca-cleaned-dutch (gpt-3.5-turbo): 2.62%
- BramVanroy/dolly-15k-dutch (gpt-3.5-turbo): 1.39%
- BramVanroy/no_robots_dutch (gpt-4-turbo): 2.20%
 
 
  ## Training procedure
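
As a sanity check on the figures in the README text shown in this diff, the following minimal sketch recomputes the test-set share from the two quoted token counts and sums the per-dataset percentages. The token counts and percentages are copied from the README; the names `share_percent`, `TRAIN_TOKENS`, `TEST_TOKENS`, and `DATASET_SHARES` are illustrative and do not come from the repository.

```python
# Token counts quoted in the README text (copied verbatim from the diff).
TRAIN_TOKENS = 240_527_565
TEST_TOKENS = 26_397_086

# Per-dataset shares of the training set, in percent, as listed in the README.
DATASET_SHARES = {
    "BramVanroy/ultrachat_200k_dutch": 85.42,
    "BramVanroy/no_robots_dutch": 2.20,
    "BramVanroy/stackoverflow-chat-dutch": 8.38,
    "BramVanroy/alpaca-cleaned-dutch": 2.62,
    "BramVanroy/dolly-15k-dutch": 1.39,
}


def share_percent(part: int, whole: int) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


# Test tokens relative to the training tokens (the ratio the README reports).
test_vs_train = share_percent(TEST_TOKENS, TRAIN_TOKENS)

# Sum of the listed shares; small deviations from 100 come from rounding.
total_share = sum(DATASET_SHARES.values())

print(f"test tokens relative to train tokens: {test_vs_train:.2f}%")
print(f"sum of listed dataset shares: {total_share:.2f}%")
```

Under these inputs the ratio rounds to 10.97%, matching the README, and the listed shares sum to 100.01%, which is consistent with each percentage having been rounded to two decimals independently. Note that the 10.97% figure is relative to the training set alone, not to the combined train and test token count.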