Update README.md
README.md
CHANGED
@@ -59,13 +59,14 @@ set a seed for reproducibility:
 
 ## Dataset
 
-The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank Disco Research and Björn Plüster for making their dataset available to us.
+The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank [Disco Research](https://huggingface.co/DiscoResearch), [Jan Philipp Harries](https://huggingface.co/jphme), and [Björn Plüster](https://huggingface.co/bjoernp) for making their dataset available to us.
+
 
 **English and Code**
 - [Open-Hermes-2B](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
 
 **German**
-- [DiscoLM German Dataset](https://huggingface.co/DiscoResearch)
+- [DiscoLM German Dataset](https://huggingface.co/DiscoResearch) includes the publicly available [germanrag](https://huggingface.co/datasets/DiscoResearch/germanrag) dataset
 - [OASST-2](https://huggingface.co/datasets/OpenAssistant/oasst2) (German subset)
 - [Aya-Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) (German subset)
 