Update README.md
README.md
CHANGED
@@ -189,8 +189,6 @@ Using this template, each turn is preceded by a `<|im_start|>` delimiter and the
 
 ## Data
 
-## Data
-
 ### Pretraining Data
 
 The training corpus consists of 2.4 trillion tokens, including 35 European languages and 92 programming languages. It amounts to a total of 33TB of pre-processed text.
@@ -591,8 +589,6 @@ The dataset does not allow for external contributions.
 
 </details>
 
----
-
 ### Finetuning Data
 
 This instruction-tuned variant has been trained with a mixture of 276k English, Spanish, and Catalan multi-turn instructions gathered from open datasets: