Commit 6188e34 by chrispreemo
Parent: cb97e31
Update README.md

README.md CHANGED
@@ -62,13 +62,13 @@ Supervised fine-tuning (SFT) and direct preference optimization (DPO)[3] further
 | Category | # Tokens (1Ms) | % of Total |
 | --- | --- | --- |
 | Chat (e.g. [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)) | 640 | 45.2 |
-| Alignment (e.g. [orca_dpo](https://huggingface.co/datasets/Intel/orca_dpo_pairs)) | 331 | 23.4 |
-| Math (e.g. Goat[4]) | 300 | 21.2 |
+| Alignment * (e.g. [orca_dpo](https://huggingface.co/datasets/Intel/orca_dpo_pairs)) | 331 | 23.4 |
+| Math * (e.g. Goat[4]) | 300 | 21.2 |
 | Tabular * | 68 | 4.8 |
 | Summarization (e.g. [legal_summarization](https://huggingface.co/datasets/lighteval/legal_summarization)) | 52 | 3.7 |
 | Open-book (e.g. [selfrag](https://huggingface.co/datasets/selfrag/selfrag_train_data)) | 25 | 1.8 |
 
-(*) = Proprietary
+(*) = Proprietary or includes proprietary data sets
 
 [3] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. NeurIPS.
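As a quick sanity check on the data-mixture table in this diff (not part of the commit itself), the "% of Total" column can be reproduced from the token counts; the category names below are taken from the table:

```python
# Token counts (in millions) from the SFT/DPO data-mixture table.
tokens = {
    "Chat": 640,
    "Alignment": 331,
    "Math": 300,
    "Tabular": 68,
    "Summarization": 52,
    "Open-book": 25,
}

# Total tokens across all categories, and each category's share of the total,
# rounded to one decimal place as in the table.
total = sum(tokens.values())
shares = {name: round(100 * count / total, 1) for name, count in tokens.items()}
print(total, shares)
```

Running this gives a total of 1416M tokens, and the computed shares match the table's percentages (45.2, 23.4, 21.2, 4.8, 3.7, 1.8).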