Restore model card
README.md
````diff
@@ -6,30 +6,31 @@ tags:
 - generated_from_trainer
 - trl
 - sft
-
+license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
 ---
 
-#
+# notHumpback-M0
 
-This model
-
-## Quick start
-
-```python
-from transformers import pipeline
-
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="Alepach/notHumpback-M0", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-
+This model follows the Humpback architecture, proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+by Li et al.
+
+It represents the "seed model", which is trained on a small amount of gold data and then
+used to score the instruction-response pairs
+generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx).
+
+Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+creating a richer dataset for fine-tuning models without the need for additional manual annotation.
+The model then iteratively curates the created dataset, scoring the pairs by quality, and is then fine-tuned on the resulting subset
+of all pairs with the highest possible score (self-curation).
+
+Varying from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+
+The dataset used to train this model has been sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
+To enable the model to judge and score the generated pairs, the model undergoes basic instruction-tuning on the input-output
+pairs contained in the dataset.
 
 ### Framework versions
 
@@ -41,7 +42,18 @@ This model was trained with SFT.
 
 ## Citations
 
-
+Original paper:
+
+```bibtex
+@misc{li2023selfalignment,
+  title={Self-Alignment with Instruction Backtranslation},
+  author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+  year={2023},
+  eprint={2308.06259},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
+
 Cite TRL as:
````
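The self-curation step the card describes (scoring generated instruction-response pairs with the seed model and fine-tuning only on the top-scored subset) can be sketched as below. This is a minimal illustration, not the repository's actual code: `score_pair` stands in for prompting the seed model (e.g. notHumpback-M0) for a 1-5 quality rating, and the toy length-based heuristic inside it exists only so the sketch runs.

```python
# Hypothetical sketch of Humpback-style self-curation: score each generated
# instruction-response pair, keep only pairs with the highest quality score.

def score_pair(instruction: str, response: str) -> int:
    """Stand-in scorer: a real implementation would prompt the seed model
    to rate the pair on a 1-5 scale. Toy heuristic: longer, more detailed
    responses score higher (assumption for illustration only)."""
    return min(5, 1 + len(response) // 20)

def self_curate(pairs: list[tuple[str, str]], threshold: int = 5) -> list[tuple[str, str]]:
    """Keep only pairs whose quality score reaches the threshold,
    yielding the curated subset used for the next fine-tuning round."""
    return [(i, r) for i, r in pairs if score_pair(i, r) >= threshold]

pairs = [
    ("What is SFT?", "Supervised fine-tuning trains a model on prompt-response pairs with a standard language-modeling loss."),
    ("Name a color.", "Blue."),
]
curated = self_curate(pairs)
print(len(curated))  # only the detailed answer survives the score-5 filter
```

In the paper this filter-then-finetune loop is run iteratively, with each curated subset producing a stronger scoring model for the next round.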