Alepach committed
Commit aed3b3b · verified · 1 Parent(s): 33732f8

Restore model card

Files changed (1):
  1. README.md +29 -17
README.md CHANGED
@@ -6,30 +6,31 @@ tags:
  - generated_from_trainer
  - trl
  - sft
- licence: license
+ license: apache-2.0
+ datasets:
+ - OpenAssistant/oasst1
  ---

- # Model Card for notHumpback-M0
+ # notHumpback-M0

- This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
- It has been trained using [TRL](https://github.com/huggingface/trl).
+ This model follows the Humpback architecture proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+ by Li et al.

- ## Quick start
+ It represents the "seed model", which is trained on a small amount of gold data and then
+ used to score the instruction-response pairs
+ generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx).

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="Alepach/notHumpback-M0", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```
+ Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+ creating a richer dataset for fine-tuning models without the need for additional manual annotation.
+ The model then iteratively curates the generated dataset, scoring the pairs by quality, and is fine-tuned on the resulting subset
+ of all pairs with the highest possible score (self-curation).

- ## Training procedure
+ Departing from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

- This model was trained with SFT.
+ The dataset used to train this model has been sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
+ To enable the model to judge and score the generated pairs, it undergoes basic instruction-tuning on the input-output
+ pairs contained in the dataset.

  ### Framework versions

@@ -41,7 +42,18 @@ This model was trained with SFT.

  ## Citations

+ Original paper:
+
+ ```bibtex
+ @misc{li2023selfalignment,
+       title={Self-Alignment with Instruction Backtranslation},
+       author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+       year={2023},
+       eprint={2308.06259},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL}
+ }
+ ```

  Cite TRL as:
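
The self-curation step the restored card describes (score generated instruction-response pairs with the seed model, keep only the top-rated subset for fine-tuning) can be sketched as follows. This is a minimal, hypothetical illustration: `curate`, `toy_score`, and the 1-5 scale stand in for the real model-based scorer, and none of these names come from the repository.

```python
# Hypothetical sketch of Humpback-style self-curation: a scoring function
# rates each generated instruction-response pair, and only pairs with the
# highest possible score are kept for the next fine-tuning round.

def curate(pairs, score_fn, best_score=5):
    """Keep only the pairs the scoring model rates at the top of the scale."""
    return [pair for pair in pairs if score_fn(pair) >= best_score]

# Stand-in scorer: a real implementation would prompt the seed model
# (e.g. via a text-generation pipeline) to rate each pair from 1 to 5.
def toy_score(pair):
    return 5 if "helpful" in pair["response"] else 2

pairs = [
    {"instruction": "Explain gravity.", "response": "A helpful explanation of gravity."},
    {"instruction": "Summarize.", "response": "noise"},
]

kept = curate(pairs, toy_score)
print(len(kept))  # only the high-scoring pair survives the curation filter
```

In the paper this filter is applied iteratively: each curated subset trains a better scorer, which curates a better subset.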