Alepach committed
Commit aed3b3b · verified · 1 Parent(s): 33732f8

Restore model card

Files changed (1):
  1. README.md +29 -17
README.md CHANGED
@@ -6,30 +6,31 @@ tags:
  - generated_from_trainer
  - trl
  - sft
- licence: license
+ license: apache-2.0
+ datasets:
+ - OpenAssistant/oasst1
  ---

- # Model Card for notHumpback-M0
+ # notHumpback-M0

- This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
- It has been trained using [TRL](https://github.com/huggingface/trl).
+ This model follows the Humpback architecture proposed in the paper [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259)
+ by Li et al.

- ## Quick start
+ It represents the "seed model", which is trained on a small amount of gold data and then
+ used to score the instruction-response pairs
+ generated by the ["backward model"](https://huggingface.co/Alepach/notHumpback-Myx).

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="Alepach/notHumpback-M0", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```
+ Humpback uses instruction backtranslation on a web corpus to generate input-output pairs (self-augmentation),
+ creating a richer dataset for fine-tuning models without the need for additional manual annotation.
+ The model then iteratively curates the generated dataset, scoring the pairs by quality, and is fine-tuned on the resulting subset
+ of all pairs with the highest possible score (self-curation).

- ## Training procedure
+ Departing from the original paper, this model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

- This model was trained with SFT.
+ The dataset used to train this model has been sampled from the [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
+ To enable the model to judge and score the generated pairs, it undergoes basic instruction-tuning on the input-output
+ pairs contained in the dataset.

  ### Framework versions

@@ -41,7 +42,18 @@ This model was trained with SFT.

  ## Citations

+ Original paper:
+
+ ```bibtex
+ @misc{li2023selfalignment,
+       title={Self-Alignment with Instruction Backtranslation},
+       author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
+       year={2023},
+       eprint={2308.06259},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL}
+ }
+ ```

  Cite TRL as:
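
The self-curation step the restored card describes (score generated instruction-response pairs with the seed model, keep only the top-rated subset for fine-tuning) can be sketched as follows. This is a minimal, hypothetical illustration: `curate`, `toy_score`, and the 1-5 scale stand in for the real model-based scorer, and none of these names come from the repository.

```python
# Hypothetical sketch of Humpback-style self-curation: a scoring function
# rates each generated instruction-response pair, and only pairs with the
# highest possible score are kept for the next fine-tuning round.

def curate(pairs, score_fn, best_score=5):
    """Keep only the pairs the scoring model rates at the top of the scale."""
    return [pair for pair in pairs if score_fn(pair) >= best_score]

# Stand-in scorer: a real implementation would prompt the seed model
# (e.g. via a text-generation pipeline) to rate each pair from 1 to 5.
def toy_score(pair):
    return 5 if "helpful" in pair["response"] else 2

pairs = [
    {"instruction": "Explain gravity.", "response": "A helpful explanation of gravity."},
    {"instruction": "Summarize.", "response": "noise"},
]

kept = curate(pairs, toy_score)
print(len(kept))  # only the high-scoring pair survives the curation filter
```

In the paper this filter is applied iteratively: each curated subset trains a better scorer, which curates a better subset.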