trl-lib/llama-7b-se-rl-peft

natolambert committed on Apr 6, 2023

Commit e7487f2 · 1 Parent(s): 51658cd

Update README.md

Files changed (1)

README.md +19 -6

@@ -14,12 +14,15 @@ datasets:
 ![pull_figure](https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/stack-llama.png)
 # Llama-se-rl-peft
-Adapter weights of an RL fine-tuned model based on LLaMa. Authored by Edward Beeching, Younes Belkada, Kashif Rasul, Lewis Tunstall and Leandro von Werra.
+Adapter weights of an RL fine-tuned model based on LLaMA (see Meta's LLaMA release for the original LLaMA model).
 For more info check out the [blog post](https://huggingface.co/blog/stackllama) and [github example](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts).
 ## Model Description
-**Llama-se-rl** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange Reward Model. This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more. The model is designed to generate human-like responses to questions in these domains. The model has been training to respond to prompts with the following template:
+**Llama-se-rl** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange Reward Model.
+This dataset consists of questions and answers from various domains in Stack Exchange, such as programming, mathematics, physics, and more.
+The model is designed to generate human-like responses to questions in these domains.
+The model has been training to respond to prompts with the following template:
 ```
 Question: <Query>
@@ -45,9 +48,19 @@ Additionally, the model may generate answers that are incorrect or misleading du
 ## BibTeX entry and citation info
 ```bibtex
-@misc{beeching2023llama,
-  title={StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering},
-  author={Beeching, Edward and Belkada, Younes and Rasul, Kashif and Tunstall, Lewis and von Werra, Leandro},
-  year={2023}
+@misc {beeching2023stackllama,
+	author       = { Edward Beeching and
+                     Younes Belkada and
+                     Kashif Rasul and
+                     Lewis Tunstall and
+                     Leandro von Werra and
+                     Nazneen Rajani and
+                     Nathan Lambert
+                   },
+	title        = { StackLLaMa: An RL Fine-tuned LLaMa Model for Stack Exchange Question and Answering },
+	year         = 2023,
+	url          = { https://huggingface.co/trl-lib/llama-7b-se-rl-peft },
+	doi          = { 10.57967/hf/0513 },
+	publisher    = { Hugging Face }
 }
 ```