natolambert committed • 90ee929 • Parent(s): f3f13a5
Update README.md

README.md CHANGED
@@ -16,7 +16,7 @@ datasets:
 # Llama-se-rl-peft
 Adapter weights of an RL fine-tuned model based on LLaMA (see Meta's LLaMA release for the original LLaMA model).
 For more info check out the [blog post](https://huggingface.co/blog/stackllama) and [github example](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts).
-
+The reward model used to train this model can be found [here](https://huggingface.co/trl-lib/llama-7b-se-rm-peft).

 ## Model Description
 **Llama-se-rl** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange Reward Model.