natolambert committed • 90ee929 • Parent(s): f3f13a5
Update README.md

README.md CHANGED
@@ -16,7 +16,7 @@ datasets:
 # Llama-se-rl-peft
 Adapter weights of an RL fine-tuned model based on LLaMA (see Meta's LLaMA release for the original LLaMA model).
 For more info check out the [blog post](https://huggingface.co/blog/stackllama) and [github example](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts).
-
+The reward model used to train this model can be found [here](https://huggingface.co/trl-lib/llama-7b-se-rm-peft).

 ## Model Description
 **Llama-se-rl** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange Reward Model.