<!-- Provide a quick summary of what the model is/does. -->
In this repo, we present a reward model trained with the [LMFlow](https://github.com/OptimalScale/LMFlow) framework. The reward model is trained on the helpful part of the [HH-RLHF dataset](https://huggingface.co/datasets/Dahoas/full-hh-rlhf) and is fine-tuned from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).
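As a quick usage sketch, the snippet below shows one way to score a response with a reward model like this one. It is a minimal sketch under two assumptions not stated in this card: that the checkpoint loads as a Hugging Face sequence-classification model emitting a single reward logit, and that `repo_id` is replaced with this repository's actual Hub id.

```python
# Hedged usage sketch: assumes this checkpoint loads via
# AutoModelForSequenceClassification and outputs a single reward logit.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "<this-repo-id>"  # placeholder: replace with this repository's Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

# Conversations follow the ###Human / ###Assistant format described below.
text = (
    "###Human: How do I bake bread?"
    "###Assistant: Mix flour, water, yeast, and salt, then let the dough rise."
)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0].item()  # higher reward = preferred response

print(f"reward: {reward:.3f}")
```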
## Model Details
<!-- Provide a longer summary of what this model is. -->
The HH-RLHF-Helpful dataset contains 112K comparison samples in the training set and 12.5K comparison samples in the test set. We first replace the `\n\nHuman` and `\n\nAssistant` delimiters in the dataset with `###Human` and `###Assistant`, respectively.
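For concreteness, a minimal sketch of that replacement step follows; the `chosen`/`rejected` column names are an assumption about the Dahoas/full-hh-rlhf layout rather than something this card specifies, and the helpful-only filtering mentioned above is not shown.

```python
# Minimal sketch of the delimiter replacement described above.
# Assumption: the comparison data exposes "chosen" and "rejected" text columns;
# filtering to the helpful-only subset is omitted here.
from datasets import load_dataset

def replace_delimiters(example):
    for key in ("chosen", "rejected"):
        example[key] = (
            example[key]
            .replace("\n\nHuman", "###Human")
            .replace("\n\nAssistant", "###Assistant")
        )
    return example

data = load_dataset("Dahoas/full-hh-rlhf")
data = data.map(replace_delimiters)
```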
Then, we split the dataset as follows: