Commit f13ced5 (parent: da5e615) by andreaskoepf: Update README.md

Files changed (1): README.md (+48, -1)
---
license: apache-2.0
---
# Pythia 1.4B Based Reward Model

- base model: andreaskoepf/pythia-1.4b-gpt4all-pretrain
- wandb: https://wandb.ai/open-assistant/reward-model/runs/kadgqj65
- checkpoint: 10k steps

Compute was generously provided by [Stability AI](https://stability.ai/)

### How to use

```python
# Install the Open-Assistant model_training module first, e.g. run
# `pip install -e .` in the `model/` directory of the open-assistant repository.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

import model_training.models.reward_model  # noqa: F401 (registers the reward model for AutoModel loading)

model_name = "<this model's Hugging Face repository id>"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

input_text = "<|prompter|>Hi how are you?<|endoftext|><|assistant|>Hi, I am Open-Assistant a large open-source language model trained by LAION AI. How can I help you today?<|endoftext|>"
inputs = tokenizer(input_text, return_tensors="pt")
score = model(**inputs).logits[0].cpu().detach()
print(score)
```
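The reward model scores whole conversations rendered in the special-token format shown in `input_text` above. A small helper for assembling that format from alternating prompter/assistant turns (this function is illustrative and not part of the repository):

```python
def format_conversation(turns):
    """Build the <|prompter|>/<|assistant|> input string the reward model expects.

    `turns` alternates user and assistant messages, starting with the user.
    """
    roles = ["<|prompter|>", "<|assistant|>"]
    return "".join(f"{roles[i % 2]}{text}<|endoftext|>" for i, text in enumerate(turns))


example = format_conversation(
    [
        "Hi how are you?",
        "Hi, I am Open-Assistant a large open-source language model trained by LAION AI. How can I help you today?",
    ]
)
print(example)  # same layout as the hand-written input_text above
```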

### Datasets

```yaml
datasets:
  - oasst_export:
      lang: "en,es,de,fr"
      input_file_path: 2023-03-27_oasst_research_ready_synth.jsonl.gz
      val_split: 0.1
  - augment_oasst:
      input_file_path: augmented_latin_cyrillic_oasst_2023-03-27_v2.jsonl
  - anthropic_rlhf:
      fraction: 0.1
      max_val_set: 1000
  - shp:
      max_val_set: 1000
  - hellaswag:
      fraction: 0.5
      max_val_set: 1000
  - webgpt:
      val_split: 0.05
      max_val_set: 1000
  - hf_summary_pairs:
      fraction: 0.1
      max_val_set: 250
```
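On a rough reading of the fields above (an assumption about their intent, not the training code's exact semantics): `fraction` subsamples a dataset, `val_split` carves out a validation share, and `max_val_set` caps the validation size. A toy sketch of that interaction:

```python
def split_sizes(total, fraction=1.0, val_split=0.1, max_val_set=None):
    # Hypothetical illustration of how the config fields might interact;
    # the actual semantics live in the open-assistant training code.
    n = int(total * fraction)        # fraction: subsample the dataset
    n_val = int(n * val_split)       # val_split: share held out for eval
    if max_val_set is not None:
        n_val = min(n_val, max_val_set)  # max_val_set: hard cap on val size
    return n - n_val, n_val


print(split_sizes(100_000, fraction=0.1, val_split=0.1, max_val_set=1000))  # (9000, 1000)
```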

(Internal note: ignore the high eval accuracy values for oasst_export; the oasst-eval samples were part of the training set.)