natolambert committed
Commit 53f06b7
1 Parent(s): d87f426

Update README.md

Files changed (1):
  1. README.md +1 -0
README.md CHANGED
@@ -150,6 +150,7 @@ Certainly! Here's the table with SFT and DPO as rows:
 | **SFT** | 2 × 10^-6 | N/A | 3 | Linear warmup for the first 3% of total training time, then cooldown to 0 | 0 | 0 | 2048 |
 | **DPO** | 5 × 10^-7 | 0.1 | 3 | Linear warmup for the first 10% of total training time, then cooldown to 0 | 0 | 0 | 2048 |
 
+Compared to Tulu 2, the DPO hyperparameters are the same. SFT uses a lower learning rate, 3 epochs instead of 2, and a 2k sequence length instead of 8k.
 
 ## Bias, Risks, and Limitations
 
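For context, the schedule described in the SFT row (linear warmup over the first 3% of training, then linear cooldown to 0) matches the standard linear-warmup/linear-decay schedule available in `transformers`. The sketch below is illustrative only, not this repo's actual training code: the optimizer step count and the dummy parameter are assumptions made up for the example, while the 2e-6 peak learning rate and 3% warmup come from the table above.

```python
import torch
from transformers import get_linear_schedule_with_warmup

# Hypothetical step budget; in practice this depends on dataset size, batch size, and epochs.
total_steps = 10_000
warmup_steps = int(0.03 * total_steps)  # "linear warmup for the first 3% of total training time" (SFT row)

# A dummy parameter stands in for the model; the table's SFT peak learning rate is 2e-6.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=2e-6)

# Linear warmup from 0 to the peak LR, then linear decay ("cooldown") to 0 by the final step.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)

for step in range(total_steps):
    # ... forward/backward pass would go here ...
    optimizer.step()
    scheduler.step()
```

The DPO row follows the same shape with a lower peak learning rate (5e-7) and a 10% warmup fraction.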