Commit 6ae5393 by AINovice2005 (parent: b9bbf9b)

Update README.md

Files changed (1): README.md (+9 -3)

README.md CHANGED
@@ -21,16 +21,22 @@ tags:
 
 ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.
 
-The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used to improve the performance of the model.
+The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used, albeit only a small portion due to GPU constraints.
 
 ## Citation
 
 Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. https://arxiv.org/abs/2305.14314
 
 
-## Bleu:0.0209
-
-The model recipe: https://github.com/ParagEkbote/El-Emperador_ModelRecipe
+# Evals:
+BLEU: 0.0209
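For context on the BLEU figure above, here is a minimal self-contained sketch of sentence-level BLEU with uniform 4-gram weights and a brevity penalty. This is an illustration of the metric only, not the evaluation code that produced the 0.0209 score:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with uniform n-gram weights and a brevity penalty.

    `reference` and `candidate` are token lists. Returns 0.0 when any
    n-gram order has zero matches (standard BLEU without smoothing).
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipped matches: each candidate n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if matches == 0:
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty discourages overly short candidates.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, so 0.0209 reflects only modest n-gram overlap with the references; for reproducible reporting, a standard library such as sacreBLEU is the usual choice.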
 
+# Conclusion and Model Recipe
+ORPO is a viable RLHF algorithm that can improve model performance over plain SFT finetuning. It also helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.
+
+The model recipe: https://github.com/ParagEkbote/El-Emperador_ModelRecipe
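The alignment claim above rests on ORPO's odds-ratio loss: an NLL term on the chosen answer plus a penalty that pushes the policy's odds on the chosen answer above its odds on the rejected one. A minimal numeric sketch follows; the function names and the default λ weight are illustrative (not taken from the model recipe), and the inputs are assumed to be length-normalized average token log-probabilities:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(avg_logp):
    """log odds of a sequence, odds(y) = P(y) / (1 - P(y)),
    with P(y) = exp(mean token log-prob) for length invariance."""
    p = math.exp(avg_logp)  # avg_logp < 0, so 0 < p < 1
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """ORPO objective: NLL on the chosen answer plus a relative
    odds-ratio penalty favoring chosen over rejected."""
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    l_or = -math.log(sigmoid(ratio))   # odds-ratio penalty term
    l_sft = -avg_logp_chosen           # standard NLL term
    return l_sft + lam * l_or
```

Unlike DPO, no frozen reference model is needed: the penalty compares the policy's own odds on the two answers, which keeps the memory footprint close to plain SFT.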
 
 ## Inference Script: