alnrg2arg committed on
Commit
8a741a3
1 Parent(s): ea3fecf

Update README.md

  1. README.md +35 -0
README.md CHANGED
@@ -56,3 +56,38 @@ dpo_trainer = DPOTrainer(
  )
  ```
  The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
+
+
+ Benchmark Scores
+
+ |    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
+ |-------------|------:|------|-----:|--------|-----:|---|-----:|
+ |arc_challenge|      1|none  |     0|acc     |0.6894|±  |0.0135|
+ |             |       |none  |     0|acc_norm|0.6860|±  |0.0136|
+
+ |  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
+ |---------|------:|------|-----:|--------|-----:|---|-----:|
+ |hellaswag|      1|none  |     0|acc     |0.7092|±  |0.0045|
+ |         |       |none  |     0|acc_norm|0.8736|±  |0.0033|
+
+ |    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
+ |--------------|------:|------|-----:|------|-----:|---|-----:|
+ |truthfulqa_mc2|      2|none  |     0|acc   |0.7126|±  | 0.015|
+
+ |      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
+ |------------------|-------|------|-----:|------|-----:|---|-----:|
+ |mmlu              |N/A    |none  |     0|acc   |0.6225|±  |0.1292|
+ | - humanities     |N/A    |none  |     0|acc   |0.5745|±  |0.1286|
+ | - other          |N/A    |none  |     0|acc   |0.6952|±  |0.1095|
+ | - social_sciences|N/A    |none  |     0|acc   |0.7280|±  |0.0735|
+ | - stem           |N/A    |none  |     0|acc   |0.5195|±  |0.1313|
+
+ |  Tasks   |Version|Filter|n-shot|Metric|Value|   |Stderr|
+ |----------|------:|------|-----:|------|----:|---|-----:|
+ |winogrande|      1|none  |     0|acc   |0.824|±  |0.0107|
+
+ |Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
+ |-----|------:|----------|-----:|-----------|-----:|---|-----:|
+ |gsm8k|      2|get-answer|     5|exact_match|0.7263|±  |0.0123|
+
+ Average = 74.08
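
The commit does not say how the average is derived. A minimal sketch, assuming it is the unweighted mean of one headline metric per benchmark (`acc_norm` for arc_challenge and hellaswag, `acc` for truthfulqa_mc2, mmlu and winogrande, `exact_match` for gsm8k), reproduces the reported figure from the table values. The metric selection and the names `scores` and `average_percent` are assumptions for illustration, not something the README states.

```python
# Headline metric per benchmark, copied from the tables in this commit.
# Which metric counts toward the average is an assumption (see lead-in).
scores = {
    "arc_challenge (acc_norm)": 0.6860,
    "hellaswag (acc_norm)": 0.8736,
    "truthfulqa_mc2 (acc)": 0.7126,
    "mmlu (acc)": 0.6225,
    "winogrande (acc)": 0.824,
    "gsm8k (exact_match)": 0.7263,
}


def average_percent(values):
    """Return the unweighted mean of fractional scores as a percentage."""
    values = list(values)
    return 100.0 * sum(values) / len(values)


avg = average_percent(scores.values())
print(f"Average = {avg:.2f}")  # prints: Average = 74.08
```

Using the plain `acc` rows for arc_challenge and hellaswag instead gives a different mean, so the `acc_norm` choice appears to be the one that matches the stated 74.08.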