Update README.md
README.md
CHANGED
@@ -56,3 +56,38 @@ dpo_trainer = DPOTrainer(
 )
 ```
 The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
+
+
+Benchmark Scores
+
+| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
+|-------------|------:|------|-----:|--------|-----:|---|-----:|
+|arc_challenge| 1|none | 0|acc |0.6894|± |0.0135|
+| | |none | 0|acc_norm|0.6860|± |0.0136|
+
+| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
+|---------|------:|------|-----:|--------|-----:|---|-----:|
+|hellaswag| 1|none | 0|acc |0.7092|± |0.0045|
+| | |none | 0|acc_norm|0.8736|± |0.0033|
+
+| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
+|--------------|------:|------|-----:|------|-----:|---|-----:|
+|truthfulqa_mc2| 2|none | 0|acc |0.7126|± | 0.015|
+
+| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
+|------------------|-------|------|-----:|------|-----:|---|-----:|
+|mmlu |N/A |none | 0|acc |0.6225|± |0.1292|
+| - humanities |N/A |none | 0|acc |0.5745|± |0.1286|
+| - other |N/A |none | 0|acc |0.6952|± |0.1095|
+| - social_sciences|N/A |none | 0|acc |0.7280|± |0.0735|
+| - stem |N/A |none | 0|acc |0.5195|± |0.1313|
+
+| Tasks |Version|Filter|n-shot|Metric|Value| |Stderr|
+|----------|------:|------|-----:|------|----:|---|-----:|
+|winogrande| 1|none | 0|acc |0.824|± |0.0107|
+
+|Tasks|Version| Filter |n-shot| Metric |Value | |Stderr|
+|-----|------:|----------|-----:|-----------|-----:|---|-----:|
+|gsm8k| 2|get-answer| 5|exact_match|0.7263|± |0.0123|
+
+Average = 74.08
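The README does not say which metrics feed into the `Average = 74.08` line. One reading that reproduces the figure is the unweighted mean of six headline scores: `acc_norm` for arc_challenge and hellaswag, `acc` for truthfulqa_mc2, mmlu, and winogrande, and `exact_match` for gsm8k. The sketch below checks that arithmetic. The choice of metrics and the `mean_score` helper are assumptions made for illustration, not something the commit states.

```python
# Headline scores copied from the benchmark tables, expressed as percentages.
# Assumption: acc_norm is used for arc_challenge and hellaswag; the README
# does not state which metric per task goes into the average.
scores = {
    "arc_challenge (acc_norm)": 68.60,
    "hellaswag (acc_norm)": 87.36,
    "truthfulqa_mc2 (acc)": 71.26,
    "mmlu (acc)": 62.25,
    "winogrande (acc)": 82.40,
    "gsm8k (exact_match)": 72.63,
}


def mean_score(values):
    """Unweighted arithmetic mean of the given scores (hypothetical helper)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())
print(f"Average = {average:.2f}")  # prints "Average = 74.08"
```

Swapping in the plain `acc` values for arc_challenge and hellaswag gives a different mean, so the reported figure is consistent with the `acc_norm` reading rather than proof of it.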