Committed by dnhkng and leaderboard-pr-bot
Commit 88d4730 (1 parent: afb0d59)

Adding Evaluation Results (#2)


- Adding Evaluation Results (841f463d8ff387c53f55f4d0b803535c0ca30829)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +32 -0
README.md CHANGED
@@ -12,6 +12,9 @@ model-index:
  args:
  num_few_shot: 0
  metrics:
+ - type: inst_level_strict_acc and prompt_level_strict_acc
+ value: 44.06
+ name: strict accuracy
  - type: inst_level_strict_acc and prompt_level_strict_acc
  value: 44.06
  name: strict accuracy
@@ -27,6 +30,9 @@ model-index:
  args:
  num_few_shot: 3
  metrics:
+ - type: acc_norm
+ value: 47.73
+ name: normalized accuracy
  - type: acc_norm
  value: 47.73
  name: normalized accuracy
@@ -42,6 +48,9 @@ model-index:
  args:
  num_few_shot: 4
  metrics:
+ - type: exact_match
+ value: 7.78
+ name: exact match
  - type: exact_match
  value: 7.78
  name: exact match
@@ -57,6 +66,9 @@ model-index:
  args:
  num_few_shot: 0
  metrics:
+ - type: acc_norm
+ value: 10.4
+ name: acc_norm
  - type: acc_norm
  value: 10.4
  name: acc_norm
@@ -72,6 +84,9 @@ model-index:
  args:
  num_few_shot: 0
  metrics:
+ - type: acc_norm
+ value: 8.73
+ name: acc_norm
  - type: acc_norm
  value: 8.73
  name: acc_norm
@@ -89,6 +104,9 @@ model-index:
  args:
  num_few_shot: 5
  metrics:
+ - type: acc
+ value: 36.96
+ name: accuracy
  - type: acc
  value: 36.96
  name: accuracy
@@ -172,3 +190,17 @@ ___________________________________
  I'm based out of Munich, Germany, but I would be interested in working remotely for a team with more compute than my 2x 4090s 🚀
 
  #### Reach out via [LinkedIn - Dr David Noel Ng](https://www.linkedin.com/in/dnhkng)
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dnhkng__RYS-Medium)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |25.94|
+ |IFEval (0-Shot)    |44.06|
+ |BBH (3-Shot)       |47.73|
+ |MATH Lvl 5 (4-Shot)| 7.78|
+ |GPQA (0-shot)      |10.40|
+ |MuSR (0-shot)      | 8.73|
+ |MMLU-PRO (5-shot)  |36.96|
+
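The `Avg.` row of the added table can be cross-checked against the six per-benchmark scores. The sketch assumes that row is the unweighted arithmetic mean of those six values (44.06 + 47.73 + 7.78 + 10.40 + 8.73 + 36.96 = 155.66, and 155.66 / 6 ≈ 25.94, which matches). The helper names `average` and `render_table` are hypothetical and are not taken from the Open LLM Leaderboard bot's code.

```python
# Minimal sketch: recompute the "Avg." row and re-render the README table.
# ASSUMPTION: "Avg." is the plain, unweighted mean of the six benchmark scores.
SCORES = {
    "IFEval (0-Shot)": 44.06,
    "BBH (3-Shot)": 47.73,
    "MATH Lvl 5 (4-Shot)": 7.78,
    "GPQA (0-shot)": 10.40,
    "MuSR (0-shot)": 8.73,
    "MMLU-PRO (5-shot)": 36.96,
}


def average(scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean, rounded to two decimals as in the table."""
    return round(sum(scores.values()) / len(scores), 2)


def render_table(scores: dict[str, float]) -> str:
    """Hypothetical helper: build a Markdown table with a right-aligned Value column."""
    rows = [("Avg.", average(scores)), *scores.items()]
    width = max(len(name) for name, _ in rows)  # pad the Metric column to its longest entry
    lines = [f"|{'Metric':^{width}}|Value|", f"|{'-' * width}|----:|"]
    lines += [f"|{name:<{width}}|{value:5.2f}|" for name, value in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(SCORES))
```

Running it prints the same rows and values as the table committed to README.md. The column padding is purely cosmetic, since Markdown renderers ignore it.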