152334H leaderboard-pr-bot committed on
Commit
d1caed0
1 Parent(s): 1dca4cc

Adding Evaluation Results (#23)


- Adding Evaluation Results (7ca07b95ff1fac8f691a73fc2811ebbd97ca01ad)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +114 -14
README.md CHANGED
@@ -1,4 +1,6 @@
 ---
+language:
+- en
 model-index:
 - name: miqu-1-70b-sf
   results:
@@ -17,8 +19,7 @@ model-index:
       value: 73.04
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -34,8 +35,7 @@ model-index:
       value: 88.61
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -52,8 +52,7 @@ model-index:
       value: 75.49
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -69,8 +68,7 @@ model-index:
     - type: mc2
       value: 69.38
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -87,8 +85,7 @@ model-index:
       value: 85.32
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -105,11 +102,100 @@ model-index:
       value: 67.7
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 51.82
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 43.81
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 10.8
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 13.42
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 17.21
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 35.87
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=152334H/miqu-1-70b-sf
       name: Open LLM Leaderboard
-language:
-- en
 ---
 
 update: added NOMERGE license
@@ -290,3 +376,17 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
 ```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_152334H__miqu-1-70b-sf)
+
+| Metric |Value|
+|-------------------|----:|
+|Avg. |28.82|
+|IFEval (0-Shot) |51.82|
+|BBH (3-Shot) |43.81|
+|MATH Lvl 5 (4-Shot)|10.80|
+|GPQA (0-shot) |13.42|
+|MuSR (0-shot) |17.21|
+|MMLU-PRO (5-shot) |35.87|
+
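The Avg. row in the added table appears to be the simple mean of the six benchmark scores. A minimal sketch (plain Python, not part of the commit) to check the arithmetic:

```python
# Benchmark scores taken from the table added in this commit.
scores = {
    "IFEval (0-Shot)": 51.82,
    "BBH (3-Shot)": 43.81,
    "MATH Lvl 5 (4-Shot)": 10.80,
    "GPQA (0-shot)": 13.42,
    "MuSR (0-shot)": 17.21,
    "MMLU-PRO (5-shot)": 35.87,
}

# Unweighted mean, rounded to two decimals.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 28.82, matching the Avg. row above
```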