fblgit leaderboard-pr-bot committed on
Commit
6f67499
1 Parent(s): b8ac85b

Adding Evaluation Results (#3)

- Adding Evaluation Results (5306f3ddcc22e03e523a12ed685b3fd90ab0c9b8)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -194,6 +194,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 48.37
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 30.42
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 2.87
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.15
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 17.16
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 19.68
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
+      name: Open LLM Leaderboard
 ---
 
 # juanako-7b-UNA (Uniform Neural Alignment)
@@ -475,3 +567,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |78.85|
 |GSM8k (5-shot) |44.81|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__juanako-7b-UNA)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |20.77|
+|IFEval (0-Shot)    |48.37|
+|BBH (3-Shot)       |30.42|
+|MATH Lvl 5 (4-Shot)| 2.87|
+|GPQA (0-shot)      | 6.15|
+|MuSR (0-shot)      |17.16|
+|MMLU-PRO (5-shot)  |19.68|
+
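
The `Avg.` row in the added table can be sanity-checked against the six benchmark rows. The following is a minimal Python sketch, assuming `Avg.` is an unweighted arithmetic mean of the six listed scores; the diff itself does not state how the leaderboard computes it, and the names `scores` and `REPORTED_AVG` are illustrative only.

```python
# Scores copied verbatim from the table added in this commit.
scores = {
    "IFEval (0-Shot)": 48.37,
    "BBH (3-Shot)": 30.42,
    "MATH Lvl 5 (4-Shot)": 2.87,
    "GPQA (0-shot)": 6.15,
    "MuSR (0-shot)": 17.16,
    "MMLU-PRO (5-shot)": 19.68,
}

# Value of the "Avg." row in the same table.
REPORTED_AVG = 20.77

# Assumption: "Avg." is the unweighted mean of the six benchmark scores.
mean_score = sum(scores.values()) / len(scores)

print(f"mean of listed scores: {mean_score:.3f}")
print(f"gap to reported Avg.:  {abs(mean_score - REPORTED_AVG):.3f}")
```

The mean of the rounded table values is about 20.775, which agrees with the reported 20.77 to within rounding. A gap of that size would be expected if the average were taken over unrounded scores, though that is an assumption rather than something the commit confirms.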