bedio committed on
Commit e59ac2d
1 Parent(s): 35aa277

Adding Evaluation Results (#3)

- Adding Evaluation Results (bbae4a8440862e890c1947970d52fcf35a592aef)

Files changed (1)
  1. README.md +20 -12
README.md CHANGED
@@ -16,8 +16,7 @@ model-index:
       value: 48.13
       name: strict accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -32,8 +31,7 @@ model-index:
       value: 5.19
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -48,8 +46,7 @@ model-index:
       value: 1.36
       name: exact match
     source:
-      url: >-
-        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -64,8 +61,7 @@ model-index:
       value: 2.35
       name: acc_norm
     source:
-      url: >-
-        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -80,8 +76,7 @@ model-index:
       value: 4.05
       name: acc_norm
     source:
-      url: >-
-        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -98,8 +93,7 @@ model-index:
       value: 3.05
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
       name: Open LLM Leaderboard
 ---
 
@@ -297,3 +291,17 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 ## Model Card Contact
 
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__Explore_Llama-3.2-1B-Inst_v1.1)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |14.12|
+|IFEval (0-Shot)    |58.44|
+|BBH (3-Shot)       | 8.82|
+|MATH Lvl 5 (4-Shot)| 6.04|
+|GPQA (0-shot)      | 1.68|
+|MuSR (0-shot)      | 0.66|
+|MMLU-PRO (5-shot)  | 9.09|
+
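
The `Avg.` row in the table above appears to be the unweighted mean of the six benchmark scores. A minimal sketch to check that, assuming plain arithmetic averaging and rounding to two decimals (the variable names are illustrative and not taken from the leaderboard code):

```python
# Scores copied from the added results table; the dict name is illustrative.
scores = {
    "IFEval (0-Shot)": 58.44,
    "BBH (3-Shot)": 8.82,
    "MATH Lvl 5 (4-Shot)": 6.04,
    "GPQA (0-shot)": 1.68,
    "MuSR (0-shot)": 0.66,
    "MMLU-PRO (5-shot)": 9.09,
}

# Assumption: "Avg." is the unweighted mean, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 14.12
```

The six scores sum to 84.73, and 84.73 / 6 ≈ 14.12, which matches the reported `Avg.` value.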