leaderboard-pr-bot committed on
Commit
8b47f9d
1 Parent(s): 46d4474

Adding Evaluation Results

This is an automated PR created with https://huggingface.co./spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co./spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +112 -7
README.md CHANGED
@@ -1,4 +1,10 @@
 ---
+license: bigcode-openrail-m
+library_name: transformers
+tags:
+- code
+datasets:
+- bigcode/the-stack-v2-train
 pipeline_tag: text-generation
 inference:
   parameters:
@@ -8,12 +14,6 @@ widget:
 - text: 'def print_hello_world():'
   example_title: Hello world
   group: Python
-datasets:
-- bigcode/the-stack-v2-train
-license: bigcode-openrail-m
-library_name: transformers
-tags:
-- code
 model-index:
 - name: starcoder2-15b
   results:
@@ -65,6 +65,98 @@ model-index:
     metrics:
     - type: edit-smiliarity
       value: 74.08
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 27.35
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 20.24
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 4.83
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 2.91
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 2.93
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 15.03
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
 ---
 
 # StarCoder2
@@ -212,4 +304,17 @@ The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can
       archivePrefix={arXiv},
       primaryClass={cs.SE}
 }
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_bigcode__starcoder2-15b)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |12.21|
+|IFEval (0-Shot)    |27.35|
+|BBH (3-Shot)       |20.24|
+|MATH Lvl 5 (4-Shot)| 4.83|
+|GPQA (0-shot)      | 2.91|
+|MuSR (0-shot)      | 2.93|
+|MMLU-PRO (5-shot)  |15.03|
+
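
Review note: as a quick sanity check on the results table this PR adds to README.md, the reported `Avg.` can be compared against the mean of the six benchmark rows. The sketch below is only an illustration. The table text is copied from the diff, while the helper name `parse_scores` and the 0.01 tolerance are assumptions made here, not part of the bot or the leaderboard tooling.

```python
# Rows copied from the Markdown table added by this PR.
TABLE = """\
| Metric |Value|
|-------------------|----:|
|Avg. |12.21|
|IFEval (0-Shot) |27.35|
|BBH (3-Shot) |20.24|
|MATH Lvl 5 (4-Shot)| 4.83|
|GPQA (0-shot) | 2.91|
|MuSR (0-shot) | 2.93|
|MMLU-PRO (5-shot) |15.03|
"""


def parse_scores(table: str) -> dict[str, float]:
    """Hypothetical helper: map each metric name to its numeric value.

    Header and separator rows are skipped because their second cell
    does not parse as a float.
    """
    scores: dict[str, float] = {}
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, value = cells
        try:
            scores[name] = float(value)
        except ValueError:
            continue  # header or separator row
    return scores


scores = parse_scores(TABLE)
reported_avg = scores.pop("Avg.")          # the value the PR reports
mean = sum(scores.values()) / len(scores)  # mean of the six benchmark rows

print(f"reported: {reported_avg:.2f}, recomputed: {mean:.3f}")
# Assumed tolerance: the table values are rounded to two decimals.
assert abs(mean - reported_avg) < 0.01
```

The recomputed mean of the rounded values is about 12.215, which matches the reported 12.21 to within rounding. The small gap is presumably because the leaderboard averages unrounded scores, though the diff does not confirm this.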