yentinglin committed
Commit b6d5518
1 Parent(s): 3778a15

Update README.md

Files changed (1)
  1. README.md +70 -0
README.md CHANGED
@@ -117,3 +117,73 @@ If you find Taiwan LLM is useful in your work, please cite it with:
  # Acknowledgement
 
  Taiwan LLM v2 is conducted in collaboration with [Ubitus K.K.](http://ubitus.net). Ubitus provides valuable compute resources for the project.
+
+ ## Open LLM Leaderboard
+ | Task |Version| Metric |Value | |Stderr|
+ |------------------------------------------------------|------:|--------------|-----:|---|-----:|
+ |leaderboard:arc:challenge:25 | 0|acc |0.5529|± |0.0145|
+ | | |acc_norm |0.5862|± |0.0144|
+ |leaderboard:gsm8k:5 | 0|qem |0.3177|± |0.0128|
+ |leaderboard:hellaswag:10 | 0|acc |0.6307|± |0.0048|
+ | | |acc_norm |0.8327|± |0.0037|
+ |leaderboard:mmlu:_average:5 | |acc |0.5483|± |0.0356|
+ |leaderboard:mmlu:abstract_algebra:5 | 0|acc |0.3400|± |0.0476|
+ |leaderboard:mmlu:anatomy:5 | 0|acc |0.5111|± |0.0432|
+ |leaderboard:mmlu:astronomy:5 | 0|acc |0.5789|± |0.0402|
+ |leaderboard:mmlu:business_ethics:5 | 0|acc |0.5100|± |0.0502|
+ |leaderboard:mmlu:clinical_knowledge:5 | 0|acc |0.6000|± |0.0302|
+ |leaderboard:mmlu:college_biology:5 | 0|acc |0.5764|± |0.0413|
+ |leaderboard:mmlu:college_chemistry:5 | 0|acc |0.4100|± |0.0494|
+ |leaderboard:mmlu:college_computer_science:5 | 0|acc |0.4500|± |0.0500|
+ |leaderboard:mmlu:college_mathematics:5 | 0|acc |0.3800|± |0.0488|
+ |leaderboard:mmlu:college_medicine:5 | 0|acc |0.5434|± |0.0380|
+ |leaderboard:mmlu:college_physics:5 | 0|acc |0.2941|± |0.0453|
+ |leaderboard:mmlu:computer_security:5 | 0|acc |0.7000|± |0.0461|
+ |leaderboard:mmlu:conceptual_physics:5 | 0|acc |0.4468|± |0.0325|
+ |leaderboard:mmlu:econometrics:5 | 0|acc |0.2719|± |0.0419|
+ |leaderboard:mmlu:electrical_engineering:5 | 0|acc |0.4552|± |0.0415|
+ |leaderboard:mmlu:elementary_mathematics:5 | 0|acc |0.3175|± |0.0240|
+ |leaderboard:mmlu:formal_logic:5 | 0|acc |0.3413|± |0.0424|
+ |leaderboard:mmlu:global_facts:5 | 0|acc |0.3700|± |0.0485|
+ |leaderboard:mmlu:high_school_biology:5 | 0|acc |0.6323|± |0.0274|
+ |leaderboard:mmlu:high_school_chemistry:5 | 0|acc |0.4581|± |0.0351|
+ |leaderboard:mmlu:high_school_computer_science:5 | 0|acc |0.5400|± |0.0501|
+ |leaderboard:mmlu:high_school_european_history:5 | 0|acc |0.6364|± |0.0376|
+ |leaderboard:mmlu:high_school_geography:5 | 0|acc |0.6970|± |0.0327|
+ |leaderboard:mmlu:high_school_government_and_politics:5| 0|acc |0.7617|± |0.0307|
+ |leaderboard:mmlu:high_school_macroeconomics:5 | 0|acc |0.4974|± |0.0254|
+ |leaderboard:mmlu:high_school_mathematics:5 | 0|acc |0.3296|± |0.0287|
+ |leaderboard:mmlu:high_school_microeconomics:5 | 0|acc |0.5336|± |0.0324|
+ |leaderboard:mmlu:high_school_physics:5 | 0|acc |0.3709|± |0.0394|
+ |leaderboard:mmlu:high_school_psychology:5 | 0|acc |0.7468|± |0.0186|
+ |leaderboard:mmlu:high_school_statistics:5 | 0|acc |0.4074|± |0.0335|
+ |leaderboard:mmlu:high_school_us_history:5 | 0|acc |0.7108|± |0.0318|
+ |leaderboard:mmlu:high_school_world_history:5 | 0|acc |0.7046|± |0.0297|
+ |leaderboard:mmlu:human_aging:5 | 0|acc |0.6323|± |0.0324|
+ |leaderboard:mmlu:human_sexuality:5 | 0|acc |0.5878|± |0.0432|
+ |leaderboard:mmlu:international_law:5 | 0|acc |0.6694|± |0.0429|
+ |leaderboard:mmlu:jurisprudence:5 | 0|acc |0.7037|± |0.0441|
+ |leaderboard:mmlu:logical_fallacies:5 | 0|acc |0.6564|± |0.0373|
+ |leaderboard:mmlu:machine_learning:5 | 0|acc |0.3393|± |0.0449|
+ |leaderboard:mmlu:management:5 | 0|acc |0.7087|± |0.0450|
+ |leaderboard:mmlu:marketing:5 | 0|acc |0.8333|± |0.0244|
+ |leaderboard:mmlu:medical_genetics:5 | 0|acc |0.5400|± |0.0501|
+ |leaderboard:mmlu:miscellaneous:5 | 0|acc |0.7382|± |0.0157|
+ |leaderboard:mmlu:moral_disputes:5 | 0|acc |0.6127|± |0.0262|
+ |leaderboard:mmlu:moral_scenarios:5 | 0|acc |0.3788|± |0.0162|
+ |leaderboard:mmlu:nutrition:5 | 0|acc |0.6046|± |0.0280|
+ |leaderboard:mmlu:philosophy:5 | 0|acc |0.6270|± |0.0275|
+ |leaderboard:mmlu:prehistory:5 | 0|acc |0.6204|± |0.0270|
+ |leaderboard:mmlu:professional_accounting:5 | 0|acc |0.3582|± |0.0286|
+ |leaderboard:mmlu:professional_law:5 | 0|acc |0.3931|± |0.0125|
+ |leaderboard:mmlu:professional_medicine:5 | 0|acc |0.5184|± |0.0304|
+ |leaderboard:mmlu:professional_psychology:5 | 0|acc |0.5556|± |0.0201|
+ |leaderboard:mmlu:public_relations:5 | 0|acc |0.6818|± |0.0446|
+ |leaderboard:mmlu:security_studies:5 | 0|acc |0.6122|± |0.0312|
+ |leaderboard:mmlu:sociology:5 | 0|acc |0.7164|± |0.0319|
+ |leaderboard:mmlu:us_foreign_policy:5 | 0|acc |0.8200|± |0.0386|
+ |leaderboard:mmlu:virology:5 | 0|acc |0.4578|± |0.0388|
+ |leaderboard:mmlu:world_religions:5 | 0|acc |0.7661|± |0.0325|
+ |leaderboard:truthfulqa:mc:0 | 0|truthfulqa_mc1|0.2840|± |0.0158|
+ | | |truthfulqa_mc2|0.4423|± |0.0146|
+ |leaderboard:winogrande:5 | 0|acc |0.7593|± |0.0120|
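
The few-shot settings are encoded in the task names above (25-shot ARC-Challenge, 10-shot HellaSwag, 5-shot GSM8K/MMLU/Winogrande, 0-shot TruthfulQA), matching the Open LLM Leaderboard configuration. As a rough local reproduction route, the same benchmarks can be run with EleutherAI's lm-evaluation-harness; this is a tooling assumption (the table above appears to be lighteval output, not lm-eval), and the model id below is a placeholder, not this repository's actual id.

```python
# Illustrative reproduction sketch only: not the leaderboard's exact pipeline,
# and MODEL_ID is a placeholder to replace with this repository's Hugging Face id.
import lm_eval  # pip install lm-eval  (EleutherAI lm-evaluation-harness, v0.4+)

MODEL_ID = "your-org/your-taiwan-llm-checkpoint"  # placeholder

# 25-shot ARC-Challenge, matching the first row of the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={MODEL_ID},dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)

# Per-task metrics (acc, acc_norm, ...) are nested under results["results"].
print(results["results"]["arc_challenge"])
```

Scores obtained this way will not match the table exactly, since prompt formatting and normalization differ between lm-evaluation-harness and lighteval.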