yentinglin committed • Commit b6d5518 • Parent(s): 3778a15
Update README.md

README.md CHANGED
@@ -117,3 +117,73 @@ If you find Taiwan LLM is useful in your work, please cite it with:
 # Acknowledgement
 
 Taiwan LLM v2 is conducted in collaboration with [Ubitus K.K.](http://ubitus.net). Ubitus provides valuable compute resources for the project.
+
+## Open LLM Leaderboard
+| Task                                                 |Version| Metric       |Value |   |Stderr|
+|------------------------------------------------------|------:|--------------|-----:|---|-----:|
+|leaderboard:arc:challenge:25                          |      0|acc           |0.5529|±  |0.0145|
+|                                                      |       |acc_norm      |0.5862|±  |0.0144|
+|leaderboard:gsm8k:5                                   |      0|qem           |0.3177|±  |0.0128|
+|leaderboard:hellaswag:10                              |      0|acc           |0.6307|±  |0.0048|
+|                                                      |       |acc_norm      |0.8327|±  |0.0037|
+|leaderboard:mmlu:_average:5                           |       |acc           |0.5483|±  |0.0356|
+|leaderboard:mmlu:abstract_algebra:5                   |      0|acc           |0.3400|±  |0.0476|
+|leaderboard:mmlu:anatomy:5                            |      0|acc           |0.5111|±  |0.0432|
+|leaderboard:mmlu:astronomy:5                          |      0|acc           |0.5789|±  |0.0402|
+|leaderboard:mmlu:business_ethics:5                    |      0|acc           |0.5100|±  |0.0502|
+|leaderboard:mmlu:clinical_knowledge:5                 |      0|acc           |0.6000|±  |0.0302|
+|leaderboard:mmlu:college_biology:5                    |      0|acc           |0.5764|±  |0.0413|
+|leaderboard:mmlu:college_chemistry:5                  |      0|acc           |0.4100|±  |0.0494|
+|leaderboard:mmlu:college_computer_science:5           |      0|acc           |0.4500|±  |0.0500|
+|leaderboard:mmlu:college_mathematics:5                |      0|acc           |0.3800|±  |0.0488|
+|leaderboard:mmlu:college_medicine:5                   |      0|acc           |0.5434|±  |0.0380|
+|leaderboard:mmlu:college_physics:5                    |      0|acc           |0.2941|±  |0.0453|
+|leaderboard:mmlu:computer_security:5                  |      0|acc           |0.7000|±  |0.0461|
+|leaderboard:mmlu:conceptual_physics:5                 |      0|acc           |0.4468|±  |0.0325|
+|leaderboard:mmlu:econometrics:5                       |      0|acc           |0.2719|±  |0.0419|
+|leaderboard:mmlu:electrical_engineering:5             |      0|acc           |0.4552|±  |0.0415|
+|leaderboard:mmlu:elementary_mathematics:5             |      0|acc           |0.3175|±  |0.0240|
+|leaderboard:mmlu:formal_logic:5                       |      0|acc           |0.3413|±  |0.0424|
+|leaderboard:mmlu:global_facts:5                       |      0|acc           |0.3700|±  |0.0485|
+|leaderboard:mmlu:high_school_biology:5                |      0|acc           |0.6323|±  |0.0274|
+|leaderboard:mmlu:high_school_chemistry:5              |      0|acc           |0.4581|±  |0.0351|
+|leaderboard:mmlu:high_school_computer_science:5       |      0|acc           |0.5400|±  |0.0501|
+|leaderboard:mmlu:high_school_european_history:5       |      0|acc           |0.6364|±  |0.0376|
+|leaderboard:mmlu:high_school_geography:5              |      0|acc           |0.6970|±  |0.0327|
+|leaderboard:mmlu:high_school_government_and_politics:5|      0|acc           |0.7617|±  |0.0307|
+|leaderboard:mmlu:high_school_macroeconomics:5         |      0|acc           |0.4974|±  |0.0254|
+|leaderboard:mmlu:high_school_mathematics:5            |      0|acc           |0.3296|±  |0.0287|
+|leaderboard:mmlu:high_school_microeconomics:5         |      0|acc           |0.5336|±  |0.0324|
+|leaderboard:mmlu:high_school_physics:5                |      0|acc           |0.3709|±  |0.0394|
+|leaderboard:mmlu:high_school_psychology:5             |      0|acc           |0.7468|±  |0.0186|
+|leaderboard:mmlu:high_school_statistics:5             |      0|acc           |0.4074|±  |0.0335|
+|leaderboard:mmlu:high_school_us_history:5             |      0|acc           |0.7108|±  |0.0318|
+|leaderboard:mmlu:high_school_world_history:5          |      0|acc           |0.7046|±  |0.0297|
+|leaderboard:mmlu:human_aging:5                        |      0|acc           |0.6323|±  |0.0324|
+|leaderboard:mmlu:human_sexuality:5                    |      0|acc           |0.5878|±  |0.0432|
+|leaderboard:mmlu:international_law:5                  |      0|acc           |0.6694|±  |0.0429|
+|leaderboard:mmlu:jurisprudence:5                      |      0|acc           |0.7037|±  |0.0441|
+|leaderboard:mmlu:logical_fallacies:5                  |      0|acc           |0.6564|±  |0.0373|
+|leaderboard:mmlu:machine_learning:5                   |      0|acc           |0.3393|±  |0.0449|
+|leaderboard:mmlu:management:5                         |      0|acc           |0.7087|±  |0.0450|
+|leaderboard:mmlu:marketing:5                          |      0|acc           |0.8333|±  |0.0244|
+|leaderboard:mmlu:medical_genetics:5                   |      0|acc           |0.5400|±  |0.0501|
+|leaderboard:mmlu:miscellaneous:5                      |      0|acc           |0.7382|±  |0.0157|
+|leaderboard:mmlu:moral_disputes:5                     |      0|acc           |0.6127|±  |0.0262|
+|leaderboard:mmlu:moral_scenarios:5                    |      0|acc           |0.3788|±  |0.0162|
+|leaderboard:mmlu:nutrition:5                          |      0|acc           |0.6046|±  |0.0280|
+|leaderboard:mmlu:philosophy:5                         |      0|acc           |0.6270|±  |0.0275|
+|leaderboard:mmlu:prehistory:5                         |      0|acc           |0.6204|±  |0.0270|
+|leaderboard:mmlu:professional_accounting:5            |      0|acc           |0.3582|±  |0.0286|
+|leaderboard:mmlu:professional_law:5                   |      0|acc           |0.3931|±  |0.0125|
+|leaderboard:mmlu:professional_medicine:5              |      0|acc           |0.5184|±  |0.0304|
+|leaderboard:mmlu:professional_psychology:5            |      0|acc           |0.5556|±  |0.0201|
+|leaderboard:mmlu:public_relations:5                   |      0|acc           |0.6818|±  |0.0446|
+|leaderboard:mmlu:security_studies:5                   |      0|acc           |0.6122|±  |0.0312|
+|leaderboard:mmlu:sociology:5                          |      0|acc           |0.7164|±  |0.0319|
+|leaderboard:mmlu:us_foreign_policy:5                  |      0|acc           |0.8200|±  |0.0386|
+|leaderboard:mmlu:virology:5                           |      0|acc           |0.4578|±  |0.0388|
+|leaderboard:mmlu:world_religions:5                    |      0|acc           |0.7661|±  |0.0325|
+|leaderboard:truthfulqa:mc:0                           |      0|truthfulqa_mc1|0.2840|±  |0.0158|
+|                                                      |       |truthfulqa_mc2|0.4423|±  |0.0146|
+|leaderboard:winogrande:5                              |      0|acc           |0.7593|±  |0.0120|
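
The Stderr column reports the standard error of each accuracy estimate. For a single accuracy metric it can be sanity-checked with the binomial standard-error formula, sqrt(p·(1−p)/n). A minimal sketch in Python; the WinoGrande validation-split size of n = 1267 is an assumption about the eval setup, not stated in this diff:

```python
import math

def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(acc * (1 - acc) / n)

# WinoGrande row: acc = 0.7593; n = 1267 examples (assumed validation-split size)
se = binomial_stderr(0.7593, 1267)
print(f"{se:.4f}")  # → 0.0120, matching the table's reported Stderr
```

The same check applies to the other single-split rows; the MMLU `_average` Stderr is larger because it aggregates 57 subtasks rather than one binomial sample.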