DavidLanz committed on
Commit 3a38018 · verified · 1 Parent(s): ff297d7

Update README.md

Files changed (1): README.md +1409 −3
README.md CHANGED
@@ -1,3 +1,1409 @@
- ---
- license: llama3.2
- ---
---
license: llama3.2
language:
- zh
- en
- it
- de
- fr
- ja
- ko
base_model:
- meta-llama/Llama-3.2-3B
- lianghsun/Llama-3.2-Taiwan-3B
datasets:
- lianghsun/tw-emergency-medicine-bench
- lianghsun/tw-legal-nlp
- lianghsun/tw-legal-synthetic-qa
- lianghsun/tw-law-article-qa
- lianghsun/tw-judgment-qa
- lianghsun/tw-judgment-gist-chat
- lianghsun/tw-bar-examination-2020-chat
- lianghsun/tw-structured-law-article
- lianghsun/tw-contract-review-chat
- lianghsun/reasoning-base-20k-chat
- lianghsun/vulnerability-mitigation-qa-zh_tw
- lianghsun/tw-instruct
- rombodawg/Everything_Instruct_Multilingual
- xzuyn/manythings-translations-alpaca
- neural-bridge/rag-dataset-12000
- minyichen/glaive_toolcall_zh_tw
pipeline_tag: text-generation
library_name: transformers
tags:
- Taiwan
- ROC
- zh-tw
- instruct
- chat
- llama3.2
- SLM
model-index:
- name: Llama-3.2-Taiwan-3B-Instruct
  results:
  - task:
      type: text-generation
      name: Single Choice Question
    dataset:
      type: lianghsun/tw-legal-benchmark-v1
      name: tw-legal-benchmark-v1
    metrics:
    - name: single choice
      type: accuracy
      value: 31.1
  - task:
      type: text-generation
      name: Single Choice Question
    dataset:
      type: lianghsun/Formosa-bench
      name: (Society) Formosa Taiwan Knowledge Bench
      config: society
      split: test
      revision: v2024.11.27
    metrics:
    - name: single choice
      type: accuracy
      value: 60.42
  - task:
      type: text-generation
      name: Single Choice Question
    dataset:
      type: lianghsun/Formosa-bench
      name: (Government) Formosa Taiwan Knowledge Bench
      config: governmnt
      split: test
      revision: v2024.11.27
    metrics:
    - name: single choice
      type: accuracy
      value: 44.25
  - task:
      type: text-generation
      name: Single Choice Question
    dataset:
      type: lianghsun/Formosa-bench
      name: (Geography) Formosa Taiwan Knowledge Bench
      config: geography
      split: test
      revision: v2024.11.27
    metrics:
    - name: single choice
      type: accuracy
      value: 47.54
  - task:
      type: text-generation
      name: Single Choice Question
    dataset:
      type: lianghsun/Formosa-bench
      name: (History) Formosa Taiwan Knowledge Bench
      config: history
      split: test
      revision: v2024.11.27
    metrics:
    - name: single choice
      type: accuracy
      value: 60
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (geography_of_taiwan) tmmlu++
      config: geography_of_taiwan
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 36.2
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (dentistry) tmmlu++
      config: dentistry
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 33.83
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (technical) tmmlu++
      config: technical
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 35.07
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (statistics_and_machine_learning) tmmlu++
      config: statistics_and_machine_learning
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 28.57
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (clinical_psychology) tmmlu++
      config: clinical_psychology
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 29.6
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (tve_design) tmmlu++
      config: tve_design
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 38.54
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (three_principles_of_people) tmmlu++
      config: three_principles_of_people
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 48.2
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (introduction_to_law) tmmlu++
      config: introduction_to_law
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 29.96
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (linear_algebra) tmmlu++
      config: linear_algebra
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 21.43
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (agriculture) tmmlu++
      config: agriculture
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 24.5
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (jce_humanities) tmmlu++
      config: jce_humanities
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 38.89
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (music) tmmlu++
      config: music
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 25.9
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (secondary_physics) tmmlu++
      config: secondary_physics
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 33.04
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (physics) tmmlu++
      config: physics
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 27.84
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (advance_chemistry) tmmlu++
      config: advance_chemistry
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 27.64
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (junior_science_exam) tmmlu++
      config: junior_science_exam
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 30.05
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (veterinary_pathology) tmmlu++
      config: veterinary_pathology
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 25.09
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (financial_analysis) tmmlu++
      config: financial_analysis
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 25.13
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (national_protection) tmmlu++
      config: national_protection
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 42.65
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (macroeconomics) tmmlu++
      config: macroeconomics
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 26.76
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (politic_science) tmmlu++
      config: politic_science
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 27.44
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (ttqav2) tmmlu++
      config: ttqav2
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 61.06
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (junior_chinese_exam) tmmlu++
      config: junior_chinese_exam
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 30.86
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (traditional_chinese_medicine_clinical_medicine) tmmlu++
      config: traditional_chinese_medicine_clinical_medicine
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 25.9
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (junior_math_exam) tmmlu++
      config: junior_math_exam
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 21.71
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (auditing) tmmlu++
      config: auditing
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 21.82
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (anti_money_laundering) tmmlu++
      config: anti_money_laundering
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 37.31
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (pharmacology) tmmlu++
      config: pharmacology
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 30.68
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (trust_practice) tmmlu++
      config: trust_practice
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 28.18
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (tve_mathematics) tmmlu++
      config: tve_mathematics
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 18.67
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (human_behavior) tmmlu++
      config: human_behavior
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 32.04
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (pharmacy) tmmlu++
      config: pharmacy
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 22.76
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (tve_chinese_language) tmmlu++
      config: tve_chinese_language
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 36.65
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (optometry) tmmlu++
      config: optometry
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 25.11
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (physical_education) tmmlu++
      config: physical_education
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 30.73
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (organic_chemistry) tmmlu++
      config: organic_chemistry
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 35.78
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (tve_natural_sciences) tmmlu++
      config: tve_natural_sciences
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 33.73
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (education) tmmlu++
      config: education
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 37.9
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (mechanical) tmmlu++
      config: mechanical
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 42.37
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (taiwanese_hokkien) tmmlu++
      config: taiwanese_hokkien
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 14.73
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (nautical_science) tmmlu++
      config: nautical_science
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 30.49
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (business_management) tmmlu++
      config: business_management
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 39.57
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (logic_reasoning) tmmlu++
      config: logic_reasoning
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 27.34
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (marketing_management) tmmlu++
      config: marketing_management
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 39.78
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (economics) tmmlu++
      config: economics
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 25.95
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (basic_medical_science) tmmlu++
      config: basic_medical_science
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 28.41
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (occupational_therapy_for_psychological_disorders) tmmlu++
      config: occupational_therapy_for_psychological_disorders
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 35.73
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (general_principles_of_law) tmmlu++
      config: general_principles_of_law
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 31.13
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (junior_chemistry) tmmlu++
      config: junior_chemistry
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 24.88
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (veterinary_pharmacology) tmmlu++
      config: veterinary_pharmacology
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 36.3
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (educational_psychology) tmmlu++
      config: educational_psychology
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 33.52
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (finance_banking) tmmlu++
      config: finance_banking
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 32.59
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (official_document_management) tmmlu++
      config: official_document_management
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 32.43
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (fire_science) tmmlu++
      config: fire_science
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 30.65
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (junior_social_studies) tmmlu++
      config: junior_social_studies
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 47.62
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (accounting) tmmlu++
      config: accounting
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 20.94
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (engineering_math) tmmlu++
      config: engineering_math
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 27.18
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (education_(profession_level)) tmmlu++
      config: education_(profession_level)
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 24.07
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (chinese_language_and_literature) tmmlu++
      config: chinese_language_and_literature
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 27.64
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (management_accounting) tmmlu++
      config: management_accounting
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 24.19
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (culinary_skills) tmmlu++
      config: culinary_skills
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 39.38
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (administrative_law) tmmlu++
      config: administrative_law
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 25.71
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (insurance_studies) tmmlu++
      config: insurance_studies
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 33.42
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (real_estate) tmmlu++
      config: real_estate
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 22.83
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (computer_science) tmmlu++
      config: computer_science
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 31.61
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (taxation) tmmlu++
      config: taxation
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 27.47
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: (trade) tmmlu++
      config: trade
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 20.32
widget:
- text: 中華民國憲法第一條
metrics:
- accuracy
---
983
+
984
+ # Model Card for lianghsun/Llama-3.2-Taiwan-3B-Instruct
985
+
986
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/618dc56cbc345ca7bf95f3cd/v_cfMxTtVE6_eh0rzcy5L.png)
987
+ *圖像生成來自 [OpenArt](https://openart.ai/home):An anime-style 🦙 standing proudly atop the summit of Taiwan’s [Yushan (Jade Mountain)](https://zh.wikipedia.org/wiki/%E7%8E%89%E5%B1%B1), gazing forward.*
988
+
989
+ 採用 [lianghsun/Llama-3.2-Taiwan-3B](https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B) 為[基礎模型(foundation model)](https://en.wikipedia.org/wiki/Foundation_model),使用大量[中華民國台灣](https://zh.wikipedia.org/zh-tw/%E8%87%BA%E7%81%A3)的繁體中文對話集和多國語言對話集進行模型[指令微調(instruction fine-tuning)](https://www.ibm.com/topics/instruction-tuning)和多輪迭代[直接偏好優化(direct preference optimization, DPO)](https://arxiv.org/abs/2305.18290),旨在訓練出具有中華民國台灣知識及風格的[小語言模型(small langugae model, SLM)](https://www.ibm.com/think/topics/small-language-models)之對話模型。
990
+
991
+ <details>
992
+ <summary><b>Model Change Log</b></summary>
993
+
994
+ | Update Date | Model Version | Key Changes |
995
+ |--------------|-----------------------|-------------------------------------|
996
+ | 2025/01/01 | v2025.01.01 | Fine-tuning is based on the [foundation model](https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B) version v2024.12.28, and it uses self-prepared instruction datasets for this round of fine-tuning. |
997
+ | 2024/12/13 | v2024.12.13 | Completed 1st round DPO training (10/10 epochs). Preparing for next round DPO training. |
998
+ | 2024/11/27 | v2024.11.27 | Completed SFT training (5/5 epochs). Preparing for multi-round DPO training. |
999
+ | 2024/11/25 | v2024.11.25 | Updated model version to v2024.11.25, training progressed to (3/5) epochs. Still in SFT stage, DPO training remains pending. |
1000
+ | 2024/11/22 | v2024.11.22 | Initial upload: Model version v2024.11.22, training completed up to (1/5) epochs. Currently trained only on SFT, DPO training not yet performed. |
1001
+
1002
+ </details>
1003
+
1004
## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [Huang Liang Hsun](https://www.linkedin.com/in/lianghsunhuang)
- **Model type:** LlamaForCausalLM
- **Language(s) (NLP):** Traditional Chinese (zh-tw), English
- **License:** [llama3.2](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)
- **Fine-tuned from model:** [lianghsun/Llama-3.2-Taiwan-3B](https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B)

### Model Sources

<!-- Provide the basic links for the model. -->
- **Repository:** [lianghsun/Llama-3.2-Taiwan-3B](https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B)
- **Paper:** (WIP)
- **Playground:** [🦉 Tawian SmolLM Chat](https://huggingface.co/spaces/lianghsun/tw-smol-chat) 👈🏼 give it a try 😻
- **Demo:**
```yaml
user: 請介紹台灣
assistant: 台灣,位於亞洲東部,地處太平洋與菲律賓海之間,面積約36,000平方公里,人口約2,300萬,是民主自由的國家,經濟實力強勁,擁有世界第10大經濟體。台灣以美食、文化、自然美景著稱,還有豐富的歷史與傳統,吸引全球遊客。台灣語為官方語言,但中文也廣為使用,英語也常用於國際交流。台灣政治多元,執政黨為民進黨,台灣是全球科技產業的重鎮,擁有先進的製造業與服務業。台灣氣候溫暖潮濕,四季分明,夏季炎熱,冬季涼爽,雨季則在5月至10月。台灣的美食以小吃為主,如滷肉飯、珍珠
```

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
The model supports Traditional Chinese conversation out of the box, and users can deploy it directly behind an inference endpoint.
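For quick local use without a serving stack, a minimal sketch with Hugging Face Transformers is shown below. This is an assumed usage pattern, not the author's reference code: the `pipeline` settings are illustrative, and the model download only happens when `generate_reply()` is actually called.

```python
MODEL_ID = "lianghsun/Llama-3.2-Taiwan-3B-Instruct"

def generate_reply(user_message: str, max_new_tokens: int = 256) -> str:
    """Generate one assistant turn; requires `pip install transformers torch accelerate`."""
    # Imported lazily so this sketch can be read and imported without the libraries.
    from transformers import pipeline

    chat = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    messages = [{"role": "user", "content": user_message}]
    result = chat(messages, max_new_tokens=max_new_tokens)
    # Recent Transformers versions return the whole conversation; the last
    # message is the newly generated assistant reply.
    return result[0]["generated_text"][-1]["content"]
```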

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
To strengthen the model's knowledge in a specific domain, you can fine-tune it further to improve its performance and domain expertise.
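As a sketch of what such downstream fine-tuning could look like, here is a hypothetical parameter-efficient (LoRA) setup. The hyperparameters and target modules are illustrative assumptions, not the recipe used for this model:

```python
BASE_MODEL = "lianghsun/Llama-3.2-Taiwan-3B-Instruct"

# Illustrative LoRA hyperparameters (assumptions, not the author's settings).
LORA_KWARGS = {
    "r": 8,                  # adapter rank
    "lora_alpha": 16,        # scaling factor
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],  # attention projections to adapt
}

def build_peft_model():
    """Wrap the base model with LoRA adapters; requires `pip install transformers peft`."""
    # Imported lazily so the sketch stays importable without the libraries.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    peft_model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", **LORA_KWARGS))
    peft_model.print_trainable_parameters()
    return peft_model
```

The resulting adapter can then be trained with a standard `Trainer` loop on a domain dataset.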

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
This model is intended to provide information only; it does not take positions on, or pass judgment about, political or legal questions.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Because of the diversity of its training data, the model's generated content may carry biases or particular viewpoints, or include statements that are factually incorrect. Users should carefully verify the accuracy and neutrality of the output.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

To serve this model with the [vLLM Docker image](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html), run:
```bash
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model lianghsun/Llama-3.2-Taiwan-3B-Instruct
```

Note: to serve a different checkpoint version, append `--revision <tag_name>`:
```bash
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model lianghsun/Llama-3.2-Taiwan-3B-Instruct --revision <tag_name>
```
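Once the container is up, vLLM exposes an OpenAI-compatible API on port 8000. A minimal request sketch using only the Python standard library (the sampling parameters are illustrative; the `openai` client works just as well):

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8000/v1/chat/completions"  # default port mapped above

def build_payload(user_message: str) -> dict:
    # Illustrative sampling parameters; tune to taste.
    return {
        "model": "lianghsun/Llama-3.2-Taiwan-3B-Instruct",
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 256,
        "temperature": 0.7,
    }

def chat(user_message: str) -> str:
    req = Request(
        API_URL,
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # The request only fires when chat() is called, not at import time.
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```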

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

<details>
<summary><b>Traditional Chinese dialogue datasets</b></summary>

- [lianghsun/tw-legal-nlp](https://huggingface.co/datasets/lianghsun/tw-legal-nlp)
- [lianghsun/tw-legal-synthetic-qa](https://huggingface.co/datasets/lianghsun/tw-legal-synthetic-qa)
- [lianghsun/tw-law-article-qa](https://huggingface.co/datasets/lianghsun/tw-law-article-qa)
- [lianghsun/tw-judgment-qa](https://huggingface.co/datasets/lianghsun/tw-judgment-qa)
- [lianghsun/tw-bar-examination-2020-chat](https://huggingface.co/datasets/lianghsun/tw-bar-examination-2020-chat)
- [lianghsun/tw-structured-law-article](https://huggingface.co/datasets/lianghsun/tw-structured-law-article)
- [lianghsun/tw-judgment-gist-chat](https://huggingface.co/datasets/lianghsun/tw-judgment-gist-chat)
- [lianghsun/vulnerability-mitigation-qa-zh_tw](https://huggingface.co/datasets/lianghsun/vulnerability-mitigation-qa-zh_tw)
- [lianghsun/tw-legal-qa-chat](https://huggingface.co/datasets/lianghsun/tw-legal-qa-chat)
- [lianghsun/reasoning-base-20k-chat](https://huggingface.co/datasets/lianghsun/reasoning-base-20k-chat)
- [lianghsun/tw-contract-review-chat](https://huggingface.co/datasets/lianghsun/tw-contract-review-chat)
- [lianghsun/tw-legal-methodology-chat](https://huggingface.co/datasets/lianghsun/tw-legal-methodology-chat)
- [minyichen/glaive_toolcall_zh_tw](https://huggingface.co/datasets/minyichen/glaive_toolcall_zh_tw)

</details>

<details>
<summary><b>Multilingual dialogue datasets</b></summary>

- [rombodawg/Everything_Instruct_Multilingual](https://huggingface.co/datasets/rombodawg/Everything_Instruct_Multilingual)
- [xzuyn/manythings-translations-alpaca](https://huggingface.co/datasets/xzuyn/manythings-translations-alpaca)
- [neural-bridge/rag-dataset-12000](https://huggingface.co/datasets/neural-bridge/rag-dataset-12000)

</details>

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

(WIP)

#### Training Hyperparameters

<details>
<summary><b>SFT stage for v2024.11.27</b></summary>

**Note:** the settings below also cover `v2024.11.22` and `v2024.11.25`.
- **learning_rate:** 5e-05
- **min_learning_rate:** 5e-07
- **train_batch_size:** 105
- **seed:** 42
- **distributed_type:** multi-GPU
- **num_devices:** 4
- **gradient_accumulation_steps:** 50
- **total_train_batch_size:** 21,000
- **optimizer:** Adam with betas=(0.9,0.999) and epsilon=1e-08
- **lr_scheduler_type:** cosine
- **lr_scheduler_warmup_ratio:** 0.01
- **num_epochs:** 5.0
- **global_step:** 590
</details>

#### Speeds, Sizes, Times

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
<details>
<summary><b>SFT stage for v2024.11.27</b></summary>

**Note:** the figures below also cover `v2024.11.22` and `v2024.11.25`.
- **Duration**: 5 days, 16:15:11.17
- **Train runtime**: 490,511.1789 seconds
- **Train samples per second**: 25.37
- **Train steps per second**: 0.001
- **Total training FLOPs**: 26,658,386,120,540,160
- **Train loss**: 0.8533
</details>

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

<details>
<summary><b>Formosa Taiwan Knowledge Bench</b></summary>

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[lianghsun/Formosa-bench](https://huggingface.co/datasets/lianghsun/Formosa-bench)

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

</details>

<details>
<summary><b>lianghsun/tw-legal-benchmark-v1</b></summary>

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

- **Dataset:** [lianghsun/tw-legal-benchmark-v1](https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1)
- **Revision:** 66c3a5f3ff2298f6a1cf23201070b5317bdd1893

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Accuracy

### Results

- **Model Revision:** v2024.11.27

| **Subset** | **Split** | **Score** |
|--------------|-------|-------|
| [lianghsun/tw-legal-benchmark-v1](https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1/blob/main/benchmark.csv) | train | 31.1 |

#### Summary

</details>

<details>
<summary><b>tmmlu++</b></summary>

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->
- **Dataset:** [ikala/tmmluplus](https://huggingface.co/datasets/ikala/tmmluplus)
- **Revision:** c0e8ae955997300d5dbf0e382bf0ba5115f85e8c

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Accuracy

### Results

- **Model Revision:** v2024.11.27

| **Subset** | **Split** | **Score** |
|--------------|-------|-------|
| [geography_of_taiwan](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/geography_of_taiwan_test.csv) | test | 36.2 |
| [dentistry](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/dentistry_test.csv) | test | 33.83 |
| [technical](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/technical_test.csv) | test | 35.07 |
| [statistics_and_machine_learning](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/statistics_and_machine_learning_test.csv) | test | 28.57 |
| [clinical_psychology](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/clinical_psychology_test.csv) | test | 29.6 |
| [tve_design](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/tve_design_test.csv) | test | 38.54 |
| [three_principles_of_people](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/three_principles_of_people_test.csv) | test | 48.2 |
| [introduction_to_law](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/introduction_to_law_test.csv) | test | 29.96 |
| [linear_algebra](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/linear_algebra_test.csv) | test | 21.43 |
| [agriculture](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/agriculture_test.csv) | test | 24.5 |
| [jce_humanities](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/jce_humanities_test.csv) | test | 38.89 |
| [music](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/music_test.csv) | test | 25.9 |
| [secondary_physics](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/secondary_physics_test.csv) | test | 33.04 |
| [physics](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/physics_test.csv) | test | 27.84 |
| [advance_chemistry](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/advance_chemistry_test.csv) | test | 27.64 |
| [junior_science_exam](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/junior_science_exam_test.csv) | test | 30.05 |
| [veterinary_pathology](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/veterinary_pathology_test.csv) | test | 25.09 |
| [financial_analysis](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/financial_analysis_test.csv) | test | 25.13 |
| [national_protection](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/national_protection_test.csv) | test | 42.65 |
| [macroeconomics](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/macroeconomics_test.csv) | test | 26.76 |
| [politic_science](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/politic_science_test.csv) | test | 27.44 |
| [ttqav2](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/ttqav2_test.csv) | test | 61.06 |
| [junior_chinese_exam](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/junior_chinese_exam_test.csv) | test | 30.86 |
| [traditional_chinese_medicine_clinical_medicine](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/traditional_chinese_medicine_clinical_medicine_test.csv) | test | 25.9 |
| [junior_math_exam](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/junior_math_exam_test.csv) | test | 21.71 |
| [auditing](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/auditing_test.csv) | test | 21.82 |
| [anti_money_laundering](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/anti_money_laundering_test.csv) | test | 37.31 |
| [pharmacology](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/pharmacology_test.csv) | test | 30.68 |
| [trust_practice](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/trust_practice_test.csv) | test | 28.18 |
| [tve_mathematics](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/tve_mathematics_test.csv) | test | 18.67 |
| [human_behavior](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/human_behavior_test.csv) | test | 32.04 |
| [pharmacy](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/pharmacy_test.csv) | test | 22.76 |
| [tve_chinese_language](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/tve_chinese_language_test.csv) | test | 36.65 |
| [optometry](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/optometry_test.csv) | test | 25.11 |
| [physical_education](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/physical_education_test.csv) | test | 30.73 |
| [organic_chemistry](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/organic_chemistry_test.csv) | test | 35.78 |
| [tve_natural_sciences](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/tve_natural_sciences_test.csv) | test | 33.73 |
| [education](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/education_test.csv) | test | 37.9 |
| [mechanical](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/mechanical_test.csv) | test | 42.37 |
| [taiwanese_hokkien](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/taiwanese_hokkien_test.csv) | test | 14.73 |
| [nautical_science](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/nautical_science_test.csv) | test | 30.49 |
| [business_management](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/business_management_test.csv) | test | 39.57 |
| [logic_reasoning](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/logic_reasoning_test.csv) | test | 27.34 |
| [marketing_management](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/marketing_management_test.csv) | test | 39.78 |
| [economics](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/economics_test.csv) | test | 25.95 |
| [basic_medical_science](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/basic_medical_science_test.csv) | test | 28.41 |
| [occupational_therapy_for_psychological_disorders](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/occupational_therapy_for_psychological_disorders_test.csv) | test | 35.73 |
| [general_principles_of_law](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/general_principles_of_law_test.csv) | test | 31.13 |
| [junior_chemistry](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/junior_chemistry_test.csv) | test | 24.88 |
| [veterinary_pharmacology](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/veterinary_pharmacology_test.csv) | test | 36.3 |
| [educational_psychology](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/educational_psychology_test.csv) | test | 33.52 |
| [finance_banking](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/finance_banking_test.csv) | test | 32.59 |
| [official_document_management](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/official_document_management_test.csv) | test | 32.43 |
| [fire_science](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/fire_science_test.csv) | test | 30.65 |
| [junior_social_studies](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/junior_social_studies_test.csv) | test | 47.62 |
| [accounting](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/accounting_test.csv) | test | 20.94 |
| [engineering_math](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/engineering_math_test.csv) | test | 27.18 |
| [education_(profession_level)](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/education_(profession_level)_test.csv) | test | 24.07 |
| [chinese_language_and_literature](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/chinese_language_and_literature_test.csv) | test | 27.64 |
| [management_accounting](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/management_accounting_test.csv) | test | 24.19 |
| [culinary_skills](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/culinary_skills_test.csv) | test | 39.38 |
| [administrative_law](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/administrative_law_test.csv) | test | 25.71 |
| [insurance_studies](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/insurance_studies_test.csv) | test | 33.42 |
| [real_estate](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/real_estate_test.csv) | test | 22.83 |
| [computer_science](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/computer_science_test.csv) | test | 31.61 |
| [taxation](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/taxation_test.csv) | test | 27.47 |
| [trade](https://huggingface.co/datasets/ikala/tmmluplus/blob/main/data/trade_test.csv) | test | 20.32 |


#### Summary
As of model version `v2024.11.27`, neither the foundation model ([lianghsun/Llama-3.2-Taiwan-3B](https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B)) nor the instruction-tuned model ([lianghsun/Llama-3.2-Taiwan-3B-Instruct](https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B-Instruct)) has been trained on the tmmlu++ dataset, in order to keep the evaluation fair. The model currently performs poorly across most tmmlu++ subjects, below a passing grade; adding domain-specific datasets may be necessary to strengthen the foundation model.

</details>

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
- **Hardware Type:** 🚀
- **Hours used:** ⏳⏳⌛
- **Cloud Provider:** [鴻鵠國際股份有限公司](https://www.honghutech.com/)
- **Compute Region:** 🇹🇼
- **Carbon Emitted:** ♻️

## Technical Specifications

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

- **CPU count:** 32
- **Logical CPU count:** 64
- **GPU count:** 4
- **GPU type:** NVIDIA H100 NVL

#### Software

- **OS version:** Linux-5.15.0-124-generic-x86_64-with-glibc2.35
- **Python version:** 3.12.7

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```bibtex
@misc{lianghsun2024llama32taiwan3binstruct,
  author       = {Huang, Liang Hsun},
  title        = {Llama-3.2-Taiwan-3B-Instruct},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/lianghsun/Llama-3.2-Taiwan-3B-Instruct}},
  note         = {Accessed: 2024-11-25}
}
```

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
N/A

## More Information

### Acknowledgements
![image/png](https://cdn-uploads.huggingface.co/production/uploads/618dc56cbc345ca7bf95f3cd/28u7rOLoeUgn67clYEKuZ.png)
Many thanks to Mr. 蔡長明 of [鴻鵠國際股份有限公司](https://www.honghutech.com/) for sponsoring the compute free of charge, and to the friends who helped along the way: 廖振翔, chweng, Ben, kevin, Maxxchu, Lam, 陳林彥, and others. Without them this model could not have been trained; compute sponsors deserve the deepest gratitude.

### Usage
If you fine-tune based on this instruct model, please kindly note the **base model** in your **model card**:
```yaml
base_model: lianghsun/Llama-3.2-Taiwan-3B-Instruct
```

An attribution and a ❤️ are the greatest encouragement for us. Thank you. 😀

## Model Card Authors

[Huang Liang Hsun](https://www.linkedin.com/in/lianghsunhuang)

## Model Card Contact

[Huang Liang Hsun](https://www.linkedin.com/in/lianghsunhuang)

### Framework versions

- Transformers 4.45.2
- Pytorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.20.0