license: apache-2.0
language:
  - zh
widget:
  - text: >-
      A chat between a curious user and an artificial intelligence assistant.
      The assistant gives helpful, detailed, and polite answers to the user's
      questions. USER: 你好,請問你可以幫我寫一封推薦信嗎? ASSISTANT:
library_name: transformers
pipeline_tag: text-generation
extra_gated_heading: Acknowledge license to accept the repository.
extra_gated_prompt: Please contact the author for access.
extra_gated_button_content: Acknowledge license (I agree to the above)
extra_gated_fields:
  Name: text
  Mail: text
  Organization: text
  Country: text
  Any utilization of the Taiwan LLM repository mandates the explicit acknowledgment and attribution to the original author: checkbox
  Use of Taiwan LLM requires explicit acknowledgment of and attribution to Ubitus K.K. (優必達株式會社) and the original author: checkbox
🌟 Check out the Taiwan-LLM Demo Chat-UI 🌟

Model Card for Taiwan LLM 13B v2.0 chat

Taiwan LLM is a language model tailored for Traditional Chinese, with a focus on the linguistic and cultural contexts of Taiwan. Built on a large base model, it was enriched with diverse Taiwanese text sources and refined through supervised fine-tuning. The model performs well at language understanding and generation and aligns closely with Taiwan's cultural nuances. It shows improved performance on benchmarks such as TC-Eval, reflecting its contextual comprehension and cultural relevance. For details on Taiwan LLM's development and features, see our technical report.

Model description

  • Model type: A 13B-parameter GPT-like model fine-tuned on a mix of publicly available and synthetic datasets.
  • Language(s) (NLP): Primarily Traditional Chinese (zh-tw)
  • Finetuned from model: yentinglin/Taiwan-LLM-13B-v2.0-base

Model Sources

Performance

TMMLU+ score: 24.76727075757576

Intended uses

Here's how you can run the model using the pipeline() function from 🤗 Transformers:

# pip install "transformers>=4.34"
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="yentinglin/Taiwan-LLM-13B-v2.0-chat", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "你是一個人工智慧助理",
    },
    {"role": "user", "content": "東北季風如何影響台灣氣候?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
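
The widget example in this card's metadata uses a Vicuna-style single-turn layout (a system sentence, then `USER:` and `ASSISTANT:` markers). Below is a minimal sketch of building that string by hand, assuming the widget layout is what the model expects. `build_prompt` and `DEFAULT_SYSTEM` are hypothetical names introduced for illustration, and the tokenizer's chat template remains the authoritative source for the format.

```python
# Hypothetical helper: mirrors the prompt layout shown in the card's widget text.
# The tokenizer's chat template is authoritative; this is only an illustration.
DEFAULT_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(user_message: str, system: str = DEFAULT_SYSTEM) -> str:
    """Return a single-turn prompt that ends with the assistant cue."""
    return f"{system} USER: {user_message.strip()} ASSISTANT:"


if __name__ == "__main__":
    print(build_prompt("東北季風如何影響台灣氣候?"))
```

Comparing this string with the output of `apply_chat_template` is a quick way to check whether the two formats agree before relying on either.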

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 5.0
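
To make the schedule concrete, here is a minimal sketch of how the learning rate could evolve under these settings. It assumes linear warmup over the first 3% of steps followed by cosine decay to zero, which is a common shape for a cosine scheduler with a warmup ratio; the card does not confirm the exact implementation. `lr_at_step` is a hypothetical helper, and the total step count is a made-up value since the card does not state it.

```python
import math

PEAK_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.03    # lr_scheduler_warmup_ratio from the list above


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to the peak.
        return PEAK_LR * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical; the card does not state the number of steps
    for s in (0, 15, 30, 515, 1000):
        print(s, f"{lr_at_step(s, total):.2e}")
```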

Citation

If you find Taiwan LLM useful in your work, please cite it with:

@misc{lin2023taiwan,
      title={Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model}, 
      author={Yen-Ting Lin and Yun-Nung Chen},
      year={2023},
      eprint={2311.17487},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgement

Taiwan LLM v2 was developed in collaboration with Ubitus K.K. Ubitus provided valuable compute resources for the project.

Open LLM Leaderboard

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| leaderboard:arc:challenge:25 | 0 | acc | 0.5529 | ± 0.0145 |
| | | acc_norm | 0.5862 | ± 0.0144 |
| leaderboard:gsm8k:5 | 0 | qem | 0.3177 | ± 0.0128 |
| leaderboard:hellaswag:10 | 0 | acc | 0.6307 | ± 0.0048 |
| | | acc_norm | 0.8327 | ± 0.0037 |
| leaderboard:mmlu:_average:5 | | acc | 0.5483 | ± 0.0356 |
| leaderboard:mmlu:abstract_algebra:5 | 0 | acc | 0.3400 | ± 0.0476 |
| leaderboard:mmlu:anatomy:5 | 0 | acc | 0.5111 | ± 0.0432 |
| leaderboard:mmlu:astronomy:5 | 0 | acc | 0.5789 | ± 0.0402 |
| leaderboard:mmlu:business_ethics:5 | 0 | acc | 0.5100 | ± 0.0502 |
| leaderboard:mmlu:clinical_knowledge:5 | 0 | acc | 0.6000 | ± 0.0302 |
| leaderboard:mmlu:college_biology:5 | 0 | acc | 0.5764 | ± 0.0413 |
| leaderboard:mmlu:college_chemistry:5 | 0 | acc | 0.4100 | ± 0.0494 |
| leaderboard:mmlu:college_computer_science:5 | 0 | acc | 0.4500 | ± 0.0500 |
| leaderboard:mmlu:college_mathematics:5 | 0 | acc | 0.3800 | ± 0.0488 |
| leaderboard:mmlu:college_medicine:5 | 0 | acc | 0.5434 | ± 0.0380 |
| leaderboard:mmlu:college_physics:5 | 0 | acc | 0.2941 | ± 0.0453 |
| leaderboard:mmlu:computer_security:5 | 0 | acc | 0.7000 | ± 0.0461 |
| leaderboard:mmlu:conceptual_physics:5 | 0 | acc | 0.4468 | ± 0.0325 |
| leaderboard:mmlu:econometrics:5 | 0 | acc | 0.2719 | ± 0.0419 |
| leaderboard:mmlu:electrical_engineering:5 | 0 | acc | 0.4552 | ± 0.0415 |
| leaderboard:mmlu:elementary_mathematics:5 | 0 | acc | 0.3175 | ± 0.0240 |
| leaderboard:mmlu:formal_logic:5 | 0 | acc | 0.3413 | ± 0.0424 |
| leaderboard:mmlu:global_facts:5 | 0 | acc | 0.3700 | ± 0.0485 |
| leaderboard:mmlu:high_school_biology:5 | 0 | acc | 0.6323 | ± 0.0274 |
| leaderboard:mmlu:high_school_chemistry:5 | 0 | acc | 0.4581 | ± 0.0351 |
| leaderboard:mmlu:high_school_computer_science:5 | 0 | acc | 0.5400 | ± 0.0501 |
| leaderboard:mmlu:high_school_european_history:5 | 0 | acc | 0.6364 | ± 0.0376 |
| leaderboard:mmlu:high_school_geography:5 | 0 | acc | 0.6970 | ± 0.0327 |
| leaderboard:mmlu:high_school_government_and_politics:5 | 0 | acc | 0.7617 | ± 0.0307 |
| leaderboard:mmlu:high_school_macroeconomics:5 | 0 | acc | 0.4974 | ± 0.0254 |
| leaderboard:mmlu:high_school_mathematics:5 | 0 | acc | 0.3296 | ± 0.0287 |
| leaderboard:mmlu:high_school_microeconomics:5 | 0 | acc | 0.5336 | ± 0.0324 |
| leaderboard:mmlu:high_school_physics:5 | 0 | acc | 0.3709 | ± 0.0394 |
| leaderboard:mmlu:high_school_psychology:5 | 0 | acc | 0.7468 | ± 0.0186 |
| leaderboard:mmlu:high_school_statistics:5 | 0 | acc | 0.4074 | ± 0.0335 |
| leaderboard:mmlu:high_school_us_history:5 | 0 | acc | 0.7108 | ± 0.0318 |
| leaderboard:mmlu:high_school_world_history:5 | 0 | acc | 0.7046 | ± 0.0297 |
| leaderboard:mmlu:human_aging:5 | 0 | acc | 0.6323 | ± 0.0324 |
| leaderboard:mmlu:human_sexuality:5 | 0 | acc | 0.5878 | ± 0.0432 |
| leaderboard:mmlu:international_law:5 | 0 | acc | 0.6694 | ± 0.0429 |
| leaderboard:mmlu:jurisprudence:5 | 0 | acc | 0.7037 | ± 0.0441 |
| leaderboard:mmlu:logical_fallacies:5 | 0 | acc | 0.6564 | ± 0.0373 |
| leaderboard:mmlu:machine_learning:5 | 0 | acc | 0.3393 | ± 0.0449 |
| leaderboard:mmlu:management:5 | 0 | acc | 0.7087 | ± 0.0450 |
| leaderboard:mmlu:marketing:5 | 0 | acc | 0.8333 | ± 0.0244 |
| leaderboard:mmlu:medical_genetics:5 | 0 | acc | 0.5400 | ± 0.0501 |
| leaderboard:mmlu:miscellaneous:5 | 0 | acc | 0.7382 | ± 0.0157 |
| leaderboard:mmlu:moral_disputes:5 | 0 | acc | 0.6127 | ± 0.0262 |
| leaderboard:mmlu:moral_scenarios:5 | 0 | acc | 0.3788 | ± 0.0162 |
| leaderboard:mmlu:nutrition:5 | 0 | acc | 0.6046 | ± 0.0280 |
| leaderboard:mmlu:philosophy:5 | 0 | acc | 0.6270 | ± 0.0275 |
| leaderboard:mmlu:prehistory:5 | 0 | acc | 0.6204 | ± 0.0270 |
| leaderboard:mmlu:professional_accounting:5 | 0 | acc | 0.3582 | ± 0.0286 |
| leaderboard:mmlu:professional_law:5 | 0 | acc | 0.3931 | ± 0.0125 |
| leaderboard:mmlu:professional_medicine:5 | 0 | acc | 0.5184 | ± 0.0304 |
| leaderboard:mmlu:professional_psychology:5 | 0 | acc | 0.5556 | ± 0.0201 |
| leaderboard:mmlu:public_relations:5 | 0 | acc | 0.6818 | ± 0.0446 |
| leaderboard:mmlu:security_studies:5 | 0 | acc | 0.6122 | ± 0.0312 |
| leaderboard:mmlu:sociology:5 | 0 | acc | 0.7164 | ± 0.0319 |
| leaderboard:mmlu:us_foreign_policy:5 | 0 | acc | 0.8200 | ± 0.0386 |
| leaderboard:mmlu:virology:5 | 0 | acc | 0.4578 | ± 0.0388 |
| leaderboard:mmlu:world_religions:5 | 0 | acc | 0.7661 | ± 0.0325 |
| leaderboard:truthfulqa:mc:0 | 0 | truthfulqa_mc1 | 0.2840 | ± 0.0158 |
| | | truthfulqa_mc2 | 0.4423 | ± 0.0146 |
| leaderboard:winogrande:5 | 0 | acc | 0.7593 | ± 0.0120 |
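
The Stderr column can be read as an approximate uncertainty on each score. Here is a minimal sketch that turns a value and its standard error into an interval, assuming a normal approximation (z = 1.96 for roughly 95% coverage); that assumption is ours and is not stated by the evaluation results. `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) for value ± z * stderr under a normal approximation."""
    half_width = z * stderr
    return value - half_width, value + half_width


if __name__ == "__main__":
    # ARC-Challenge acc_norm row from the table above: 0.5862 ± 0.0144
    low, high = confidence_interval(0.5862, 0.0144)
    print(f"{low:.4f} to {high:.4f}")
```

Intervals like this help when comparing two models: differences smaller than the interval width are hard to distinguish from noise.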

TC-Eval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| community:tc-eval-v2:drcd:0 | 0 | pem | 0.6848 | ± 0.0079 |
| | | pqem | 0.6799 | ± 0.0079 |
| community:tc-eval-v2:penguin_table:0 | 0 | acc | 0.2361 | ± 0.0355 |
| community:tc-eval-v2:_average:5 | | acc | 0.3508 | ± 0.0318 |
| community:tc-eval-v2:tmmluplus-accounting:5 | 0 | acc | 0.2565 | ± 0.0317 |
| community:tc-eval-v2:tmmluplus-administrative_law:5 | 0 | acc | 0.2833 | ± 0.0220 |
| community:tc-eval-v2:tmmluplus-advance_chemistry:5 | 0 | acc | 0.3333 | ± 0.0427 |
| community:tc-eval-v2:tmmluplus-agriculture:5 | 0 | acc | 0.1987 | ± 0.0326 |
| community:tc-eval-v2:tmmluplus-anti_money_laundering:5 | 0 | acc | 0.5597 | ± 0.0430 |
| community:tc-eval-v2:tmmluplus-auditing:5 | 0 | acc | 0.2836 | ± 0.0192 |
| community:tc-eval-v2:tmmluplus-basic_medical_science:5 | 0 | acc | 0.2841 | ± 0.0146 |
| community:tc-eval-v2:tmmluplus-business_management:5 | 0 | acc | 0.4245 | ± 0.0421 |
| community:tc-eval-v2:tmmluplus-chinese_language_and_literature:5 | 0 | acc | 0.2714 | ± 0.0316 |
| community:tc-eval-v2:tmmluplus-clinical_psychology:5 | 0 | acc | 0.3840 | ± 0.0437 |
| community:tc-eval-v2:tmmluplus-computer_science:5 | 0 | acc | 0.4195 | ± 0.0375 |
| community:tc-eval-v2:tmmluplus-culinary_skills:5 | 0 | acc | 0.4589 | ± 0.0292 |
| community:tc-eval-v2:tmmluplus-dentistry:5 | 0 | acc | 0.3885 | ± 0.0244 |
| community:tc-eval-v2:tmmluplus-economics:5 | 0 | acc | 0.3053 | ± 0.0233 |
| community:tc-eval-v2:tmmluplus-education:5 | 0 | acc | 0.4355 | ± 0.0447 |
| community:tc-eval-v2:tmmluplus-education_(profession_level):5 | 0 | acc | 0.2819 | ± 0.0204 |
| community:tc-eval-v2:tmmluplus-educational_psychology:5 | 0 | acc | 0.4489 | ± 0.0376 |
| community:tc-eval-v2:tmmluplus-engineering_math:5 | 0 | acc | 0.2718 | ± 0.0441 |
| community:tc-eval-v2:tmmluplus-finance_banking:5 | 0 | acc | 0.3037 | ± 0.0397 |
| community:tc-eval-v2:tmmluplus-financial_analysis:5 | 0 | acc | 0.2801 | ± 0.0230 |
| community:tc-eval-v2:tmmluplus-fire_science:5 | 0 | acc | 0.2500 | ± 0.0390 |
| community:tc-eval-v2:tmmluplus-general_principles_of_law:5 | 0 | acc | 0.3113 | ± 0.0452 |
| community:tc-eval-v2:tmmluplus-geography_of_taiwan:5 | 0 | acc | 0.4492 | ± 0.0180 |
| community:tc-eval-v2:tmmluplus-human_behavior:5 | 0 | acc | 0.3883 | ± 0.0278 |
| community:tc-eval-v2:tmmluplus-insurance_studies:5 | 0 | acc | 0.3487 | ± 0.0173 |
| community:tc-eval-v2:tmmluplus-introduction_to_law:5 | 0 | acc | 0.3165 | ± 0.0303 |
| community:tc-eval-v2:tmmluplus-jce_humanities:5 | 0 | acc | 0.3444 | ± 0.0504 |
| community:tc-eval-v2:tmmluplus-junior_chemistry:5 | 0 | acc | 0.3158 | ± 0.0322 |
| community:tc-eval-v2:tmmluplus-junior_chinese_exam:5 | 0 | acc | 0.4171 | ± 0.0374 |
| community:tc-eval-v2:tmmluplus-junior_math_exam:5 | 0 | acc | 0.2286 | ± 0.0318 |
| community:tc-eval-v2:tmmluplus-junior_science_exam:5 | 0 | acc | 0.3427 | ± 0.0326 |
| community:tc-eval-v2:tmmluplus-junior_social_studies:5 | 0 | acc | 0.4683 | ± 0.0446 |
| community:tc-eval-v2:tmmluplus-logic_reasoning:5 | 0 | acc | 0.2734 | ± 0.0379 |
| community:tc-eval-v2:tmmluplus-macroeconomics:5 | 0 | acc | 0.3187 | ± 0.0230 |
| community:tc-eval-v2:tmmluplus-management_accounting:5 | 0 | acc | 0.2977 | ± 0.0313 |
| community:tc-eval-v2:tmmluplus-marketing_management:5 | 0 | acc | 0.4624 | ± 0.0520 |
| community:tc-eval-v2:tmmluplus-mechanical:5 | 0 | acc | 0.4831 | ± 0.0462 |
| community:tc-eval-v2:tmmluplus-music:5 | 0 | acc | 0.3993 | ± 0.0294 |
| community:tc-eval-v2:tmmluplus-national_protection:5 | 0 | acc | 0.4929 | ± 0.0345 |
| community:tc-eval-v2:tmmluplus-nautical_science:5 | 0 | acc | 0.2777 | ± 0.0191 |
| community:tc-eval-v2:tmmluplus-occupational_therapy_for_psychological_disorders:5 | 0 | acc | 0.4438 | ± 0.0213 |
| community:tc-eval-v2:tmmluplus-official_document_management:5 | 0 | acc | 0.3559 | ± 0.0322 |
| community:tc-eval-v2:tmmluplus-optometry:5 | 0 | acc | 0.2804 | ± 0.0148 |
| community:tc-eval-v2:tmmluplus-organic_chemistry:5 | 0 | acc | 0.3486 | ± 0.0459 |
| community:tc-eval-v2:tmmluplus-pharmacology:5 | 0 | acc | 0.3397 | ± 0.0197 |
| community:tc-eval-v2:tmmluplus-pharmacy:5 | 0 | acc | 0.2174 | ± 0.0209 |
| community:tc-eval-v2:tmmluplus-physical_education:5 | 0 | acc | 0.3966 | ± 0.0367 |
| community:tc-eval-v2:tmmluplus-physics:5 | 0 | acc | 0.2371 | ± 0.0434 |
| community:tc-eval-v2:tmmluplus-politic_science:5 | 0 | acc | 0.3407 | ± 0.0150 |
| community:tc-eval-v2:tmmluplus-real_estate:5 | 0 | acc | 0.3804 | ± 0.0509 |
| community:tc-eval-v2:tmmluplus-secondary_physics:5 | 0 | acc | 0.3393 | ± 0.0449 |
| community:tc-eval-v2:tmmluplus-statistics_and_machine_learning:5 | 0 | acc | 0.3438 | ± 0.0318 |
| community:tc-eval-v2:tmmluplus-taiwanese_hokkien:5 | 0 | acc | 0.2636 | ± 0.0389 |
| community:tc-eval-v2:tmmluplus-taxation:5 | 0 | acc | 0.2507 | ± 0.0224 |
| community:tc-eval-v2:tmmluplus-technical:5 | 0 | acc | 0.4204 | ± 0.0247 |
| community:tc-eval-v2:tmmluplus-three_principles_of_people:5 | 0 | acc | 0.5396 | ± 0.0424 |
| community:tc-eval-v2:tmmluplus-trade:5 | 0 | acc | 0.2251 | ± 0.0187 |
| community:tc-eval-v2:tmmluplus-traditional_chinese_medicine_clinical_medicine:5 | 0 | acc | 0.3094 | ± 0.0278 |
| community:tc-eval-v2:tmmluplus-trust_practice:5 | 0 | acc | 0.3292 | ± 0.0235 |
| community:tc-eval-v2:tmmluplus-ttqav2:5 | 0 | acc | 0.6726 | ± 0.0443 |
| community:tc-eval-v2:tmmluplus-tve_chinese_language:5 | 0 | acc | 0.4161 | ± 0.0225 |
| community:tc-eval-v2:tmmluplus-tve_design:5 | 0 | acc | 0.4542 | ± 0.0227 |
| community:tc-eval-v2:tmmluplus-tve_mathematics:5 | 0 | acc | 0.2733 | ± 0.0365 |
| community:tc-eval-v2:tmmluplus-tve_natural_sciences:5 | 0 | acc | 0.3349 | ± 0.0229 |
| community:tc-eval-v2:tmmluplus-veterinary_pathology:5 | 0 | acc | 0.2544 | ± 0.0259 |
| community:tc-eval-v2:tmmluplus-veterinary_pharmacology:5 | 0 | acc | 0.3259 | ± 0.0202 |