metadata
library_name: transformers
license: cc-by-nc-4.0
datasets:
  - kyujinpy/KOR-OpenOrca-Platypus-v3
language:
  - ko
  - en
tags:
  - Economic
  - Finance

Model Details

Model Developers: Sogang University SGEconFinlab (<https://sc.sogang.ac.kr/aifinlab/>)

Model Description

This model is a language model specialized in economics and finance. It was trained on a variety of economics- and finance-related data. The data sources are listed below; we are not releasing the training data itself because it was used for research/policy purposes. If you wish to use the original data, please contact the original authors directly for permission.

Loading the Model

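import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

peft_model_id = "SGEcon/KoSOLAR-10.7B-v0.2_fin_v4"
# Read the adapter config to find the base model it was trained on
config = PeftConfig.from_pretrained(peft_model_id)
# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
# Load the quantized base model onto GPU 0, then attach the LoRA adapter
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map={"":0})
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Switch to inference mode (disables dropout)
model.eval()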

Conducting Conversation

import re

def gen(x):
    # Build the prompt; 질문 means "question" and 답변 means "answer"
    inputs = tokenizer(f"### 질문: {x}\n\n### 답변:", return_tensors='pt', return_token_type_ids=False)

    # Move the inputs to the GPU (if available)
    inputs = {k: v.to(device="cuda" if torch.cuda.is_available() else "cpu") for k, v in inputs.items()}

    gened = model.generate(
        **inputs,
        max_new_tokens=256,  # maximum number of new tokens to generate
        early_stopping=True,  # only affects beam search; no effect with sampling
        num_return_sequences=1,  # generate a single answer
        do_sample=True,  # enable sampling for more varied answers
        eos_token_id=tokenizer.eos_token_id,  # stop at the EOS token
        temperature=0.9,  # temperature controlling output diversity
        top_p=0.8,  # p value for nucleus sampling
        top_k=50  # k value for top-k sampling
    )

    # Decode the generated sequence into output text
    decoded = tokenizer.decode(gened[0], skip_special_tokens=True).strip()

    # Keep only the text after the "### 답변:" marker
    answer_start_idx = decoded.find("### 답변:") + len("### 답변:")
    complete_answer = decoded[answer_start_idx:].strip()

    # Find the last sentence-ending mark (. ? !) and drop any unfinished text after it
    match = re.search(r"[\.\?\!][^\.\?\!]*$", complete_answer)
    if match:
        complete_answer = complete_answer[:match.start() + 1].strip()

    return complete_answer
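
The prompt formatting and answer clean-up in gen can be tried without loading the model. The following is a minimal sketch, assuming the intent of the clean-up is to keep the text after the "### 답변:" marker and drop any trailing incomplete sentence. The helper names build_prompt and extract_answer and the sample string are hypothetical and are not part of the released code.

```python
import re

QUESTION_TAG = "### 질문:"  # "question"
ANSWER_TAG = "### 답변:"    # "answer"


def build_prompt(question: str) -> str:
    """Format a question using the same prompt layout as gen."""
    return f"{QUESTION_TAG} {question}\n\n{ANSWER_TAG}"


def extract_answer(decoded: str) -> str:
    """Return the text after the answer tag, trimmed to the last full sentence."""
    idx = decoded.find(ANSWER_TAG)
    # If the tag is missing, fall back to the whole string
    answer = decoded[idx + len(ANSWER_TAG):] if idx != -1 else decoded
    answer = answer.strip()
    # Locate the last ., ? or ! and cut everything after it
    match = re.search(r"[.?!][^.?!]*$", answer)
    if match:
        answer = answer[: match.start() + 1]
    return answer.strip()


# A made-up decoded string standing in for real model output
sample = build_prompt("What is inflation?") + " Prices rise over time. It erodes purchasing po"
print(extract_answer(sample))
```

Keeping these two steps as pure string functions makes them easy to unit test separately from generation, and the missing-tag fallback avoids slicing from a wrong offset when the marker is absent.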

Training Details

First, we loaded the base model quantized to 4 bits. Quantization can significantly reduce the memory required to store the model's weights and intermediate computation results, which is beneficial when deploying models in environments with limited memory. It can also provide faster inference. Then, we fine-tuned the quantized model with LoRA, using the hyperparameters listed under Training Hyperparameters.
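
To give a rough sense of the memory saving, the sketch below estimates the storage needed for the weights alone at 16-bit and 4-bit precision. It assumes about 10.7 billion parameters (read off the "10.7B" in the model name) and ignores quantization constants, any layers kept in higher precision, activations, and the KV cache, so real usage will differ.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~10.7 billion parameters, taken from the "10.7B" in the model name
NUM_PARAMS = 10.7e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f" 4-bit weights: {nf4_gb:.2f} GB")
print(f"reduction: {fp16_gb / nf4_gb:.0f}x")
```

Under these assumptions the weights shrink from roughly 21.4 GB to roughly 5.35 GB, a fourfold reduction.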

Training Data

  1. Bank of Korea: 700 Economic and Financial Terms (경제금융용어 700선) (https://www.bok.or.kr/portal/bbs/B0000249/view.do?nttId=235017&menuNo=200765)
  2. Financial Supervisory Service: FINE financial consumer information portal, Dictionary of Financial Terms (금융용어사전) (https://fine.fss.or.kr/fine/fnctip/fncDicary/list.do?menuNo=900021)
  3. KDI Economic Information and Education Center: Dictionary of Current Affairs Terms (시사 용어사전) (https://eiec.kdi.re.kr/material/wordDic.do)
  4. The Korea Economic Daily / Hankyung.com: Hankyung Dictionary of Economic Terms (한경경제용어사전) (https://terms.naver.com/list.naver?cid=42107&categoryId=42107), Today's TESAT (https://www.tesat.or.kr/bbs.frm.list/tesat_study?s_cateno=1), Today's Junior TESAT (https://www.tesat.or.kr/bbs.frm.list/tesat_study?s_cateno=5), Saenggeul Saenggeul Hankyung (생글생글한경) (https://sgsg.hankyung.com/tesat/study)
  5. Ministry of SMEs and Startups / Government of the Republic of Korea: Ministry of SMEs and Startups Technical Terms (중소벤처기업부 전문용어) (https://terms.naver.com/list.naver?cid=42103&categoryId=42103)
  6. 고성삼 / 법문출판사 (Beopmun Publishing): Dictionary of Accounting and Tax Terms (회계·세무 용어사전) (https://terms.naver.com/list.naver?cid=51737&categoryId=51737)
  7. Mankiw's Economics (맨큐의 경제학), 8th edition: Word Index
  8. yanolja/KoSOLAR-10.7B-v0.2 (https://huggingface.co/yanolja/KoSOLAR-10.7B-v0.2)

Training Procedure

Training Hyperparameters

| Hyperparameter | SGEcon/KoSOLAR-10.7B-v0.2_fin_v4 |
|---|---|
| LoRA method | LoRA |
| load in 4 bit | True |
| learning rate | 1e-5 |
| lr scheduler | linear |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| optim | paged_adamw_32bit |
| target_modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
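
To make the rank and alpha values more concrete, the sketch below computes the scaling factor that standard (non-rank-stabilized) LoRA applies to its low-rank update, and the number of parameters LoRA adds to a single linear layer. The rank and alpha come from the list above; the 4096 x 4096 layer shape is a hypothetical example and is not taken from the model's actual configuration.

```python
# Values copied from the hyperparameter list above
LORA_RANK = 16
LORA_ALPHA = 16


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer shape, chosen only for illustration
D_IN, D_OUT = 4096, 4096

full = D_IN * D_OUT
added = lora_added_params(D_IN, D_OUT, LORA_RANK)

print(f"scaling factor: {lora_scaling(LORA_ALPHA, LORA_RANK)}")
print(f"full layer params: {full}")
print(f"LoRA params added: {added} ({added / full:.2%} of the layer)")
```

With alpha equal to rank, the update is applied at a scale of 1.0. In the peft library these settings correspond to the r, lora_alpha, lora_dropout, and target_modules arguments of LoraConfig.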

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Results

[More Information Needed]

Summary

Citation [optional]