---
license: apache-2.0
tags:
- trl
- dpo
- generated_from_trainer
base_model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
---

# ENERGY-DRINK-LOVE/eeve_dpo-v3

### Our Team
* Jingyeom Kim
* Youjin Chung

## Model

### Base Model
* [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0)

### Hardware and Software
* Hardware: 8 × A100 GPUs for training
* Software: DeepSpeed library and Hugging Face TRL Trainer

### Dataset
* DPO_dataset
* Self-built DPO dataset (using AI-hub datasets)
* Translations of English datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, using our own model)

### Training Method
* [DPO](https://arxiv.org/abs/2305.18290)
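
For readers unfamiliar with DPO, the per-example objective from the linked paper can be sketched in a few lines. This is a minimal illustration, not the training code used for this model: the function names are hypothetical, the inputs are assumed to be summed sequence log-probabilities from the policy and the frozen reference model, and `beta=0.1` is an assumed value (this card does not state the one used).

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    beta=0.1 is an assumption for illustration only.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the chosen response's likelihood relative to the reference lowers the loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, and rises when it favors the rejected one.
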

## Benchmark

**[Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness)**

| Task             |         0-shot |         5-shot |
| :--------------- | -------------: | -------------: |
| kobest_boolq     |       0.950142 |       0.944444 |
| kobest_copa      |          0.751 |          0.835 |
| kobest_hellaswag |          0.474 |          0.508 |
| kobest_sentineg  |       0.811083 |       0.972292 |
| **Average**      | **0.74655625** | **0.81493399** |
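
The Average row appears to be the unweighted mean of the four KoBEST tasks. A quick sketch that recomputes it from the per-task scores in the table (the variable and function names are illustrative, and the plain arithmetic mean is an assumption about how the row was produced):

```python
# Per-task scores copied from the Ko LM Eval Harness table: (0-shot, 5-shot).
scores = {
    "kobest_boolq": (0.950142, 0.944444),
    "kobest_copa": (0.751, 0.835),
    "kobest_hellaswag": (0.474, 0.508),
    "kobest_sentineg": (0.811083, 0.972292),
}


def column_mean(table: dict, column: int) -> float:
    """Unweighted mean of one score column across all tasks."""
    return sum(row[column] for row in table.values()) / len(table)


zero_shot_avg = column_mean(scores, 0)
five_shot_avg = column_mean(scores, 1)

print(f"0-shot average: {zero_shot_avg:.6f}")
print(f"5-shot average: {five_shot_avg:.6f}")
```

Because the per-task scores are rounded to at most six decimal places in the table, the recomputed means can only be expected to match the reported averages to roughly that precision.
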

**[Ko-LLM-Leaderboard](https://www.aihub.or.kr/leaderboard/view.do?currMenu=500&topMenu=102)**
* Ranked 7th as of March 7, 2024 (240307)

| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
| ------: | -----: | -----------: | ------: | ------------: | --------------: |
|   57.97 |  57.51 |        67.01 |    56.3 |         54.86 |           54.19 |