---
license: apache-2.0
tags:
- trl
- dpo
- generated_from_trainer
base_model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
---

# ENERGY-DRINK-LOVE/eeve_dpo-v3

### Our Team
* Jingyeom Kim
* Youjin Chung

## Model

### Base Model
* [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co./yanolja/EEVE-Korean-Instruct-10.8B-v1.0)

### Hardware and Software
* Hardware: 8x NVIDIA A100 GPUs for training our model
* Software: Deepspeed library & Huggingface TRL Trainer

### Dataset
* DPO dataset
  * Self-built DPO dataset (based on AI-Hub data)
  * Translations of English DPO datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, translated with our own model)

### Training Method
* [DPO](https://arxiv.org/abs/2305.18290); a minimal training sketch is given at the end of this card

## Benchmark

**[Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness)**

| Task             |         0-shot |         5-shot |
| :--------------- | -------------: | -------------: |
| kobest_boolq     |       0.950142 |       0.944444 |
| kobest_copa      |          0.751 |          0.835 |
| kobest_hellaswag |          0.474 |          0.508 |
| kobest_sentineg  |       0.811083 |       0.972292 |
| **Average**      | **0.74655625** | **0.81493399** |

**[Ko-LLM-Leaderboard](https://www.aihub.or.kr/leaderboard/view.do?currMenu=500&topMenu=102)**
* Ranked 7th as of 2024-03-07

| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
| ------: | -----: | -----------: | ------: | ------------: | --------------: |
|   57.97 |  57.51 |        67.01 |    56.3 |         54.86 |           54.19 |
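
### Training Sketch

A minimal sketch of DPO training on the base model with Huggingface TRL's `DPOTrainer`, matching the setup described above. This is not the authors' published training script: the hyperparameters, the toy preference rows, and the `ds_config.json` Deepspeed config path are illustrative assumptions. The keyword arguments follow the older TRL API (≤0.7.x, current when this model was trained); newer TRL releases move `beta` and the length limits into a `DPOConfig`.

```python
# Sketch only: hyperparameters, dataset rows, and the Deepspeed config
# path are assumptions, not the authors' published values.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "yanolja/EEVE-Korean-Instruct-10.8B-v1.0"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPO expects preference triples: a prompt plus a preferred ("chosen")
# and a dispreferred ("rejected") completion. Toy rows for illustration;
# the real data was the self-built and translated sets listed above.
train_dataset = Dataset.from_dict({
    "prompt":   ["What is the capital of South Korea?"],
    "chosen":   ["The capital of South Korea is Seoul."],
    "rejected": ["I don't know."],
})

training_args = TrainingArguments(
    output_dir="eeve_dpo-v3",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
    bf16=True,
    deepspeed="ds_config.json",  # assumed Deepspeed config file
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # TRL clones the policy as the frozen reference model
    args=training_args,
    beta=0.1,            # KL-penalty strength from the DPO paper (assumed value)
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```

With Deepspeed, the same script is launched across the 8 GPUs via `deepspeed train_dpo.py` (or `accelerate launch`) rather than plain `python`.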