	---
	license: apache-2.0
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
	---


	# ENERGY-DRINK-LOVE/eeve_dpo-v3

	### Our Team
	* Jingyeom Kim
	* Youjin Chung

	## Model

	### Base Model
	* [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0)

	### Hardware and Software
	* Hardware: 8 × A100 GPUs for training
	* Software: DeepSpeed library and Hugging Face TRL Trainer

	### Dataset
	* DPO_dataset
	* In-house DPO dataset (built from AI-hub datasets)
	* Translations of English datasets such as OpenOrca DPO (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024, using our own model)

	### Training Method
	* [DPO](https://arxiv.org/abs/2305.18290)
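
For readers unfamiliar with DPO, the per-example objective from the linked paper can be sketched in plain Python. This is an illustration only, not the training code used for this model: the function name `dpo_loss` and the default `beta=0.1` are assumptions, and the inputs are taken to be summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    beta=0.1 is a hypothetical default, not a value reported in this card.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# When the policy favors the chosen response more than the reference does,
# the loss drops below log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss decreases as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model.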

	## Benchmark

	[Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness)

	| Task | 0-shot | 5-shot |
	| :--------------- | -----------: | -----------: |
	| kobest_boolq | 0.950142 | 0.944444 |
	| kobest_copa | 0.751 | 0.835 |
	| kobest_hellaswag | 0.474 | 0.508 |
	| kobest_sentineg | 0.811083 | 0.972292 |
	| Average | 0.74655625 | 0.81493399 |
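
The Average row appears to be the unweighted mean of the four KoBEST task scores. A quick check, with the scores copied from the table above (the variable names are illustrative):

```python
from statistics import mean

# Scores copied from the Ko LM Eval Harness table.
zero_shot = {
    "kobest_boolq": 0.950142,
    "kobest_copa": 0.751,
    "kobest_hellaswag": 0.474,
    "kobest_sentineg": 0.811083,
}
five_shot = {
    "kobest_boolq": 0.944444,
    "kobest_copa": 0.835,
    "kobest_hellaswag": 0.508,
    "kobest_sentineg": 0.972292,
}

avg_zero = mean(zero_shot.values())
avg_five = mean(five_shot.values())
print(f"0-shot average: {avg_zero:.6f}")
print(f"5-shot average: {avg_five:.6f}")
```

Both results agree with the reported averages to six decimal places.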

	[Ko-LLM-Leaderboard](https://www.aihub.or.kr/leaderboard/view.do?currMenu=500&topMenu=102)
	* Ranked 7th as of 2024-03-07

	| Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
	| ------: | -----: | -----------: | ------: | ------------: | --------------: |
	| 57.97 | 57.51 | 67.01 | 56.3 | 54.86 | 54.19 |