ENERGY-DRINK-LOVE/nox_DPOv3

	---
	license: apache-2.0
	base_model: davidkim205/nox-solar-10.7b-v4
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: nhn_dpo_v3_nox-solar-10.7b-v4_DPO
	  results: []
	---

	# nhn_dpo_v3_nox-solar-10.7b-v4_DPO


	### Our Team
	* Youjin Chung
	* Jingyeom Kim

	## Model

	### Base Model
	* [davidkim205/nox-solar-10.7b-v4](https://huggingface.co/davidkim205/nox-solar-10.7b-v4)
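
The fine-tuned checkpoint built on this base model can probably be loaded like any other causal language model on the Hub. The snippet below is a minimal sketch, not an official usage recipe: it assumes the repository ID `ENERGY-DRINK-LOVE/nox_DPOv3`, the `transformers`, `torch`, and `accelerate` packages, and enough GPU memory for a 10.7B-parameter model in half precision. The helper name `generate` and its defaults are illustrative, and this card does not specify a prompt template, so the prompt is passed through unchanged.

```python
MODEL_ID = "ENERGY-DRINK-LOVE/nox_DPOv3"  # assumed repository ID for this card


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model and return the decoded text for one prompt (illustrative helper)."""
    # Heavy dependencies are imported lazily so that defining the helper is cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # assumption: half precision to reduce memory use
        device_map="auto",          # needs the `accelerate` package
    )

    # Tokenize, move tensors to the model's device, and generate without gradients.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads the weights on first use. For repeated calls it would be better to load the tokenizer and model once and reuse them.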

	### Hardware and Software
	* Hardware: 8 × A100 GPUs (used for training)
	* Software: DeepSpeed library and Hugging Face TRL Trainer

	### Dataset
	* DPO_dataset
	* In-house DPO dataset (built from AI-hub datasets)
	* Translations of English datasets such as OpenOrca DPO (translated with our own model, ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024)
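
DPO training data is usually stored as preference triples. The sketch below assumes the `prompt` / `chosen` / `rejected` field names commonly used with TRL; the actual schema of the datasets listed above is not documented in this card. The record contents and the helper `is_valid_preference_record` are hypothetical and only illustrate the shape of one example.

```python
# Field names assumed from the common TRL preference format (not confirmed by this card).
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def is_valid_preference_record(record: dict) -> bool:
    """Check that a record has non-empty string fields and two distinct answers."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    # A preference pair carries no signal if both answers are identical.
    return record["chosen"] != record["rejected"]


# Hypothetical record, for illustration only.
example = {
    "prompt": "What is the capital of South Korea?",
    "chosen": "The capital of South Korea is Seoul.",
    "rejected": "I am not sure.",
}
print(is_valid_preference_record(example))
```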

	### Training Method
	* [DPO](https://arxiv.org/abs/2305.18290)
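
For intuition, the per-example DPO objective from the linked paper can be sketched in a few lines. This is a plain-Python illustration of the loss formula, not the training code used for this model. The inputs are summed log-probabilities of the chosen and rejected responses under the policy and under the frozen reference model, and `beta=0.1` is an assumed default, since the card does not state the value used.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: the card does not report beta
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    # How much more (in log space) the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log(2) once the policy prefers the chosen response more strongly than the reference does, and rises above it in the opposite case.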

	## Benchmark

	Evaluated with [Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness).

	### 0-shot (macro F1)

	| kobest_boolq | kobest_copa | kobest_hellaswag | kobest_sentineg |
	| -----------: | ----------: | ---------------: | --------------: |
	| 0.931613 | 0.740751 | 0.468602 | 0.488465 |
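
The scores above are macro F1 values, meaning the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that metric in plain Python, for readers who want to check how it behaves. It is not the evaluation harness's own implementation, and the labels in the example are made up.

```python
def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up binary labels: class 1 has F1 = 2/3, class 0 has F1 = 0.8.
print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because every class counts equally, a model that ignores a minority class is penalized more heavily than it would be under plain accuracy.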