metadata
license: apache-2.0
base_model: davidkim205/nox-solar-10.7b-v4
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: nhn_dpo_v3_nox-solar-10.7b-v4_DPO
results: []
nhn_dpo_v3_nox-solar-10.7b-v4_DPO
Our Team
- Youjin Chung
- Jingyeom Kim
Model
Base Model
- davidkim205/nox-solar-10.7b-v4
Hardware and Software
- Hardware: 8 × A100 GPUs for training
- Software: DeepSpeed library and Hugging Face TRL Trainer
Dataset
- DPO_dataset
- Self-built DPO dataset (built using AI-hub datasets)
- Translations of English datasets such as OpenOrca DPO (translated with our own model, ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024)
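The schema of the DPO dataset is not documented here. As a minimal sketch, assuming the `prompt` / `chosen` / `rejected` layout that the TRL `DPOTrainer` conventionally consumes, preference records could be sanity-checked before training as follows. The helper names and the sample records are hypothetical and are not taken from the actual dataset.

```python
from typing import Dict, List

# Assumed schema: the three fields a preference-pair record usually carries.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def is_valid_preference_record(record: Dict[str, str]) -> bool:
    """Return True if all three fields are non-empty strings and the
    chosen and rejected responses differ."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return record["chosen"] != record["rejected"]


def filter_preference_records(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records that pass the validity check."""
    return [r for r in records if is_valid_preference_record(r)]


# Hypothetical examples for illustration only.
examples = [
    {"prompt": "What is the capital of South Korea?", "chosen": "Seoul.", "rejected": "Busan."},
    {"prompt": "Say hello.", "chosen": "Hello!", "rejected": "Hello!"},  # identical pair, dropped
    {"prompt": "", "chosen": "A", "rejected": "B"},  # empty prompt, dropped
]
clean = filter_preference_records(examples)
print(len(clean))
```

Dropping pairs whose two responses are identical is a design choice of this sketch: such pairs carry no preference signal for DPO.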
Training Method
- DPO (Direct Preference Optimization), run with the Hugging Face TRL Trainer
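The model tags name DPO as the training method. To illustrate what that objective computes, here is a minimal pure-Python sketch of the standard per-example DPO loss, given sequence log-probabilities under the policy and under a frozen reference model. The function name and the `beta=0.1` value are illustrative assumptions; the hyperparameters actually used for this model are not stated.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not this model's setting
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio)))."""
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
```

The loss falls below log(2) when the policy raises the chosen response's likelihood relative to the rejected one, and rises above it in the opposite case.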
Benchmark
0-shot (macro F1)
| kobest_boolq | kobest_copa | kobest_hellaswag | kobest_sentineg |
|---|---|---|---|
| 0.931613 | 0.740751 | 0.468602 | 0.488465 |
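The scores above are macro F1: the F1 score is computed separately for each class and then averaged without weighting by class frequency. A minimal pure-Python sketch of that metric follows. The function name and the toy labels are hypothetical, and this is not the evaluation harness that produced the table.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy binary example: class 1 has F1 = 0.8, class 0 has F1 = 2/3.
score = macro_f1([1, 1, 1, 0], [1, 1, 0, 0])
print(round(score, 4))
```

Because every class counts equally, a model that ignores a minority class is penalized more under macro F1 than under plain accuracy.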