---
license: apache-2.0
base_model: davidkim205/nox-solar-10.7b-v4
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: nhn_dpo_v3_nox-solar-10.7b-v4_DPO
  results: []
---
# nhn_dpo_v3_nox-solar-10.7b-v4_DPO

### Our Team
* Youjin Chung
* Jingyeom Kim

## Model

### Base Model
* [davidkim205/nox-solar-10.7b-v4](https://huggingface.co/davidkim205/nox-solar-10.7b-v4)

### Hardware and Software
* Hardware: 8 × A100 GPUs, used for training
* Software: DeepSpeed library and Hugging Face TRL Trainer

### Dataset
* DPO_dataset
  * In-house DPO dataset (built from AI-hub datasets)
  * English datasets such as OpenOrca DPO, translated with our own model (ENERGY-DRINK-LOVE/translate_share_gpt_dedup_llama_SFT_1024)

### Training Method
* [DPO](https://arxiv.org/abs/2305.18290)
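
For readers unfamiliar with DPO, the per-pair objective from the linked paper can be sketched in plain Python. This is a minimal illustration, not the training code used for this model: the function name `dpo_loss`, its argument names, and `beta=0.1` are assumptions chosen for the example, and the inputs are assumed to be summed log-probabilities of each response under the policy and the frozen reference model.

```python
import math


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    beta=0.1 is an illustrative value, not the setting used for this model.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return _softplus(-logits)


# The loss is lower when the policy favours the chosen response more than the reference does.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
print(dpo_loss(-5.0, -1.0, -2.0, -2.0))
```

In practice the TRL trainer computes this over batches of token-level log-probabilities; the scalar version above only shows the shape of the objective.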

## Benchmark

**[Ko LM Eval Harness](https://github.com/Beomi/ko-lm-evaluation-harness)**

### 0-shot (macro F1)

| kobest_boolq | kobest_copa | kobest_hellaswag | kobest_sentineg |
| ------: | -----: | -----------: | ------: |
| 0.931613 | 0.740751 | 0.468602 | 0.488465 |
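
The scores above are macro-averaged F1 values. As a rough sketch of what that metric means (the harness's own implementation may differ in details), macro F1 computes an F1 score per class and takes the unweighted mean. The helper name `macro_f1` and the toy labels below are hypothetical and are not taken from the benchmark data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy binary example (hypothetical labels, not benchmark data).
print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because every class contributes equally, a model that predicts only one class on a balanced binary task scores well below its accuracy.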