Commit 9276f51 by shenzhi-wang (parent: e8ab262): Update README.md

This is the first Chinese chat model specifically fine-tuned for Chinese through ORPO [1] based on the [Meta-Llama-3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

**Compared to the original [Meta-Llama-3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), our Llama3-8B-Chinese-Chat model significantly reduces "Chinese questions answered in English" and the mixing of Chinese and English within a response. It also uses far fewer emojis, making its answers more formal.**

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
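
For intuition, the odds-ratio term that ORPO adds on top of the standard supervised loss can be sketched in a few lines of Python. This is a simplified illustration of the loss in Hong et al. (2024), not LLaMA-Factory's implementation; `beta` plays the role of the `--orpo_beta` weight used in training.

```python
import math

def orpo_preference_term(avg_logp_chosen, avg_logp_rejected, beta=0.05):
    """Simplified sketch of ORPO's odds-ratio loss term (Hong et al., 2024).

    avg_logp_*: mean per-token log-probability the policy assigns to the
    chosen / rejected response. With p = exp(avg_logp), odds(y) = p / (1 - p),
    and the term is -beta * log(sigmoid(log_odds(chosen) - log_odds(rejected))).
    It is added to the usual NLL loss on the chosen response.
    """
    def log_odds(avg_logp):
        p = math.exp(avg_logp)          # assumes avg_logp < 0, so 0 < p < 1
        return avg_logp - math.log(1.0 - p)

    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    return -beta * math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
```

The term shrinks toward zero when the chosen response is already much more likely than the rejected one, and grows when the preference is violated, so no separate reference model is needed.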

Dataset: [DPO-En-Zh-20k](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k) (commit id: e8c5070d6564025fcf206f38d796ae264e028004).

Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/tree/main) (commit id: 836ca0558698206bbf4e3b92533ad9f67c9f9864).

Training details:

- optimizer: paged_adamw_32bit

Reproduce:

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory   # train_bash.py is resolved relative to the repository root

deepspeed --num_gpus 8 src/train_bash.py \
    --deepspeed ${Your_Deepspeed_Config_Path} \
    --stage orpo \
    --do_train \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --dataset dpo_mix_en,dpo_mix_zh \
    --template llama3 \
    --finetuning_type full \
    --output_dir ${Your_Output_Path} \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --log_level info \
    --logging_steps 5 \
    --save_strategy epoch \
    --save_total_limit 3 \
    --save_steps 100 \
    --learning_rate 5e-6 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --do_eval false \
    --max_steps -1 \
    --bf16 true \
    --seed 42 \
    --warmup_ratio 0.1 \
    --cutoff_len 8192 \
    --flash_attn true \
    --orpo_beta 0.05 \
    --optim paged_adamw_32bit
```
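
As a sanity check on the flags above, the effective global batch size and the learning-rate schedule they imply can be worked out directly. This is a back-of-the-envelope sketch assuming a standard linear-warmup-then-cosine-decay schedule; LLaMA-Factory's exact implementation may differ in detail.

```python
import math

# Quantities implied by the command-line flags above (sketch, not library code).
NUM_GPUS = 8           # deepspeed --num_gpus 8
PER_DEVICE_BATCH = 2   # --per_device_train_batch_size 2
GRAD_ACCUM = 4         # --gradient_accumulation_steps 4
PEAK_LR = 5e-6         # --learning_rate 5e-6
WARMUP_RATIO = 0.1     # --warmup_ratio 0.1

# Sequences contributing to each optimizer update.
effective_batch = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM  # 64

def lr_at(step, total_steps):
    """Linear warmup over the first WARMUP_RATIO of steps, then cosine decay to 0."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```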
# 2. Examples