Commit 9276f51 by shenzhi-wang (parent: e8ab262): Update README.md

This is the first Chinese chat model specifically fine-tuned for Chinese through ORPO [1] based on the [Meta-Llama-3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

**Compared to the original [Meta-Llama-3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), our Llama3-8B-Chinese-Chat model significantly reduces "Chinese questions answered in English" and the mixing of Chinese and English within a response. It also uses far fewer emojis, making its answers more formal.**

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
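
For intuition, the odds-ratio term that ORPO adds on top of the standard supervised loss can be sketched in a few lines of Python. This is a simplified illustration of the loss in Hong et al. (2024), not LLaMA-Factory's implementation; `beta` plays the role of the `--orpo_beta` weight used in training.

```python
import math

def orpo_preference_term(avg_logp_chosen, avg_logp_rejected, beta=0.05):
    """Simplified sketch of ORPO's odds-ratio loss term (Hong et al., 2024).

    avg_logp_*: mean per-token log-probability the policy assigns to the
    chosen / rejected response. With p = exp(avg_logp), odds(y) = p / (1 - p),
    and the term is -beta * log(sigmoid(log_odds(chosen) - log_odds(rejected))).
    It is added to the usual NLL loss on the chosen response.
    """
    def log_odds(avg_logp):
        p = math.exp(avg_logp)          # assumes avg_logp < 0, so 0 < p < 1
        return avg_logp - math.log(1.0 - p)

    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    return -beta * math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
```

The term shrinks toward zero when the chosen response is already much more likely than the rejected one, and grows when the preference is violated, so no separate reference model is needed.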

Dataset: [DPO-En-Zh-20k](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k) (commit id: e8c5070d6564025fcf206f38d796ae264e028004).

Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/tree/main) (commit id: 836ca0558698206bbf4e3b92533ad9f67c9f9864).

Training details:

- optimizer: paged_adamw_32bit

Reproduce:

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory   # train_bash.py is resolved relative to the repository root

deepspeed --num_gpus 8 src/train_bash.py \
    --deepspeed ${Your_Deepspeed_Config_Path} \
    --stage orpo \
    --do_train \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --dataset dpo_mix_en,dpo_mix_zh \
    --template llama3 \
    --finetuning_type full \
    --output_dir ${Your_Output_Path} \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --log_level info \
    --logging_steps 5 \
    --save_strategy epoch \
    --save_total_limit 3 \
    --save_steps 100 \
    --learning_rate 5e-6 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --do_eval false \
    --max_steps -1 \
    --bf16 true \
    --seed 42 \
    --warmup_ratio 0.1 \
    --cutoff_len 8192 \
    --flash_attn true \
    --orpo_beta 0.05 \
    --optim paged_adamw_32bit
```
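
As a sanity check on the flags above, the effective global batch size and the learning-rate schedule they imply can be worked out directly. This is a back-of-the-envelope sketch assuming a standard linear-warmup-then-cosine-decay schedule; LLaMA-Factory's exact implementation may differ in detail.

```python
import math

# Quantities implied by the command-line flags above (sketch, not library code).
NUM_GPUS = 8           # deepspeed --num_gpus 8
PER_DEVICE_BATCH = 2   # --per_device_train_batch_size 2
GRAD_ACCUM = 4         # --gradient_accumulation_steps 4
PEAK_LR = 5e-6         # --learning_rate 5e-6
WARMUP_RATIO = 0.1     # --warmup_ratio 0.1

# Sequences contributing to each optimizer update.
effective_batch = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM  # 64

def lr_at(step, total_steps):
    """Linear warmup over the first WARMUP_RATIO of steps, then cosine decay to 0."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```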
# 2. Examples