mxz/llama3-8b-dpo

@@ -6,8 +6,6 @@
 # dataset Intruction
 ---
 **datasets:** \
-- mxz/alpaca_en_zh_ruozhiba_gpt4data \
-- PKU-Alignment/PKU-SafeRLHF \
 - mxz/CValues_DPO \
 **language:** \
 - zh \
@@ -17,7 +15,7 @@
 **pipeline_tag:** \
 - text-generation \
 **tags:** \
-- PPO \
+- DPO \
 - fintune \
 - alignment \
 - LoRA \
@@ -42,7 +40,7 @@ Result:
 | ------------------- | ----- | ------ | ------ |
 | Llama-3-8B          | 55.5  | 47.0   | 48.0   |
 | Llama-3-8B-Instruct | 60.1  | 49.7   | 49.3   |
-| Llama-3-8B-ppo      | 62.2  | 49.9   | 49.4   |
+| Llama-3-8B-dpo      | 62.2  | 49.9   | 49.4   |
 - Llama-3-8B evaluation result from [ymcui/Chinese-LLaMA-Alpaca-3](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3)