NanQiangHF/llama3.1_8b_dpo_bwgenerator_test

	---
	license: llama3.1
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	model-index:
	- name: llama3.1_8b_dpo_bwgenerator_test
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# llama3.1_8b_dpo_bwgenerator_test

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.0381
	- Rewards/chosen: -9.3770
	- Rewards/rejected: -40.9760
	- Rewards/accuracies: 0.9961
	- Rewards/margins: 31.5990
	- Logps/rejected: -519.9075
	- Logps/chosen: -178.3189
	- Logits/rejected: -1.4901
	- Logits/chosen: -1.9907
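
For readers unfamiliar with the DPO metrics above: in TRL's DPO trainer, the logged reward for a completion is the scaled log-probability ratio between the policy and the reference model, `Rewards/margins` is the chosen reward minus the rejected reward, and `Rewards/accuracies` is the fraction of pairs where the chosen reward is the larger one. The scaling factor (beta) is not reported in this card. The short check below only verifies that the reported margin is consistent with the two reported rewards; the variable names are illustrative.

```python
# Final evaluation rewards as reported in the list above.
rewards_chosen = -9.3770
rewards_rejected = -40.9760
reported_margin = 31.5990

# The margin is the gap between the chosen and the rejected reward.
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"consistent with reported value: {abs(margin - reported_margin) < 1e-4}")
```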

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 1
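
As a rough sketch, the values above can be written with the argument names used by `transformers.TrainingArguments`, which TRL's `DPOConfig` extends. This is not the author's training script: the mapping of `train_batch_size` and `eval_batch_size` to the per-device arguments is an assumption, `output_dir` is a hypothetical path, and DPO-specific settings such as beta and the LoRA configuration are not given in the card, so they are left out.

```python
# Hyperparameters copied from the list above, keyed by TrainingArguments names.
training_kwargs = {
    "output_dir": "dpo_output",        # hypothetical path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,  # assumed to match the card's train_batch_size
    "per_device_eval_batch_size": 4,   # assumed to match the card's eval_batch_size
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}

# With trl installed, these could be passed on roughly as:
#   from trl import DPOConfig
#   config = DPOConfig(**training_kwargs)
for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

Keeping the values in a plain dictionary makes it easy to compare them against the card before building a config object.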

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.0864 | 0.0719 | 1000 | 0.1031 | -24.3451 | -55.7071 | 0.9919 | 31.3620 | -667.2187 | -328.0001 | -1.3920 | -1.9101 |
	| 0.0721 | 0.1438 | 2000 | 0.0666 | -17.1956 | -43.6489 | 0.9932 | 26.4533 | -546.6367 | -256.5046 | -1.3146 | -1.8819 |
	| 0.0513 | 0.2157 | 3000 | 0.0586 | -13.4148 | -39.7394 | 0.9932 | 26.3247 | -507.5419 | -218.6962 | -1.5754 | -2.0549 |
	| 0.0391 | 0.2876 | 4000 | 0.0518 | -11.9859 | -42.5627 | 0.9942 | 30.5768 | -535.7746 | -204.4073 | -1.5376 | -2.0293 |
	| 0.0431 | 0.3595 | 5000 | 0.0584 | -15.0281 | -51.9022 | 0.9945 | 36.8741 | -629.1698 | -234.8300 | -1.5020 | -2.0037 |
	| 0.0386 | 0.4313 | 6000 | 0.0399 | -10.5384 | -39.9545 | 0.9961 | 29.4161 | -509.6927 | -189.9328 | -1.5356 | -2.0315 |
	| 0.0417 | 0.5032 | 7000 | 0.0452 | -11.8813 | -46.2602 | 0.9955 | 34.3789 | -572.7493 | -203.3616 | -1.4399 | -1.9551 |
	| 0.06 | 0.5751 | 8000 | 0.0387 | -9.4865 | -39.5614 | 0.9958 | 30.0749 | -505.7617 | -179.4136 | -1.5289 | -2.0209 |
	| 0.0478 | 0.6470 | 9000 | 0.0376 | -9.9444 | -40.6988 | 0.9961 | 30.7544 | -517.1356 | -183.9923 | -1.5154 | -2.0106 |
	| 0.022 | 0.7189 | 10000 | 0.0399 | -9.6813 | -41.9896 | 0.9961 | 32.3084 | -530.0439 | -181.3615 | -1.4896 | -1.9912 |
	| 0.0254 | 0.7908 | 11000 | 0.0378 | -9.1448 | -40.6698 | 0.9961 | 31.5250 | -516.8457 | -175.9964 | -1.5031 | -2.0023 |
	| 0.0357 | 0.8627 | 12000 | 0.0387 | -9.6321 | -41.6962 | 0.9961 | 32.0641 | -527.1096 | -180.8692 | -1.4851 | -1.9878 |
	| 0.0626 | 0.9346 | 13000 | 0.0381 | -9.3770 | -40.9760 | 0.9961 | 31.5990 | -519.9075 | -178.3189 | -1.4901 | -1.9907 |
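
The table can also be read programmatically. The sketch below copies the step, epoch, and validation-loss columns, finds the evaluation step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the step-to-epoch ratio. The final estimate of training pairs assumes a single device and no gradient accumulation, neither of which the card states, so treat it as a rough figure only.

```python
# (step, epoch, validation_loss) copied from the training results table.
rows = [
    (1000, 0.0719, 0.1031),
    (2000, 0.1438, 0.0666),
    (3000, 0.2157, 0.0586),
    (4000, 0.2876, 0.0518),
    (5000, 0.3595, 0.0584),
    (6000, 0.4313, 0.0399),
    (7000, 0.5032, 0.0452),
    (8000, 0.5751, 0.0387),
    (9000, 0.6470, 0.0376),
    (10000, 0.7189, 0.0399),
    (11000, 0.7908, 0.0378),
    (12000, 0.8627, 0.0387),
    (13000, 0.9346, 0.0381),
]
TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameter list

# Evaluation step with the lowest validation loss.
best_step, best_epoch, best_loss = min(rows, key=lambda row: row[2])

# Steps per epoch, estimated from the last logged row.
last_step, last_epoch, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

# Rough number of preference pairs per epoch (assumes one device and
# no gradient accumulation; the card does not confirm either).
approx_pairs = steps_per_epoch * TRAIN_BATCH_SIZE

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"about {steps_per_epoch:.0f} steps per epoch, about {approx_pairs:.0f} pairs")
```

The lowest validation loss falls at step 9000 rather than at the final logged step, although the last five evaluations differ only in the third decimal place.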


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.44.0
	- Pytorch 2.3.0+cu121
	- Datasets 2.14.7
	- Tokenizers 0.19.1
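
Since the metadata lists `library_name: peft`, the repository presumably holds an adapter rather than full model weights. A minimal loading sketch under that assumption is shown below; it has not been verified against this repository, the function name `load_model` is illustrative, and the base model is gated, so access to it must be granted on the Hub first.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"
ADAPTER_ID = "NanQiangHF/llama3.1_8b_dpo_bwgenerator_test"


def load_model():
    """Load the base model and attach the adapter (assumes a PEFT adapter repo)."""
    # Local imports so the constants can be inspected without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype="auto")
    # Wrap the base model with the adapter weights from this repository.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    return tokenizer, model


# Usage (downloads the 8B base model, so it is not run here):
#   tokenizer, model = load_model()
```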