|
--- |
|
license: apache-2.0 |
|
library_name: peft |
|
tags: |
|
- trl |
|
- dpo |
|
- unsloth |
|
- generated_from_trainer |
|
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit |
|
model-index: |
|
- name: dpo |
|
results: [] |
|
--- |
|
|
|
|
|
|
# dpo |
|
|
|
This model is a PEFT adapter fine-tuned from [unsloth/llama-3-8b-Instruct-bnb-4bit](https://huggingface.co./unsloth/llama-3-8b-Instruct-bnb-4bit) with DPO; the training dataset is not specified in this card.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.6257 |
|
- Rewards/chosen: 0.8141 |
|
- Rewards/rejected: 0.4945 |
|
- Rewards/accuracies: 0.6431 |
|
- Rewards/margins: 0.3196 |
|
- Logps/rejected: -229.7856 |
|
- Logps/chosen: -249.2073 |
|
- Logits/rejected: -0.6789 |
|
- Logits/chosen: -0.6135 |
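
To try the adapter, load the 4-bit base model and apply the PEFT weights on top. The sketch below is a minimal, hedged example: the adapter repository id (`your-username/dpo`) is a placeholder, and `bitsandbytes` must be installed for the 4-bit base checkpoint to load.

```python
# Minimal loading sketch (placeholder adapter id); requires transformers, peft, and bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/llama-3-8b-Instruct-bnb-4bit"
adapter_id = "your-username/dpo"  # placeholder: replace with the actual adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```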
|
|
|
## Model description |
|
|
|
More information needed |
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 4 |
|
- eval_batch_size: 4 |
|
- seed: 0 |
|
- gradient_accumulation_steps: 8 |
|
- total_train_batch_size: 32 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 100 |
|
- training_steps: 750 |
|
- mixed_precision_training: Native AMP |
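
The values above map directly onto a TRL `DPOTrainer` setup. The sketch below is a reconstruction rather than the original training script: the model, tokenizer, and preference datasets are assumed to have been prepared beforehand (the dataset is not named in this card), and the exact `DPOTrainer` signature varies across TRL versions.

```python
# Hedged reconstruction of the training configuration; not the original script.
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir="dpo",
    learning_rate=5e-5,
    per_device_train_batch_size=4,    # train_batch_size
    per_device_eval_batch_size=4,     # eval_batch_size
    gradient_accumulation_steps=8,    # effective train batch size 4 * 8 = 32
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=750,
    seed=0,
    fp16=True,                        # Native AMP mixed precision
    # optimizer left at the default: Adam-style with betas=(0.9, 0.999), eps=1e-8
)

trainer = DPOTrainer(
    model=model,                  # PEFT-wrapped 4-bit base model, prepared earlier (assumed)
    args=training_args,
    train_dataset=train_dataset,  # preference pairs: prompt / chosen / rejected (dataset not named here)
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,          # older TRL releases; newer ones use `processing_class`
)
trainer.train()
```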
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | |
|
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:| |
|
| 0.6904 | 0.0372 | 28 | 0.6811 | 0.2766 | 0.2476 | 0.5770 | 0.0290 | -232.2545 | -254.5816 | -0.5471 | -0.5010 | |
|
| 0.6591 | 0.0745 | 56 | 0.6623 | 0.9939 | 0.8694 | 0.5927 | 0.1245 | -226.0365 | -247.4085 | -0.5351 | -0.4798 | |
|
| 0.6297 | 0.1117 | 84 | 0.6542 | 1.1966 | 0.9862 | 0.6136 | 0.2104 | -224.8689 | -245.3818 | -0.4689 | -0.4120 | |
|
| 0.5985 | 0.1489 | 112 | 0.6540 | 1.5211 | 1.2525 | 0.6087 | 0.2687 | -222.2059 | -242.1367 | -0.4989 | -0.4262 | |
|
| 0.6603 | 0.1862 | 140 | 0.6459 | 0.7737 | 0.5130 | 0.6304 | 0.2607 | -229.6009 | -249.6110 | -0.5779 | -0.5054 | |
|
| 0.619 | 0.2234 | 168 | 0.6411 | 0.9352 | 0.6917 | 0.6222 | 0.2435 | -227.8137 | -247.9963 | -0.5842 | -0.5261 | |
|
| 0.6497 | 0.2606 | 196 | 0.6427 | 0.8696 | 0.6404 | 0.6282 | 0.2292 | -228.3268 | -248.6518 | -0.5798 | -0.5255 | |
|
| 0.6014 | 0.2979 | 224 | 0.6397 | 0.8941 | 0.6357 | 0.6263 | 0.2583 | -228.3730 | -248.4069 | -0.6397 | -0.5816 | |
|
| 0.594 | 0.3351 | 252 | 0.6361 | 0.7069 | 0.4027 | 0.6319 | 0.3043 | -230.7038 | -250.2785 | -0.6434 | -0.5848 | |
|
| 0.5898 | 0.3723 | 280 | 0.6356 | 1.0373 | 0.7462 | 0.6278 | 0.2911 | -227.2686 | -246.9745 | -0.6340 | -0.5714 | |
|
| 0.639 | 0.4096 | 308 | 0.6342 | 0.7199 | 0.4321 | 0.6342 | 0.2878 | -230.4095 | -250.1490 | -0.6956 | -0.6293 | |
|
| 0.6289 | 0.4468 | 336 | 0.6363 | 0.4299 | 0.1879 | 0.6248 | 0.2420 | -232.8515 | -253.0488 | -0.6705 | -0.6155 | |
|
| 0.6304 | 0.4840 | 364 | 0.6321 | 0.7719 | 0.5053 | 0.6435 | 0.2667 | -229.6779 | -249.6284 | -0.6279 | -0.5652 | |
|
| 0.6126 | 0.5213 | 392 | 0.6325 | 0.5194 | 0.2033 | 0.6375 | 0.3161 | -232.6973 | -252.1539 | -0.6785 | -0.6117 | |
|
| 0.5974 | 0.5585 | 420 | 0.6254 | 0.7418 | 0.4269 | 0.6428 | 0.3149 | -230.4618 | -249.9303 | -0.6823 | -0.6170 | |
|
| 0.6185 | 0.5957 | 448 | 0.6267 | 0.9534 | 0.6106 | 0.6409 | 0.3428 | -228.6247 | -247.8141 | -0.6532 | -0.5866 | |
|
| 0.604 | 0.6330 | 476 | 0.6284 | 0.8011 | 0.4691 | 0.6394 | 0.3320 | -230.0398 | -249.3374 | -0.6842 | -0.6177 | |
|
| 0.6154 | 0.6702 | 504 | 0.6269 | 0.8353 | 0.5307 | 0.6431 | 0.3046 | -229.4234 | -248.9947 | -0.6705 | -0.6051 | |
|
| 0.5936 | 0.7074 | 532 | 0.6277 | 0.7287 | 0.4206 | 0.6469 | 0.3082 | -230.5248 | -250.0604 | -0.6887 | -0.6226 | |
|
| 0.6291 | 0.7447 | 560 | 0.6260 | 0.8539 | 0.5327 | 0.6439 | 0.3211 | -229.4030 | -248.8091 | -0.6758 | -0.6096 | |
|
| 0.6169 | 0.7819 | 588 | 0.6255 | 0.8797 | 0.5669 | 0.6461 | 0.3127 | -229.0613 | -248.5513 | -0.6690 | -0.6041 | |
|
| 0.5934 | 0.8191 | 616 | 0.6256 | 0.8582 | 0.5399 | 0.6461 | 0.3183 | -229.3312 | -248.7658 | -0.6753 | -0.6095 | |
|
| 0.6004 | 0.8564 | 644 | 0.6257 | 0.8263 | 0.5074 | 0.6450 | 0.3189 | -229.6564 | -249.0845 | -0.6761 | -0.6110 | |
|
| 0.6282 | 0.8936 | 672 | 0.6256 | 0.8133 | 0.4949 | 0.6442 | 0.3184 | -229.7819 | -249.2152 | -0.6748 | -0.6101 | |
|
| 0.5572 | 0.9309 | 700 | 0.6258 | 0.8122 | 0.4938 | 0.6442 | 0.3184 | -229.7925 | -249.2255 | -0.6781 | -0.6129 | |
|
| 0.595 | 0.9681 | 728 | 0.6256 | 0.8140 | 0.4943 | 0.6428 | 0.3197 | -229.7873 | -249.2078 | -0.6788 | -0.6134 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.11.1 |
|
- Transformers 4.41.2 |
|
- Pytorch 2.3.0+cu121 |
|
- Datasets 2.19.2 |
|
- Tokenizers 0.19.1 |