	---
	library_name: peft
	tags:
	- trl
	- dpo
	- alignment-handbook
	- generated_from_trainer
	base_model: NbAiLab/nb-gpt-j-6B-v2
	model-index:
	- name: aftonposten-6b-align-scan
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# aftonposten-6b-align-scan

	This model is a fine-tuned version of [NbAiLab/nb-gpt-j-6B-v2](https://huggingface.co/NbAiLab/nb-gpt-j-6B-v2) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6943
	- Rewards/chosen: -0.2880
	- Rewards/rejected: -0.4573
	- Rewards/accuracies: 0.5718
	- Rewards/margins: 0.1693
	- Logps/rejected: -38.0247
	- Logps/chosen: -34.3545
	- Logits/rejected: -2.1371
	- Logits/chosen: -2.1419
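
The reward figures above can be sanity-checked with a little arithmetic. The sketch below assumes that the reported margin is the mean chosen reward minus the mean rejected reward, and that training used a sigmoid-style DPO loss; the card states neither, so both are assumptions. Under that loss, a policy identical to the reference model has zero margin and a loss of ln 2, which gives a rough baseline for reading the evaluation loss.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.2880
rewards_rejected = -0.4573
reported_margin = 0.1693
eval_loss = 0.6943

# Assumption: margin = mean chosen reward - mean rejected reward.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid DPO loss, -log(sigmoid(margin)) per preference pair.
# With a margin of exactly zero this evaluates to ln(2).
baseline_loss = math.log(2.0)

print(f"recomputed margin:   {margin:.4f} (reported {reported_margin:.4f})")
print(f"loss at zero margin: {baseline_loss:.4f} (reported eval loss {eval_loss:.4f})")
```

The recomputed margin agrees with the reported one to four decimals. The evaluation loss is an average over pairs, so it cannot be derived from the mean margin alone.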

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 4
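
To make the scheduler settings concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule built from `learning_rate`, `lr_scheduler_type`, and `lr_scheduler_warmup_ratio`. The total step count is not stated in this card, so `TOTAL_STEPS` is an illustrative assumption, and `lr_at` is a hypothetical helper, not a function from Transformers.

```python
import math

PEAK_LR = 5e-06      # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1540   # ASSUMPTION: not given in the card, illustrative only


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak down to zero.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 77, 154, 800, 1540):
    print(step, f"{lr_at(step):.3e}")
```

With these assumed numbers the rate peaks after roughly the first tenth of training and then falls smoothly toward zero by the final step.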

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6464 | 0.26 | 100 | 0.6903 | -0.0054 | -0.0300 | 0.5685 | 0.0246 | -37.5500 | -34.0405 | -2.2291 | -2.2340 |
	| 0.5931 | 0.52 | 200 | 0.6980 | -0.0346 | -0.0543 | 0.5158 | 0.0196 | -37.5769 | -34.0730 | -2.2267 | -2.2316 |
	| 0.5301 | 0.78 | 300 | 0.6973 | -0.0555 | -0.0750 | 0.5390 | 0.0195 | -37.6000 | -34.0962 | -2.2243 | -2.2292 |
	| 0.389 | 1.04 | 400 | 0.6933 | -0.1201 | -0.1849 | 0.5507 | 0.0649 | -37.7221 | -34.1680 | -2.1983 | -2.2032 |
	| 0.322 | 1.3 | 500 | 0.7055 | -0.2815 | -0.3556 | 0.5515 | 0.0741 | -37.9118 | -34.3473 | -2.1969 | -2.2017 |
	| 0.327 | 1.56 | 600 | 0.6703 | -0.1443 | -0.2944 | 0.5806 | 0.1500 | -37.8437 | -34.1949 | -2.1819 | -2.1867 |
	| 0.3034 | 1.82 | 700 | 0.6868 | -0.1851 | -0.3175 | 0.5656 | 0.1323 | -37.8694 | -34.2402 | -2.1701 | -2.1749 |
	| 0.1649 | 2.08 | 800 | 0.6812 | -0.2229 | -0.3850 | 0.5951 | 0.1621 | -37.9443 | -34.2822 | -2.1594 | -2.1642 |
	| 0.1691 | 2.34 | 900 | 0.6881 | -0.2514 | -0.4183 | 0.5831 | 0.1669 | -37.9814 | -34.3138 | -2.1476 | -2.1524 |
	| 0.1953 | 2.6 | 1000 | 0.6957 | -0.2986 | -0.4680 | 0.5918 | 0.1694 | -38.0366 | -34.3663 | -2.1400 | -2.1447 |
	| 0.1463 | 2.86 | 1100 | 0.7010 | -0.3003 | -0.4559 | 0.5714 | 0.1555 | -38.0231 | -34.3682 | -2.1379 | -2.1427 |
	| 0.1796 | 3.12 | 1200 | 0.6908 | -0.2876 | -0.4581 | 0.5748 | 0.1705 | -38.0257 | -34.3541 | -2.1376 | -2.1423 |
	| 0.1264 | 3.38 | 1300 | 0.6911 | -0.2772 | -0.4526 | 0.5893 | 0.1755 | -38.0196 | -34.3425 | -2.1374 | -2.1422 |
	| 0.1206 | 3.64 | 1400 | 0.6924 | -0.2868 | -0.4582 | 0.5918 | 0.1714 | -38.0257 | -34.3532 | -2.1371 | -2.1419 |
	| 0.1645 | 3.9 | 1500 | 0.6943 | -0.2880 | -0.4573 | 0.5718 | 0.1693 | -38.0247 | -34.3545 | -2.1371 | -2.1419 |
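
The evaluation checkpoints can also be compared programmatically. The sketch below copies the validation loss, reward accuracy, and reward margin for each logged step from the training results and reports which step is best on each metric. The tuple layout and variable names are illustrative choices, not something the training code produced.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), one tuple per evaluation,
# transcribed from the training results table.
EVALS = [
    (100, 0.6903, 0.5685, 0.0246),
    (200, 0.6980, 0.5158, 0.0196),
    (300, 0.6973, 0.5390, 0.0195),
    (400, 0.6933, 0.5507, 0.0649),
    (500, 0.7055, 0.5515, 0.0741),
    (600, 0.6703, 0.5806, 0.1500),
    (700, 0.6868, 0.5656, 0.1323),
    (800, 0.6812, 0.5951, 0.1621),
    (900, 0.6881, 0.5831, 0.1669),
    (1000, 0.6957, 0.5918, 0.1694),
    (1100, 0.7010, 0.5714, 0.1555),
    (1200, 0.6908, 0.5748, 0.1705),
    (1300, 0.6911, 0.5893, 0.1755),
    (1400, 0.6924, 0.5918, 0.1714),
    (1500, 0.6943, 0.5718, 0.1693),
]

best_loss = min(EVALS, key=lambda row: row[1])    # lowest validation loss
best_acc = max(EVALS, key=lambda row: row[2])     # highest reward accuracy
best_margin = max(EVALS, key=lambda row: row[3])  # largest reward margin

print(f"lowest validation loss: {best_loss[1]:.4f} at step {best_loss[0]}")
print(f"highest accuracy:       {best_acc[2]:.4f} at step {best_acc[0]}")
print(f"largest margin:         {best_margin[3]:.4f} at step {best_margin[0]}")
```

The three metrics peak at different steps, and none of them peaks at the final step 1500 that the headline results report, so an earlier checkpoint may be worth evaluating if one was saved.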


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.39.0.dev0
	- Pytorch 2.1.2+cu121
	- Datasets 2.14.6
	- Tokenizers 0.15.1
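
Because the metadata lists `library_name: peft` and a `base_model`, the weights are presumably an adapter that is applied on top of the base checkpoint. The following is a minimal loading sketch under that assumption. `ADAPTER_ID` is a hypothetical placeholder, since this card does not give the adapter's repository path, and `load_model` is an illustrative helper rather than part of any library.

```python
BASE_MODEL_ID = "NbAiLab/nb-gpt-j-6B-v2"  # base_model from the card metadata
# HYPOTHETICAL placeholder: replace with the real local path or Hub repo id.
ADAPTER_ID = "path/to/aftonposten-6b-align-scan"


def load_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter (downloads several GB)."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` would then return objects usable with the standard `generate` API. A 6B-parameter base model needs substantial memory, so half precision or quantized loading may be required in practice.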