zephyr-7b-dpop-ours-qlora-5e-7-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF dataset. It achieves the following results on the evaluation set:

Loss: 0.9525
Positive Losses: 2.5650
Dpo Losses: 0.6654
Rewards/chosen: 0.0615
Rewards/rejected: -0.0035
Rewards/accuracies: 0.6370
Rewards/margins: 0.0649
Rewards/margins Max: 0.3501
Rewards/margins Min: -0.1835
Rewards/margins Std: 0.1778
Logps/rejected: -258.9347
Logps/chosen: -278.4630
Logits/rejected: -2.6770
Logits/chosen: -2.7150

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 16
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3

Training results

Training Loss	Epoch	Step	Validation Loss	Positive Losses	Dpo Losses	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Rewards/margins Max	Rewards/margins Min	Rewards/margins Std	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6888	0.28	100	0.6930	0.0090	0.6919	0.0144	0.0119	0.5960	0.0025	0.0168	-0.0104	0.0091	-257.3992	-283.1658	-2.7677	-2.8066
0.6631	0.56	200	0.6984	0.1139	0.6859	0.0475	0.0323	0.5970	0.0152	0.0899	-0.0497	0.0463	-255.3604	-279.8582	-2.7518	-2.7902
0.6296	0.85	300	0.7188	0.3524	0.6802	0.0671	0.0392	0.5990	0.0279	0.1601	-0.0867	0.0826	-254.6683	-277.9036	-2.7287	-2.7668
0.6225	1.13	400	0.7561	0.7344	0.6753	0.0776	0.0381	0.6100	0.0395	0.2210	-0.1158	0.1128	-254.7784	-276.8472	-2.7128	-2.7504
0.5986	1.41	500	0.8408	1.5299	0.6717	0.0653	0.0164	0.6140	0.0488	0.2718	-0.1453	0.1394	-256.9439	-278.0837	-2.6920	-2.7297
0.6107	1.69	600	0.8630	1.7461	0.6689	0.0728	0.0171	0.6200	0.0557	0.3055	-0.1594	0.1554	-256.8792	-277.3334	-2.6848	-2.7225
0.5944	1.97	700	0.8998	2.0818	0.6674	0.0676	0.0079	0.625	0.0597	0.3249	-0.1697	0.1649	-257.8028	-277.8536	-2.6819	-2.7197
0.5619	2.25	800	0.9346	2.3977	0.6662	0.0630	0.0001	0.6300	0.0629	0.3402	-0.1784	0.1729	-258.5778	-278.3099	-2.6844	-2.7219
0.5725	2.54	900	0.9580	2.6145	0.6656	0.0590	-0.0056	0.6290	0.0646	0.3487	-0.1833	0.1774	-259.1476	-278.7048	-2.6818	-2.7195
0.5813	2.82	1000	0.9538	2.5739	0.6654	0.0612	-0.0038	0.6280	0.0651	0.3501	-0.1834	0.1778	-258.9730	-278.4868	-2.6794	-2.7173

Framework versions

PEFT 0.7.1
Transformers 4.39.0.dev0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2

just1nseo
/

zephyr-7b-dpop-ours-qlora-5e-7-epoch3

zephyr-7b-dpop-ours-qlora-5e-7-epoch3

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for just1nseo/zephyr-7b-dpop-ours-qlora-5e-7-epoch3

Evaluation results