zephyr-7b-dpop-ours-qlora-5e-6-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF dataset. It achieves the following results on the evaluation set:

Loss: 5.2043
Positive Losses: 49.6926
Dpo Losses: 0.6309
Rewards/chosen: -0.4548
Rewards/rejected: -0.7461
Rewards/accuracies: 0.6706
Rewards/margins: 0.2913
Rewards/margins Max: 1.0735
Rewards/margins Min: -0.5345
Rewards/margins Std: 0.7255
Logps/rejected: -333.7965
Logps/chosen: -330.7057
Logits/rejected: -2.5363
Logits/chosen: -2.5869
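
The reported reward margin appears to be the chosen reward minus the rejected reward, which the evaluation numbers above bear out. A minimal check, with variable names chosen here for illustration (they are not taken from the training code):

```python
# Evaluation-set values copied from the metrics listed above.
rewards_chosen = -0.4548
rewards_rejected = -0.7461
reported_margin = 0.2913

# Assumption: the margin is the mean chosen reward minus the mean rejected reward.
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}, reported margin: {reported_margin:.4f}")
```

Both rewards are negative, but the rejected reward is lower, so the margin is positive.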

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 16
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
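
The total batch sizes follow from the per-device sizes and the device count, and the learning rate follows a linear warmup and then a cosine decay. The sketch below reproduces both in plain Python. It assumes no gradient accumulation (the card does not list it) and uses a hypothetical `total_steps` of 1000 purely for illustration; `lr_at` is a helper written for this example, not a library function.

```python
import math

# Values from the hyperparameter list above.
train_batch_size = 2      # per device
eval_batch_size = 4       # per device
num_devices = 8
peak_lr = 5e-6
warmup_ratio = 0.1

# Assumption: gradient accumulation is 1, since the card does not report it.
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step, total_steps, peak=peak_lr, ratio=warmup_ratio):
    """Linear warmup to `peak`, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 50, 100, 550, 1000):
    # total_steps=1000 is a hypothetical value, not the actual run length.
    print(step, lr_at(step, total_steps=1000))
```

With these settings the learning rate peaks at 5e-06 once warmup ends, one tenth of the way through training.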

Training results

Training Loss	Epoch	Step	Validation Loss	Positive Losses	Dpo Losses	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Rewards/margins Max	Rewards/margins Min	Rewards/margins Std	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6275	0.28	100	0.8540	1.6941	0.6742	0.0863	0.0420	0.6032	0.0443	0.2147	-0.1031	0.1420	-254.9810	-276.5898	-2.7114	-2.7527
0.599	0.56	200	1.9207	12.5808	0.6560	-0.0584	-0.1681	0.6389	0.1097	0.4903	-0.2555	0.3316	-275.9966	-291.0660	-2.7386	-2.7842
0.4901	0.85	300	2.8067	22.2141	0.6507	-0.1851	-0.3037	0.6389	0.1186	0.4724	-0.2575	0.3257	-289.5482	-303.7292	-2.7330	-2.7855
0.4414	1.13	400	2.6622	20.9278	0.6386	-0.1386	-0.3238	0.6746	0.1852	0.6971	-0.3749	0.4833	-291.5616	-299.0799	-2.6703	-2.7191
0.4651	1.41	500	2.6646	20.6090	0.6384	-0.1329	-0.3285	0.6627	0.1956	0.7628	-0.3883	0.5195	-292.0331	-298.5117	-2.6714	-2.7217
0.5269	1.69	600	5.0162	46.1312	0.6337	-0.4167	-0.6475	0.6627	0.2307	0.8626	-0.4616	0.5963	-323.9284	-326.8941	-2.6026	-2.6532
0.3513	1.97	700	4.8954	45.5933	0.6399	-0.4107	-0.6603	0.6627	0.2496	0.9744	-0.5254	0.6826	-325.2173	-326.2958	-2.5808	-2.6317
0.2795	2.25	800	4.7693	43.9090	0.6266	-0.3919	-0.6839	0.6825	0.2920	1.0657	-0.5266	0.7166	-327.5706	-324.4103	-2.5545	-2.6047
0.3544	2.54	900	5.3640	51.3363	0.6314	-0.4735	-0.7650	0.6706	0.2915	1.0782	-0.5345	0.7289	-335.6813	-332.5704	-2.5359	-2.5863
0.545	2.82	1000	5.2224	49.9806	0.6312	-0.4578	-0.7482	0.6627	0.2904	1.0718	-0.5332	0.7245	-333.9995	-330.9984	-2.5367	-2.5873
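
The Epoch and Step columns allow a rough estimate of the steps per epoch and of the training-set size. This is only a sketch: the epoch values are rounded to two decimals, and it assumes every step consumes the full total_train_batch_size of 16 with no gradient accumulation.

```python
# (epoch, step) pairs copied from the table above.
progress = [
    (0.28, 100), (0.56, 200), (0.85, 300), (1.13, 400), (1.41, 500),
    (1.69, 600), (1.97, 700), (2.25, 800), (2.54, 900), (2.82, 1000),
]

# Use the last row, where the rounding of the epoch value matters least.
epoch, step = progress[-1]
steps_per_epoch = step / epoch

# Assumption: each optimizer step consumes total_train_batch_size examples.
total_train_batch_size = 16
approx_train_examples = steps_per_epoch * total_train_batch_size
approx_total_steps = steps_per_epoch * 3  # num_epochs: 3

print(f"steps per epoch (approx.): {steps_per_epoch:.0f}")
print(f"training examples (approx.): {approx_train_examples:.0f}")
print(f"total steps over 3 epochs (approx.): {approx_total_steps:.0f}")
```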

Framework versions

PEFT 0.7.1
Transformers 4.39.0.dev0
PyTorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2

Model repository: just1nseo/zephyr-7b-dpop-ours-qlora-5e-6-epoch3