
zephyr-dpo-qlora-gpt4-5e-7-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/GPT4 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6603
  • Rewards/chosen: -0.3016
  • Rewards/rejected: -0.3998
  • Rewards/accuracies: 0.5992
  • Rewards/margins: 0.0982
  • Rewards/margins Max: 0.5219
  • Rewards/margins Min: -0.3348
  • Rewards/margins Std: 0.3823
  • Logps/rejected: -299.1642
  • Logps/chosen: -315.3784
  • Logits/rejected: -2.6357
  • Logits/chosen: -2.6728
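
This repository contains a QLoRA adapter rather than full model weights, so it is loaded on top of the base SFT model. Below is a minimal loading sketch using transformers and peft; the 4-bit quantization settings are illustrative assumptions, and the adapter can also be applied to a full-precision copy of the base model.

```python
# Minimal inference sketch (quantization settings are assumptions, not taken from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpo-qlora-gpt4-5e-7-epoch3"

# 4-bit loading mirrors the QLoRA setup; bf16/fp16 loading works as well.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, `model = model.merge_and_unload()` merges the adapter into the base weights after loading.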

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
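
The card does not include the training script itself. The sketch below shows how the hyperparameters above might map onto trl's DPOTrainer; the actual run used a QLoRA setup (likely via the alignment-handbook recipes), so details may differ. The LoRA configuration and the dataset loading are illustrative placeholders, since the card does not state them.

```python
# Hedged sketch: mapping the listed hyperparameters onto trl's DPOTrainer.
# LoRA values and the dataset path are placeholders, not taken from this card.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Placeholder: the card only names a "generation/GPT4" preference dataset.
raw = load_dataset("json", data_files={"train": "gpt4_prefs.jsonl"})  # hypothetical path

training_args = DPOConfig(
    output_dir="zephyr-dpo-qlora-gpt4-5e-7-epoch3",
    learning_rate=5e-7,
    per_device_train_batch_size=2,   # train_batch_size above
    per_device_eval_batch_size=4,    # eval_batch_size above
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

peft_config = LoraConfig(            # placeholder values; the card omits LoRA settings
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    train_dataset=raw["train"],      # expected columns: prompt / chosen / rejected
    tokenizer=tokenizer,             # newer trl releases rename this to `processing_class`
    peft_config=peft_config,
)
trainer.train()
```

The total_train_batch_size of 16 listed above follows from the per-device batch size of 2 across the 8 GPUs, e.g. when launching the script with accelerate.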

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6815 | 0.28 | 100 | 0.6918 | -0.0019 | -0.0055 | 0.5516 | 0.0037 | 0.0181 | -0.0087 | 0.0120 | -259.7351 | -285.4075 | -2.8079 | -2.8531 |
| 0.6235 | 0.56 | 200 | 0.6873 | -0.0383 | -0.0542 | 0.5873 | 0.0160 | 0.0859 | -0.0499 | 0.0601 | -264.6065 | -289.0478 | -2.7712 | -2.8159 |
| 0.5521 | 0.85 | 300 | 0.6808 | -0.1327 | -0.1683 | 0.5952 | 0.0356 | 0.1823 | -0.1064 | 0.1266 | -276.0095 | -298.4897 | -2.7261 | -2.7701 |
| 0.4853 | 1.13 | 400 | 0.6749 | -0.2053 | -0.2614 | 0.6032 | 0.0561 | 0.2952 | -0.1704 | 0.2056 | -285.3263 | -305.7520 | -2.6873 | -2.7295 |
| 0.4561 | 1.41 | 500 | 0.6651 | -0.1807 | -0.2628 | 0.5913 | 0.0821 | 0.4091 | -0.2388 | 0.2874 | -285.4612 | -303.2937 | -2.6622 | -2.7037 |
| 0.4337 | 1.69 | 600 | 0.6630 | -0.2648 | -0.3479 | 0.6111 | 0.0831 | 0.4556 | -0.2917 | 0.3299 | -293.9761 | -311.7008 | -2.6522 | -2.6912 |
| 0.4052 | 1.97 | 700 | 0.6606 | -0.2499 | -0.3494 | 0.6151 | 0.0995 | 0.5023 | -0.3041 | 0.3604 | -294.1273 | -310.2143 | -2.6437 | -2.6819 |
| 0.3797 | 2.25 | 800 | 0.6601 | -0.2711 | -0.3716 | 0.6151 | 0.1005 | 0.5194 | -0.3194 | 0.3750 | -296.3420 | -312.3301 | -2.6373 | -2.6750 |
| 0.3692 | 2.54 | 900 | 0.6601 | -0.2914 | -0.3911 | 0.6032 | 0.0997 | 0.5207 | -0.3303 | 0.3804 | -298.2907 | -314.3626 | -2.6357 | -2.6730 |
| 0.3953 | 2.82 | 1000 | 0.6607 | -0.3036 | -0.4008 | 0.6032 | 0.0972 | 0.5193 | -0.3338 | 0.3807 | -299.2639 | -315.5808 | -2.6356 | -2.6727 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2