dolly-v2-7b-dpo-full-1-epoch-hydrox-safe

This model is a fine-tuned version of databricks/dolly-v2-7b. The card does not record which dataset was used for training. It achieves the following results on the evaluation set:

Loss: 0.0371
Rewards/chosen: 4.2799
Rewards/rejected: -3.8888
Rewards/accuracies: 0.9857
Rewards/margins: 8.1686
Logps/rejected: -598.4040
Logps/chosen: -377.1240
Logits/rejected: -1.2002
Logits/chosen: -1.5171
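
The metric names above match what DPO-style preference trainers usually log (the model name contains "dpo"). Under that convention, which is an assumption here because the card does not describe the metrics, Rewards/margins is the chosen reward minus the rejected reward, and Rewards/accuracies is the fraction of evaluation pairs where the chosen reward is higher. A minimal sketch that checks the margin relationship against the reported figures (variable names are illustrative):

```python
# Evaluation figures copied from the list above.
rewards_chosen = 4.2799
rewards_rejected = -3.8888
reported_margin = 8.1686

# Assumption: the margin is logged as (chosen reward - rejected reward),
# averaged over the evaluation set. The card itself does not define it.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
print(f"difference:      {abs(computed_margin - reported_margin):.4f}")
```

The two values agree to within the last printed digit, which is consistent with each figure being rounded separately after averaging.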

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
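
The totals above follow from the per-device sizes multiplied by the device count: 8 x 8 = 64 for training, which implies no gradient accumulation, and 4 x 8 = 32 for evaluation. With a linear scheduler and a warmup ratio of 0.1, the learning rate would rise from 0 to 5e-07 over the first 10% of steps and then decay linearly back to 0. The sketch below reproduces that shape in plain Python. The gradient accumulation value and the total step count are assumptions (neither is listed in the card; 2930 is a rough guess from the results table, where step 2900 corresponds to epoch 0.99), and `linear_lr` is a hypothetical helper, not a library function.

```python
import math

PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 8
PEAK_LR = 5e-07
WARMUP_RATIO = 0.1

# Assumption: not stated in the card; implied by 8 * 8 = 64.
GRAD_ACCUM_STEPS = 1

total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def linear_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from the peak rate to 0 at the last step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Assumption: approximate total number of optimizer steps for one epoch.
TOTAL_STEPS = 2930

print("total train batch:", total_train_batch)
print("total eval batch:", total_eval_batch)
for step in (0, 100, 1500, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS))
```

If the assumed step count is roughly right, the first logged evaluation at step 100 would fall inside the warmup phase.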

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.618	0.03	100	0.5642	0.6988	-0.1139	0.7424	0.8127	-560.6550	-412.9344	-1.1894	-1.4878
0.3539	0.07	200	0.3197	1.9159	-0.2730	0.8847	2.1889	-562.2463	-400.7641	-1.1625	-1.4800
0.2287	0.1	300	0.2128	2.8057	-0.5539	0.9200	3.3596	-565.0551	-391.8654	-1.1361	-1.4649
0.158	0.14	400	0.1673	3.4556	-1.0339	0.9327	4.4895	-569.8558	-385.3670	-1.1300	-1.4622
0.1599	0.17	500	0.1397	3.7485	-1.3338	0.9461	5.0823	-572.8546	-382.4376	-1.1275	-1.4607
0.1389	0.2	600	0.1273	3.9259	-1.5111	0.9529	5.4371	-574.6277	-380.6633	-1.1194	-1.4519
0.0778	0.24	700	0.1122	4.0699	-1.8498	0.9613	5.9197	-578.0140	-379.2233	-1.1302	-1.4542
0.0993	0.27	800	0.0975	4.2423	-1.9934	0.9663	6.2357	-579.4506	-377.5001	-1.1424	-1.4689
0.111	0.31	900	0.0907	4.3218	-2.2534	0.9697	6.5752	-582.0501	-376.7048	-1.1542	-1.4820
0.0893	0.34	1000	0.0882	4.3878	-2.2588	0.9663	6.6466	-582.1047	-376.0451	-1.1497	-1.4694
0.079	0.37	1100	0.0840	4.4706	-2.3132	0.9689	6.7838	-582.6481	-375.2164	-1.1532	-1.4807
0.0706	0.41	1200	0.0721	4.4319	-2.6505	0.9722	7.0824	-586.0217	-375.6038	-1.1667	-1.4885
0.0705	0.44	1300	0.0725	4.3743	-2.8717	0.9739	7.2460	-588.2330	-376.1799	-1.1817	-1.5001
0.0537	0.48	1400	0.0648	4.3847	-2.9676	0.9756	7.3523	-589.1927	-376.0760	-1.1789	-1.5019
0.0483	0.51	1500	0.0604	4.3761	-3.2295	0.9798	7.6056	-591.8114	-376.1613	-1.1923	-1.5114
0.0572	0.54	1600	0.0581	4.3258	-3.2641	0.9773	7.5899	-592.1575	-376.6645	-1.1855	-1.5042
0.066	0.58	1700	0.0539	4.3270	-3.3813	0.9815	7.7083	-593.3289	-376.6523	-1.1886	-1.5110
0.0561	0.61	1800	0.0501	4.3859	-3.3980	0.9798	7.7839	-593.4964	-376.0636	-1.1948	-1.5144
0.0538	0.65	1900	0.0504	4.4209	-3.4478	0.9815	7.8687	-593.9944	-375.7137	-1.2036	-1.5147
0.0493	0.68	2000	0.0472	4.3835	-3.5804	0.9832	7.9639	-595.3203	-376.0873	-1.1925	-1.5071
0.0374	0.71	2100	0.0449	4.2972	-3.7998	0.9840	8.0970	-597.5147	-376.9510	-1.2020	-1.5166
0.0475	0.75	2200	0.0442	4.3073	-3.6486	0.9840	7.9559	-596.0024	-376.8494	-1.1992	-1.5177
0.0407	0.78	2300	0.0408	4.3011	-3.7981	0.9882	8.0992	-597.4978	-376.9122	-1.2078	-1.5242
0.0386	0.82	2400	0.0397	4.3423	-3.7314	0.9882	8.0737	-596.8302	-376.4996	-1.2029	-1.5133
0.0504	0.85	2500	0.0390	4.3732	-3.7690	0.9857	8.1422	-597.2065	-376.1912	-1.2024	-1.5188
0.0402	0.88	2600	0.0377	4.3358	-3.8299	0.9865	8.1656	-597.8150	-376.5649	-1.1977	-1.5158
0.038	0.92	2700	0.0397	4.3284	-3.8383	0.9891	8.1667	-597.8990	-376.6386	-1.2033	-1.5139
0.0527	0.95	2800	0.0383	4.2985	-3.8490	0.9857	8.1475	-598.0059	-376.9374	-1.2037	-1.5196
0.0365	0.99	2900	0.0379	4.3086	-3.8349	0.9874	8.1435	-597.8653	-376.8369	-1.1997	-1.5156
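
The table above is whitespace-delimited, so it can be read back into records for inspection. The sketch below embeds only the last five rows for brevity, picks the logged step with the lowest validation loss, and checks that each row's margin matches chosen minus rejected reward. The column keys and the `parse_rows` helper are illustrative names, not part of any library.

```python
# Column order as given in the table header above.
COLUMNS = [
    "training_loss", "epoch", "step", "validation_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracies",
    "rewards_margins", "logps_rejected", "logps_chosen",
    "logits_rejected", "logits_chosen",
]

# Last five rows of the table, copied verbatim.
ROWS = """
0.0504 0.85 2500 0.0390 4.3732 -3.7690 0.9857 8.1422 -597.2065 -376.1912 -1.2024 -1.5188
0.0402 0.88 2600 0.0377 4.3358 -3.8299 0.9865 8.1656 -597.8150 -376.5649 -1.1977 -1.5158
0.038 0.92 2700 0.0397 4.3284 -3.8383 0.9891 8.1667 -597.8990 -376.6386 -1.2033 -1.5139
0.0527 0.95 2800 0.0383 4.2985 -3.8490 0.9857 8.1475 -598.0059 -376.9374 -1.2037 -1.5196
0.0365 0.99 2900 0.0379 4.3086 -3.8349 0.9874 8.1435 -597.8653 -376.8369 -1.1997 -1.5156
"""


def parse_rows(text):
    """Turn whitespace-delimited rows into a list of dicts keyed by COLUMNS."""
    records = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        record = dict(zip(COLUMNS, values))
        record["step"] = int(record["step"])
        records.append(record)
    return records


records = parse_rows(ROWS)

# Logged step with the lowest validation loss among the embedded rows.
best = min(records, key=lambda r: r["validation_loss"])
print("best step:", best["step"], "validation loss:", best["validation_loss"])

# Largest gap between the logged margin and (chosen - rejected).
max_gap = max(
    abs(r["rewards_chosen"] - r["rewards_rejected"] - r["rewards_margins"])
    for r in records
)
print(f"largest margin gap: {max_gap:.4f}")
```

The evaluation loss reported at the top of the card (0.0371) is lower than any value in the table, so it presumably comes from a final evaluation run after the last logged step.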

Framework versions

Transformers 4.35.0
Pytorch 2.1.1+cu121
Datasets 2.14.6
Tokenizers 0.14.1
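
To compare a local environment with the versions listed above, a minimal sketch using only the standard library follows. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names, and `compare_versions` is a hypothetical helper. A mismatch does not necessarily mean the model will not load; it only flags a difference from the training setup.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.35.0",
    "torch": "2.1.1+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def compare_versions(expected):
    """Map each package to (expected, installed or None, exact match flag)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


report = compare_versions(EXPECTED)
for package, (wanted, installed, matches) in report.items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```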
