DIBT Prompt Collective SPIN Collection
This collection contains resources related to the replication of SPIN with the DIBT Prompt Collective dataset.
A model that matches the results of SPIN with very little data (30x less), carefully curated by the Data Is Better Together community.
This model is a fine-tuned version of argilla/zephyr-7b-spin-iter2-v0 on the argilla/10k_prompts_SPIN_iter3_zephyr_top and argilla/10k_prompts_SPIN_iter2_zephyr_top datasets.
Check this repo for full reproducible code using the original SPIN implementation and distilabel.
If you want to contribute to high-quality datasets like this one, join the DIBT Prompt Collective initiative.
| Model | 1st Turn Score | 2nd Turn Score | Average Score | SPIN paper Score |
|---|---|---|---|---|
| zephyr-7b-sft-full | 6.6625 | 6.0250 | 6.34375 | 5.94 |
| zephyr-7b-spin-iter0-v0 | 6.64375 | 6.1750 | 6.409375 | 6.46 |
| zephyr-7b-spin-iter1-v0 | 6.90625 | 6.3000 | 6.603125 | 6.65 |
| zephyr-7b-spin-iter2-v0 | 7.1375 | 6.3125 | 6.725000 | 6.78 |
| zephyr-7b-spin-iter3-v0 | 7.09375 | 6.4500 | 6.771875 | - |
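As a quick sanity check, the Average Score column in the table above is simply the mean of the two per-turn MT-Bench scores. A minimal sketch, with the turn scores copied from the table:

```python
# Per-model MT-Bench turn scores, copied from the table above.
scores = {
    "zephyr-7b-sft-full": (6.6625, 6.0250),
    "zephyr-7b-spin-iter0-v0": (6.64375, 6.1750),
    "zephyr-7b-spin-iter1-v0": (6.90625, 6.3000),
    "zephyr-7b-spin-iter2-v0": (7.1375, 6.3125),
    "zephyr-7b-spin-iter3-v0": (7.09375, 6.4500),
}

# Average Score = mean of 1st and 2nd turn scores.
averages = {name: (turn1 + turn2) / 2 for name, (turn1, turn2) in scores.items()}
```

Running this reproduces the Average Score column exactly (e.g. 6.34375 for zephyr-7b-sft-full and 6.771875 for zephyr-7b-spin-iter3-v0).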
The following results were obtained during training (one row per evaluation step):
| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2928 | 0.49 | 25 | 0.3951 | -2.6212 | -20.3268 | 0.9062 | 17.7056 | -700.5638 | -278.0876 | -2.8098 | -2.8090 |
| 0.1487 | 0.97 | 50 | 0.1319 | -2.9077 | -29.1459 | 0.9375 | 26.2382 | -702.3276 | -278.1449 | -2.8218 | -2.8066 |
| 0.006 | 1.46 | 75 | 0.1269 | -2.6037 | -29.1519 | 0.9583 | 26.5482 | -702.3289 | -278.0841 | -2.8175 | -2.8037 |
| 0.0086 | 1.94 | 100 | 0.1099 | -2.9181 | -29.6970 | 0.9271 | 26.7789 | -702.4378 | -278.1470 | -2.8177 | -2.8051 |
Base model: mistralai/Mistral-7B-v0.1