DIBT Prompt Collective SPIN Collection
This collection contains resources related to the replication of SPIN with the DIBT Prompt Collective dataset.
A model that matches the results of SPIN with very little data (30x less), carefully curated by the Data Is Better Together community.
This model is a fine-tuned version of argilla/zephyr-7b-spin-iter2-v0 on the argilla/10k_prompts_SPIN_iter3_zephyr_top and argilla/10k_prompts_SPIN_iter2_zephyr_top datasets.
Check this repo for full reproducible code using the original SPIN implementation and distilabel.
If you want to contribute to high-quality datasets like this one, join the DIBT Prompt Collective initiative.
| Model | 1st Turn Score | 2nd Turn Score | Average Score | SPIN paper Score |
|---|---|---|---|---|
| zephyr-7b-sft-full | 6.6625 | 6.0250 | 6.34375 | 5.94 |
| zephyr-7b-spin-iter0-v0 | 6.64375 | 6.1750 | 6.409375 | 6.46 |
| zephyr-7b-spin-iter1-v0 | 6.90625 | 6.3000 | 6.603125 | 6.65 |
| zephyr-7b-spin-iter2-v0 | 7.1375 | 6.3125 | 6.725000 | 6.78 |
| zephyr-7b-spin-iter3-v0 | 7.09375 | 6.4500 | 6.771875 | - |
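As a quick sanity check, the Average Score column in the table above is simply the mean of the two per-turn MT-Bench scores. A minimal sketch, with the turn scores copied from the table:

```python
# Per-model MT-Bench turn scores, copied from the table above.
scores = {
    "zephyr-7b-sft-full": (6.6625, 6.0250),
    "zephyr-7b-spin-iter0-v0": (6.64375, 6.1750),
    "zephyr-7b-spin-iter1-v0": (6.90625, 6.3000),
    "zephyr-7b-spin-iter2-v0": (7.1375, 6.3125),
    "zephyr-7b-spin-iter3-v0": (7.09375, 6.4500),
}

# Average Score = mean of 1st and 2nd turn scores.
averages = {name: (turn1 + turn2) / 2 for name, (turn1, turn2) in scores.items()}
```

Running this reproduces the Average Score column exactly (e.g. 6.34375 for zephyr-7b-sft-full and 6.771875 for zephyr-7b-spin-iter3-v0).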
The following results were obtained during training (one row per evaluation step):
| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2928 | 0.49 | 25 | 0.3951 | -2.6212 | -20.3268 | 0.9062 | 17.7056 | -700.5638 | -278.0876 | -2.8098 | -2.8090 |
| 0.1487 | 0.97 | 50 | 0.1319 | -2.9077 | -29.1459 | 0.9375 | 26.2382 | -702.3276 | -278.1449 | -2.8218 | -2.8066 |
| 0.006 | 1.46 | 75 | 0.1269 | -2.6037 | -29.1519 | 0.9583 | 26.5482 | -702.3289 | -278.0841 | -2.8175 | -2.8037 |
| 0.0086 | 1.94 | 100 | 0.1099 | -2.9181 | -29.6970 | 0.9271 | 26.7789 | -702.4378 | -278.1470 | -2.8177 | -2.8051 |
Base model: mistralai/Mistral-7B-v0.1