This is not an instruct fine-tune; instead, it's an attempt to de-contaminate the model and remove GPT-slop and refusals. I want the model to feel like it was trained on human data, not synthetic data.

About 961 steps total: Yi-34B-200K (llamafied) was DPO-trained for 1 epoch on the rawrr_v2 dataset via unsloth QLoRA, with a max prompt length of 400, a max length of 700, and a learning rate of 0.000045.
The model was initialized with max_position_embeddings of 4096 to avoid OOM. A rough sketch of the setup is below.
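
For illustration, here is a minimal sketch of what such a run looks like with unsloth and TRL's DPOTrainer. This is not the script from the repo: the base-model path, dataset repo id, LoRA rank, and batch sizes are assumptions, and it assumes a TRL version that still accepts max_length/max_prompt_length directly in the DPOTrainer constructor. Only the values named above (1 epoch, lr 0.000045, prompt length 400, max length 700, 4096 positions) are taken from the description.

```python
# Minimal sketch of a QLoRA DPO run with unsloth + TRL (not the repo's script).
from unsloth import FastLanguageModel
from trl import DPOTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the llamafied Yi-34B-200K base in 4-bit; 4096 positions to avoid OOM.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/yi-34b-200k-llamafied",  # placeholder path
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank/alpha are assumptions, not stated above).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# rawrr_v2 DPO dataset with prompt/chosen/rejected columns (repo id assumed).
dataset = load_dataset("adamo1139/rawrr_v2", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT model, TRL uses the adapter-disabled base as reference
    args=TrainingArguments(
        per_device_train_batch_size=1,   # assumed; sized for a 24 GiB card
        gradient_accumulation_steps=16,  # assumed
        num_train_epochs=1,
        learning_rate=0.000045,
        optim="adamw_8bit",
        output_dir="yi-34b-200k-rawrr-dpo",
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_prompt_length=400,  # caps the prompt tokens only
    max_length=700,         # caps prompt + response tokens
)
trainer.train()
```
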
Training was done on an RTX 3090 Ti in about 14 hours.
Average memory usage was around 23.89 / 23.99 GiB, so it was very close to OOM at all times.
I trained with XFCE loaded on a single 1080p monitor; with a fancier desktop environment, it would probably OOM with the same setup.
I am not sure what the purpose of max_prompt_length being separate from max_length is, so I may have used it wrong; I should read up on it.
The script I used for this fine-tune is in the repo. I used the ChatML prompt format (example below). Next, I plan to fine-tune this model on the AEZAKMI v3 dataset soon.
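
For reference, ChatML turns follow the standard layout below; the system and user messages here are just placeholders, not examples from the training data.

```
<|im_start|>system
A chat.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```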
