R1-Distill-Llama-8B-Anima10

This model is a work in progress.

This model is the result of 10 epochs of finetuning deepseek-ai/DeepSeek-R1-Distill-Llama-8B on a private corpus of 11 megabytes of hand-selected raw text, using a low learning rate and short token sequences.

The original intention was to influence the style of the model's thinking text, but the training seems to have led to other unintended results.

It was originally trained for 3 epochs.

In testing, when asked "What is the fastest way to get around Europe?", it fell into an endless loop of recursive (but relevant) thinking.
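One model-agnostic way to catch this kind of generation loop at inference time is to check whether the tail of the output keeps repeating the same chunk of text. The sketch below is purely illustrative; the function name and thresholds are made up and are not part of this model's training or inference setup:

```python
def looks_stuck(text, window=200, min_repeats=3):
    """Heuristic: flag text whose tail repeats the same chunk back-to-back.

    `window` (how much of the tail to inspect) and `min_repeats` are
    arbitrary illustrative defaults, not values used by this model.
    """
    tail = text[-window:]
    # Try chunk sizes from large to small; a chunk repeated
    # `min_repeats` times in a row suggests a generation loop.
    for size in range(len(tail) // min_repeats, 4, -1):
        chunk = tail[-size:]
        if tail.endswith(chunk * min_repeats):
            return True
    return False
```

A streaming generation loop could call such a check periodically and stop early instead of producing unbounded recursive thinking text.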

Also noteworthy was how slowly the training loss descended once it reached around 3.5.

To further explore these observations, an additional 7 epochs of training were scheduled, and this model is the result.

It not only resolved the thinking loop on the Europe question but also broke past some of the 'hard stops' originally trained into it.

The model is currently undergoing additional training.

Model size: 8.03B params (Safetensors, BF16)

Model tree: Envoid/R1-Distill-Llama-8B-Anima10 (quantizations: 2 models)