Fine-tuned/hyperfitted following the methodology of https://arxiv.org/abs/2412.04318, using the OrthoGrad optimizer (https://arxiv.org/abs/2501.04697).
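For reference, a minimal PyTorch sketch of the OrthoGrad idea (an illustrative re-implementation, not the paper's reference code): each parameter's gradient is replaced by its component orthogonal to the current weight vector, rescaled back to the original gradient norm, before a standard optimizer step. The wrapper class name and the AdamW base optimizer are assumptions made here for the example.

```python
import torch
from torch.optim import AdamW


class OrthoGradWrapper:
    """Sketch of the OrthoGrad projection from arXiv:2501.04697.

    Before each base-optimizer step, every gradient is projected onto the
    subspace orthogonal to its weight vector and rescaled to keep its norm.
    """

    def __init__(self, params, base_optimizer_cls=AdamW, **base_kwargs):
        self.params = [p for p in params]
        self.base = base_optimizer_cls(self.params, **base_kwargs)

    @torch.no_grad()
    def step(self):
        for p in self.params:
            if p.grad is None:
                continue
            w, g = p.detach().view(-1), p.grad.view(-1)
            # Remove the gradient component parallel to the weight vector.
            g_orth = g - ((w @ g) / (w @ w + 1e-30)) * w
            # Rescale so the update magnitude matches the raw gradient.
            g_orth *= g.norm() / (g_orth.norm() + 1e-30)
            p.grad.copy_(g_orth.view_as(p.grad))
        self.base.step()

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)
```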
Updated 23.02.2025: retrained on the same dataset using 512-token sequences with a 64-token sliding window (training loss still decreased). Significant HellaSwag drop (~22%).
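A rough sketch of how such sequences could be produced, assuming "64-token sliding window" means 512-token windows whose start positions advance by 64 tokens (heavily overlapping chunks); the actual preprocessing may differ.

```python
from typing import List


def sliding_windows(token_ids: List[int], seq_len: int = 512, stride: int = 64) -> List[List[int]]:
    """Chunk a token stream into fixed-length, overlapping training sequences."""
    windows = []
    # Each window starts `stride` tokens after the previous one.
    for start in range(0, max(len(token_ids) - seq_len, 0) + 1, stride):
        windows.append(token_ids[start:start + seq_len])
    return windows


# Example: a 1,000-token document yields windows starting at 0, 64, 128, ..., 448.
print(len(sliding_windows(list(range(1000)))))  # -> 8
```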