Fine-tuned/hyperfitted using the methodology from https://arxiv.org/abs/2412.04318,

with the OrthoGrad optimizer: https://arxiv.org/abs/2501.04697
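The core OrthoGrad idea can be sketched as follows: before each update, the component of the gradient parallel to the current weight vector is projected out, and the result is rescaled to the original gradient norm. This is a minimal, dependency-free illustration of that projection, not the paper's reference implementation; the function name and flat-list vector representation are assumptions made here for clarity.

```python
import math

def orthograd(w, g, eps=1e-30):
    """Project gradient g orthogonal to weights w, then rescale to the
    original gradient norm (sketch of https://arxiv.org/abs/2501.04697;
    vectors are plain Python lists purely for illustration)."""
    dot_wg = sum(wi * gi for wi, gi in zip(w, g))
    w_sq = sum(wi * wi for wi in w) + eps  # guard against zero weights
    # Remove the component of g parallel to w.
    g_orth = [gi - (dot_wg / w_sq) * wi for wi, gi in zip(w, g)]
    # Rescale so the update magnitude matches the raw gradient's.
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    o_norm = math.sqrt(sum(gi * gi for gi in g_orth)) + eps
    return [gi * (g_norm / o_norm) for gi in g_orth]
```

In a real training loop this projection would be applied per parameter tensor inside the optimizer step, wrapping a base optimizer such as SGD or AdamW.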

Updated 23.02.2025: same dataset, 512-token sequences with a 64-token sliding window (loss still decreased). Significant HellaSwag drop (~22%).
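The updated data preparation can be sketched as below. This assumes "64-token sliding window" means a stride of 64 tokens between consecutive overlapping 512-token windows; that reading, and the function name, are assumptions for illustration only.

```python
def make_windows(token_ids, seq_len=512, stride=64):
    """Slice a token stream into overlapping fixed-length training
    sequences: each window holds seq_len tokens and starts `stride`
    tokens after the previous one."""
    last_start = max(len(token_ids) - seq_len, 0)
    return [token_ids[i:i + seq_len] for i in range(0, last_start + 1, stride)]
```

With a stride this much smaller than the sequence length, each token is seen in up to eight windows per epoch, which is consistent with the training loss continuing to decrease.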

Format: Safetensors
Model size: 14.8B params
Tensor type: BF16

Model tree for pk11/Qwen2.5-14B-Instruct-1M-HF-GK

Base model: Qwen/Qwen2.5-14B (this model is a finetune)
Quantizations: 1 model