Fine-tuned/hyperfitted following the methodology of https://arxiv.org/abs/2412.04318, using the OrthoGrad optimizer (https://arxiv.org/abs/2501.04697).
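For reference, a minimal PyTorch sketch of the OrthoGrad idea (an illustrative re-implementation, not the paper's reference code): each parameter's gradient is replaced by its component orthogonal to the current weight vector, rescaled back to the original gradient norm, before a standard optimizer step. The wrapper class name and the AdamW base optimizer are assumptions made here for the example.

```python
import torch
from torch.optim import AdamW


class OrthoGradWrapper:
    """Sketch of the OrthoGrad projection from arXiv:2501.04697.

    Before each base-optimizer step, every gradient is projected onto the
    subspace orthogonal to its weight vector and rescaled to keep its norm.
    """

    def __init__(self, params, base_optimizer_cls=AdamW, **base_kwargs):
        self.params = [p for p in params]
        self.base = base_optimizer_cls(self.params, **base_kwargs)

    @torch.no_grad()
    def step(self):
        for p in self.params:
            if p.grad is None:
                continue
            w, g = p.detach().view(-1), p.grad.view(-1)
            # Remove the gradient component parallel to the weight vector.
            g_orth = g - ((w @ g) / (w @ w + 1e-30)) * w
            # Rescale so the update magnitude matches the raw gradient.
            g_orth *= g.norm() / (g_orth.norm() + 1e-30)
            p.grad.copy_(g_orth.view_as(p.grad))
        self.base.step()

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)
```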
Updated 23.02.2025: retrained on the same dataset using 512-token sequences with a 64-token sliding window (training loss still decreased). Significant HellaSwag drop (~22%).
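A rough sketch of how such sequences could be produced, assuming "64-token sliding window" means 512-token windows whose start positions advance by 64 tokens (heavily overlapping chunks); the actual preprocessing may differ.

```python
from typing import List


def sliding_windows(token_ids: List[int], seq_len: int = 512, stride: int = 64) -> List[List[int]]:
    """Chunk a token stream into fixed-length, overlapping training sequences."""
    windows = []
    # Each window starts `stride` tokens after the previous one.
    for start in range(0, max(len(token_ids) - seq_len, 0) + 1, stride):
        windows.append(token_ids[start:start + seq_len])
    return windows


# Example: a 1,000-token document yields windows starting at 0, 64, 128, ..., 448.
print(len(sliding_windows(list(range(1000)))))  # -> 8
```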