I switched the source to Q8_0 instead of f16/bf16. Even though Llama3 models are dense, there is still no notable difference between Q8_0 and fp16: perplexity is equivalent, or even slightly lower (by about 0.001-0.002 ppl) than at 16 bpw, presumably due to some favorable rounding.
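As a rough sketch of that workflow, assuming the standard llama.cpp tools (`convert_hf_to_gguf.py` and `llama-quantize`); the paths, model directory, and the Q5_K_M target type are placeholders, not confirmed details of this repo:

```python
import subprocess

# Hypothetical paths; adjust to your llama.cpp checkout and local HF model directory.
HF_MODEL_DIR = "path/to/Llama-3-70B-hf"
SOURCE_GGUF = "llama3-70b-Q8_0.gguf"
TARGET_GGUF = "llama3-70b-Q5_K_M.gguf"

# 1. Convert the HF checkpoint to a Q8_0 GGUF instead of f16/bf16.
#    Recent llama.cpp versions accept --outtype q8_0 here.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outtype", "q8_0", "--outfile", SOURCE_GGUF],
    check=True,
)

# 2. Requantize from the Q8_0 source down to a 5-bit target
#    (Q5_K_M is used here only as an example quant type).
subprocess.run(
    ["./llama-quantize", SOURCE_GGUF, TARGET_GGUF, "Q5_K_M"],
    check=True,
)
```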

Format: GGUF
Model size: 70.6B params
Architecture: llama
Quantization: 5-bit
