I switched the source to Q8_0 instead of f16/bf16. Even though Llama3 models are dense, there is still no notable difference between Q8_0 and fp16: perplexity is equivalent, or even slightly lower (by about 0.001-0.002 ppl) than at 16 bpw, presumably due to some favorable rounding.
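As a rough sketch of that workflow, assuming the standard llama.cpp tools (`convert_hf_to_gguf.py` and `llama-quantize`); the paths, model directory, and the Q5_K_M target type are placeholders, not confirmed details of this repo:

```python
import subprocess

# Hypothetical paths; adjust to your llama.cpp checkout and local HF model directory.
HF_MODEL_DIR = "path/to/Llama-3-70B-hf"
SOURCE_GGUF = "llama3-70b-Q8_0.gguf"
TARGET_GGUF = "llama3-70b-Q5_K_M.gguf"

# 1. Convert the HF checkpoint to a Q8_0 GGUF instead of f16/bf16.
#    Recent llama.cpp versions accept --outtype q8_0 here.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outtype", "q8_0", "--outfile", SOURCE_GGUF],
    check=True,
)

# 2. Requantize from the Q8_0 source down to a 5-bit target
#    (Q5_K_M is used here only as an example quant type).
subprocess.run(
    ["./llama-quantize", SOURCE_GGUF, TARGET_GGUF, "Q5_K_M"],
    check=True,
)
```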

Format: GGUF
Model size: 70.6B params
Architecture: llama
Quantization: 5-bit
