
This model is compatible with tensor parallelism. The randomized Hadamard transform (RHT) runs per GPU instead of across GPUs. The q, k, v, up, and gate projections are split along the output channel, and the o and down projections are split along the input channel. This model has slightly worse quality than the corresponding non-"TP8" model.
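The sharding described above can be illustrated with a small NumPy sketch. It assumes that the RHT is a random sign flip followed by a normalized Hadamard matrix, that "per GPU" means each shard gets its own independent transform sized to that shard, and that shard sizes are powers of two. For brevity it applies only an output-side transform to the column-parallel layers (q, k, v, up, gate) and only an input-side transform to the row-parallel layers (o, down). The names `rht`, `column_parallel`, and `row_parallel` are hypothetical and are not taken from the QTIP codebase. The sketch only checks that per-shard transforms are exact in full precision; it does not model quantization, so it says nothing about the quality difference.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Normalized Sylvester Hadamard matrix of size n (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def rht(n: int, rng: np.random.Generator) -> np.ndarray:
    """Assumed RHT as an orthogonal matrix: H @ diag(random signs)."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return hadamard(n) * signs  # scales column j by signs[j]


def column_parallel(w: np.ndarray, x: np.ndarray, tp: int,
                    rng: np.random.Generator) -> np.ndarray:
    """Split W (out, in) along output channels; each shard has its own output-side RHT."""
    out_shards = []
    for w_i in np.split(w, tp, axis=0):
        u_i = rht(w_i.shape[0], rng)       # transform local to this "GPU"
        w_i_t = u_i @ w_i                  # transformed shard (what would be quantized)
        out_shards.append(u_i.T @ (w_i_t @ x))  # undo the RHT locally
    return np.concatenate(out_shards)      # gather output shards


def row_parallel(w: np.ndarray, x: np.ndarray, tp: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Split W (out, in) along input channels; each shard has its own input-side RHT."""
    partials = []
    for w_i, x_i in zip(np.split(w, tp, axis=1), np.split(x, tp)):
        v_i = rht(w_i.shape[1], rng)       # transform local to this "GPU"
        w_i_t = w_i @ v_i.T                # transformed shard (what would be quantized)
        partials.append(w_i_t @ (v_i @ x_i))  # transform only the local input slice
    return sum(partials)                   # stands in for the all-reduce


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64))      # (out_features, in_features)
    x = rng.standard_normal(64)
    print(np.allclose(column_parallel(w, x, tp=8, rng=rng), w @ x))  # True
    print(np.allclose(row_parallel(w, x, tp=8, rng=rng), w @ x))     # True
```

Because each transform only touches one shard, no GPU needs data from another GPU to apply or undo it; the only communication left is the usual gather or all-reduce of tensor parallelism.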


Safetensors

Model size: 54.5B params

Tensor types: BF16, F32, FP16, I16


Collection including relaxml/Llama-3.1-405B-Instruct-QTIP-2Bit-TP8

QTIP Quantized Models (27 items). See https://github.com/Cornell-RelaxML/qtip