Are the weights fp16?
#6 by lucasjin - opened
Why are they so big?
Yes. fp16 and bf16 are considered sufficient training precision for LLMs, so the weights are stored in fp16, which means 2 bytes per parameter. That's where the file size comes from.
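A rough back-of-the-envelope sketch of the size calculation (the 7B parameter count below is just an illustrative assumption, not this model's actual size):

```python
# Checkpoint size ≈ number of parameters × bytes per parameter.
# fp16/bf16 store each weight in 2 bytes.
n_params = 7e9  # example parameter count (assumption, not this model's real size)
print(f"fp16 checkpoint: ~{n_params * 2 / 1e9:.0f} GB")  # ~14 GB for a 7B model
```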
But most of the time you'd rather quantize them to 4-bit, which cuts the size to roughly a quarter, so inference is faster and uses less RAM (see the sketch below).
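A minimal sketch of loading in 4-bit with the transformers + bitsandbytes stack, assuming that stack is installed; the model id is a placeholder, not a reference to this specific repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"  # placeholder model id (assumption)

# Quantize the weights to 4-bit on load; compute still runs in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs/CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```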