During inference, should I set `torch_dtype` to bf16 (the dtype used during finetuning) or to fp16 (the dtype listed in `config.json`)?
Lianmin Zheng from FastChat says "fp16 is okay".
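For reference, either dtype can be passed explicitly when loading the model with `transformers`. A minimal sketch — the model id here is a placeholder, substitute your own finetuned checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id -- replace with your finetuned checkpoint.
model_id = "your-org/your-finetuned-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in bf16, matching the finetuning dtype ...
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# ... or in fp16, matching the dtype recorded in config.json:
# model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```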