There seems to be an explanation of how to fine-tune the model in 4-bit. Would it be possible to provide more info on 4-bit inference? Thanks!