Model Performance

by Sk2613

Hello @dakshvar22, I converted this model to .GGUF using 5-bit quantization and ran it locally with Ollama. I then trained my flows with `rasa train` and inspected them with `rasa inspect`. The model's performance is not as good as expected, whereas rasa/cmd_gen_codellama_13b_calm_demo works smoothly. What should be done so that this model runs well locally on an 8GB GPU?
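For reference, the setup looks roughly like this. This is only a sketch assuming llama.cpp's tooling; the file paths, the Q5_K_M quant type, and the `cmd-gen-q5` tag are illustrative, not details from the post:

```bash
# Convert the Hugging Face checkpoint to GGUF, then quantize to 5-bit.
# Paths, the Q5_K_M type, and the model tag are illustrative assumptions.
python convert_hf_to_gguf.py ./cmd_gen_model --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q5_K_M.gguf Q5_K_M

# Register the quantized file with Ollama via a Modelfile.
cat > Modelfile <<'EOF'
FROM ./model-Q5_K_M.gguf
EOF
ollama create cmd-gen-q5 -f Modelfile

# Train and inspect the assistant against the local model.
rasa train
rasa inspect
```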

Hi @Sk2613, I haven't tried it myself, but post-training quantization can degrade a model's performance.
Did you also try this model on your flows without any quantization?
Also, have you explored fine-tuning this model further on your assistant data using the fine-tuning recipe?
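To isolate whether quantization is the cause, one option is to build a higher-precision GGUF (e.g. Q8_0) from the same f16 file and compare both variants on the same flows. A sketch, again with illustrative names:

```bash
# Q8_0 retains far more precision than 5-bit; if it fits in 8GB VRAM,
# it makes a useful control for the comparison. Tags are illustrative.
./llama-quantize model-f16.gguf model-Q8_0.gguf Q8_0
cat > Modelfile.q8 <<'EOF'
FROM ./model-Q8_0.gguf
EOF
ollama create cmd-gen-q8 -f Modelfile.q8

# Point the assistant at cmd-gen-q8, re-run rasa train / rasa inspect,
# and compare command generation quality against the Q5 variant.
```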
