Model Performance

by Sk2613

Hello @dakshvar22, I converted this model to .GGUF using 5-bit quantization and ran it locally with Ollama. I then trained my flows with `rasa train` and inspected them with `rasa inspect`. The model's performance is not as good as expected, whereas rasa/cmd_gen_codellama_13b_calm_demo works smoothly. What should be done so that this model runs well locally on an 8GB GPU?
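For reference, the setup looks roughly like this. This is only a sketch assuming llama.cpp's tooling; the file paths, the Q5_K_M quant type, and the `cmd-gen-q5` tag are illustrative, not details from the post:

```bash
# Convert the Hugging Face checkpoint to GGUF, then quantize to 5-bit.
# Paths, the Q5_K_M type, and the model tag are illustrative assumptions.
python convert_hf_to_gguf.py ./cmd_gen_model --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q5_K_M.gguf Q5_K_M

# Register the quantized file with Ollama via a Modelfile.
cat > Modelfile <<'EOF'
FROM ./model-Q5_K_M.gguf
EOF
ollama create cmd-gen-q5 -f Modelfile

# Train and inspect the assistant against the local model.
rasa train
rasa inspect
```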

Hi @Sk2613, I haven't tried it myself, but post-training quantization can degrade a model's performance.
Did you also try this model on your flows without any quantization?
Also, have you explored fine-tuning this model further on your assistant data using the fine-tuning recipe?
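To isolate whether quantization is the cause, one option is to build a higher-precision GGUF (e.g. Q8_0) from the same f16 file and compare both variants on the same flows. A sketch, again with illustrative names:

```bash
# Q8_0 retains far more precision than 5-bit; if it fits in 8GB VRAM,
# it makes a useful control for the comparison. Tags are illustrative.
./llama-quantize model-f16.gguf model-Q8_0.gguf Q8_0
cat > Modelfile.q8 <<'EOF'
FROM ./model-Q8_0.gguf
EOF
ollama create cmd-gen-q8 -f Modelfile.q8

# Point the assistant at cmd-gen-q8, re-run rasa train / rasa inspect,
# and compare command generation quality against the Q5 variant.
```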
