Any chance/timeline for a q8 version?
#3 by skyrien - opened
This model is awesome, even the q5 version! With an RTX 4090, though, I can't quite run the fp16 version properly, and offloading any layers at all seems to break it entirely.
Any chance for a q8 version?
Be about 8 hours.
PsiPi changed discussion status to closed
What sort of tokens/second do you get on the RTX 4090? How large are the images you're passing in?