If we're gonna get fancy we should have Q8_0_L as well
#1 opened by lemon07r
The new larger quant that bartowski has been making. It would be nice to have it here as well. His description of it:
Experimental, uses f16 for embed and output weights. Please provide any feedback on differences. Extremely high quality, generally unneeded, but the max available quant.
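For context on what the "_L" variant changes: in GGML's Q8_0 scheme, weights are stored in blocks of 32 int8 codes with one per-block absmax scale, while the "_L" variants keep the token-embedding and output tensors at f16 instead of quantizing them. A minimal sketch of the Q8_0 round-trip (an illustration of the block format, not llama.cpp's actual implementation):

```python
import numpy as np

def q8_0_quantize(x, block=32):
    # Blocks of 32 values, each with an absmax scale mapping to int8 [-127, 127],
    # mirroring the Q8_0 block layout.
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def q8_0_dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = q8_0_quantize(w)
w_hat = q8_0_dequantize(q, s).reshape(-1)
err = np.abs(w - w_hat).max()
```

The per-block rounding error is at most half a scale step, which is why Q8_0 is near-lossless for typical weight distributions and why the f16 embed/output tweak buys relatively little on top of it.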
lemon07r changed discussion status to closed
lemon07r changed discussion status to open
Sometimes these additions help, sometimes they do not. It is a per-model / per-quant issue as well. I did a lot of testing with these, across different models and quants.
I do have a "Q8" project coming up; it is just that this quant is a little more difficult to "raise up", with the exception of the embed/output weights.
It's so interesting that the model significantly improves with the Command R template! I can also attest to that.
Yes, it was a "lab" error; funny how those work out sometimes.