If we're gonna get fancy we should have Q8_0_L as well

#1
by lemon07r - opened

This is the new larger quant that bartowski is doing. It would be nice to have it here as well.

Experimental, uses f16 for embed and output weights. Please provide any feedback on differences. Extremely high quality, generally unneeded but max available quant.
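For anyone who wants to try this locally, here is a minimal sketch of how a Q8_0 file with f16 embed/output weights could be produced with llama.cpp's quantize tool. The binary path, file names, and the "Q8_0_L" naming are placeholders, and the --token-embedding-type / --output-tensor-type flags should be checked against your own llama.cpp build:

```python
# Sketch only: produce a Q8_0 quant that keeps token-embedding and output
# tensors at f16 (a "Q8_0_L"-style file). Paths and file names are
# hypothetical; verify flag names with `llama-quantize --help`.
import subprocess

def make_q8_0_l(src_f16_gguf: str, dst_gguf: str,
                quantize_bin: str = "./llama-quantize") -> None:
    """Quantize to Q8_0 while leaving embed/output weights at f16."""
    subprocess.run(
        [
            quantize_bin,
            "--token-embedding-type", "f16",  # keep embedding weights at f16
            "--output-tensor-type", "f16",    # keep output (head) weights at f16
            src_f16_gguf,                     # full-precision source GGUF
            dst_gguf,                         # destination GGUF
            "Q8_0",                           # base quant type for remaining tensors
        ],
        check=True,
    )

if __name__ == "__main__":
    make_q8_0_l("model-f16.gguf", "model-Q8_0_L.gguf")
```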

lemon07r changed discussion status to closed
lemon07r changed discussion status to open

Sometimes these additions help, sometimes they do not.
It is also a per-model / per-quant issue.
I did a lot of testing with these across different models and quants.
I do have a "Q8" project coming up; it is just that this quant is a little more difficult to "raise up", with the exception of the embed/output weights.

It's so interesting that the model significantly improves with the Command R template! I can also attest to that.

Owner

Yes, it was a "lab" error; funny how those work out sometimes.
