turboderp's picture
Update README.md
06cda4b

EXL2 quants of Mistral-7B-instruct

Converted from Mistral-7B-Instruct-v0.1. This is a straight conversion, but I have modified the config.json to make the default context size 7168 tokens, since in initial testing the model becomes unstable a while after that. It's possible that sliding window attention will allow the model to use its advertised 32k-token context, but this hasn't been tested yet.

2.50 bits per weight
2.70 bits per weight
3.00 bits per weight
3.50 bits per weight
4.00 bits per weight
4.65 bits per weight
5.00 bits per weight
6.00 bits per weight

measurement.json