Quantize Llama 3.2 1B into ONNX INT8
How can I quantize Llama 3.2 1B into ONNX INT8, like this model? Can someone tell me how to do the ONNX quantization?
Hi, @lakpriya. @Xenova already converted Llama 3.2 1B into ONNX INT8, like this model:
https://huggingface.co./onnx-community/Llama-3.2-1B-Instruct
In case you want to convert it yourself, you can follow these instructions: https://github.com/xenova/transformers.js?tab=readme-ov-file#convert-your-models-to-onnx
@Felladrin Thanks so much. I tried to use it with my app, which uses onnx-runtime, but it gives me a regex error from tokenizers.js. It works fine for Llama 160M, though, which is why I thought of converting it myself. Do you know why I get that error from tokenizers.js?
Unfortunately, I'm out of clues about this error.
And now that you ask, I noticed there is an open issue about converting (another) Llama 3.2 1B model: https://github.com/xenova/transformers.js/issues/967 (so I'm not sure the conversion will work at the moment).
Thank you, @Felladrin. I will check.
@Xenova I checked, and it didn't work. I checked createPattern, and it's the same in both libs: let regex = pattern.Regex.replace(/\\([#&~])/g, '$1');
I tried replacing the existing @xenova/transformers tokenizers.js with the @huggingface/transformers tokenizers.js, but I'm getting the same issue as before.
function createPattern(pattern, invert = true) {
    if (pattern.Regex !== undefined) {
        // In certain cases, the pattern may contain unnecessary escape sequences (e.g., \# or \& or \~).
        // i.e., valid in Python (where the patterns are exported from) but invalid in JavaScript (where the patterns are parsed).
        // This isn't an issue when creating the regex w/o the 'u' flag, but it is when the 'u' flag is used.
        // For this reason, it is necessary to remove these backslashes before creating the regex.
        // See https://stackoverflow.com/a/63007777/13989043 for more information
        let regex = pattern.Regex.replace(/\\([#&~])/g, '$1'); // TODO: add more characters to this list if necessary

        // We also handle special cases where the regex contains invalid (non-JS compatible) syntax.
        for (const [key, value] of PROBLEMATIC_REGEX_MAP) {
            regex = regex.replaceAll(key, value);
        }

        return new RegExp(regex, 'gu');
    } else if (pattern.String !== undefined) {
        const escaped = escapeRegExp(pattern.String);
        // NOTE: if invert is true, we wrap the pattern in a group so that it is kept when performing .split()
        return new RegExp(invert ? escaped : `(${escaped})`, 'gu');
    } else {
        console.warn('Unknown pattern type:', pattern);
        return null;
    }
}
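For anyone following along, here is a minimal sketch of the 'u' flag behavior that the comments in createPattern describe. The `exported` pattern is a made-up example, not one taken from the Llama 3.2 tokenizer, so it only illustrates the general failure mode and may not be the cause of the error in this thread.

```javascript
// Hypothetical pattern, as it might appear in an exported tokenizer config.
// The regex source is: \#[a-z]+\&
const exported = '\\#[a-z]+\\&';

// Without the 'u' flag, JavaScript tolerates the unnecessary escapes.
const lenient = new RegExp(exported, 'g');

// With the 'u' flag, the same source is rejected as an invalid escape.
let strictError = null;
try {
    new RegExp(exported, 'gu');
} catch (e) {
    strictError = e;
}

// Stripping the backslashes (the same replace used in createPattern) makes it valid.
const cleaned = exported.replace(/\\([#&~])/g, '$1');
const strict = new RegExp(cleaned, 'gu');

console.log(strictError instanceof SyntaxError); // true
console.log(cleaned);                            // #[a-z]+&
console.log('#tag&'.match(strict));              // [ '#tag&' ]
```

If a tokenizer pattern still fails after this step, compiling it by hand with `new RegExp(source, 'gu')` in the same runtime as the app is one way to see which part of the pattern that engine rejects.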
@Xenova I fixed the issue with tokenizers.js. Now Llama 3.2 1B works fine on my iPhone. Can I create a PR for that?