Quantize Llama 3.2 1B into ONNX INT8

#1 opened by lakpriya

How can I quantize Llama 3.2 1B to ONNX INT8, like this model? Can someone explain how to do ONNX quantization?

lakpriya changed discussion title from Quantize llama 3.1 1B into onnx INT8? to Quantize llama 3.2 1B into onnx INT8?
lakpriya changed discussion title from Quantize llama 3.2 1B into onnx INT8? to Quantize llama 3.2 1B into onnx INT8

Hi, @lakpriya.

@Xenova has already converted Llama 3.2 1B to ONNX INT8, like this model:
https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct

In case you want to convert it yourself, you can follow these instructions: https://github.com/xenova/transformers.js?tab=readme-ov-file#convert-your-models-to-onnx

@Felladrin Thanks so much. I tried to use it in my app, which uses ONNX Runtime, but it throws a regex error from tokenizers.js. It works fine with Llama 160M, though, which is why I thought I'd convert it myself. Do you know why I get that error from tokenizers.js?

(Screenshot of the error: Screenshot 2024-10-10 at 01.50.52.png)

Unfortunately, I don't know what's causing this error.
Now that you mention it, I noticed there is an open issue about converting (another) Llama 3.2 1B model: https://github.com/xenova/transformers.js/issues/967, so I'm not sure the conversion works at the moment.

Thank you, @Felladrin. I will check.

@lakpriya, can you try Transformers.js v3? We've made some changes so that it can consume Python regular expressions. You can install it with:

```
npm i @huggingface/transformers
```
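One example of a Python-only construct is a scoped inline-flag group such as `(?i:'s|'t|'re|'ve|'m|'ll|'d)`, the contraction group used by tiktoken-style pre-tokenizer patterns (assumed here to be the kind of pattern Llama 3.2 ships). JavaScript engines that predate ES2025 regex modifiers reject it. The sketch below is not the Transformers.js implementation; the map name `PYTHON_ONLY_REWRITES` and its single entry are hypothetical, and it simply spells out the case-insensitivity by hand before compiling with the `u` flag.

```javascript
// Illustrative sketch (not Transformers.js code): rewrite a Python-only
// scoped case-insensitive group into syntax that any ES2018+ engine accepts.
const PYTHON_ONLY_REWRITES = new Map([
  // Hypothetical entry: (?i:...) is Python syntax, so spell out each case instead.
  [
    "(?i:'s|'t|'re|'ve|'m|'ll|'d)",
    "(?:'[sS]|'[tT]|'[rR][eE]|'[vV][eE]|'[mM]|'[lL][lL]|'[dD])",
  ],
]);

// Apply every rewrite, then compile with the global + unicode flags.
function toJsRegex(pythonPattern) {
  let source = pythonPattern;
  for (const [from, to] of PYTHON_ONLY_REWRITES) {
    source = source.replaceAll(from, to);
  }
  return new RegExp(source, "gu");
}

// A simplified contraction-splitting pattern written in the Python style.
const pattern = "(?i:'s|'t|'re|'ve|'m|'ll|'d)|\\p{L}+|\\s+";
const regex = toJsRegex(pattern);
console.log("I'M here".match(regex)); // [ 'I', "'M", ' ', 'here' ]
```

A real tokenizer only needs rewrites for constructs the target engine actually rejects; engines that implement ES2025 regex modifiers may accept `(?i:...)` directly.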

@Xenova I checked, and it didn't work. I looked at `createPattern`, and it's the same in both libraries: `let regex = pattern.Regex.replace(/\\([#&~])/g, '$1');`
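The replacement quoted above strips backslashes that Python tolerates but that the `u` flag forbids. A minimal, self-contained check of that behavior might look like this (the helper names `stripUnnecessaryEscapes` and `compiles` are hypothetical):

```javascript
// With the 'u' flag, JavaScript rejects identity escapes such as \# or \~
// that Python accepts, so strip them first (same replacement as createPattern).
function stripUnnecessaryEscapes(pattern) {
  return pattern.replace(/\\([#&~])/g, "$1");
}

// Returns true if the source compiles as a RegExp with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    return false;
  }
}

const pythonStyle = "\\#\\d+"; // the regex source \#\d+ as exported from Python
console.log(compiles(pythonStyle, "u")); // false: \# is an invalid escape under 'u'
console.log(compiles(pythonStyle, "")); // true: legacy (non-unicode) mode tolerates \#
console.log(compiles(stripUnnecessaryEscapes(pythonStyle), "u")); // true
```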

I tried replacing the existing `tokenizers.js` from `@xenova/transformers` with the one from `@huggingface/transformers`, but I'm getting the same issue as before.

```javascript
// NOTE: PROBLEMATIC_REGEX_MAP and escapeRegExp are defined elsewhere in Transformers.js (not shown here).
function createPattern(pattern, invert = true) {
    if (pattern.Regex !== undefined) {
        // In certain cases, the pattern may contain unnecessary escape sequences (e.g., \# or \& or \~).
        // i.e., valid in Python (where the patterns are exported from) but invalid in JavaScript (where the patterns are parsed).
        // This isn't an issue when creating the regex w/o the 'u' flag, but it is when the 'u' flag is used.
        // For this reason, it is necessary to remove these backslashes before creating the regex.
        // See https://stackoverflow.com/a/63007777/13989043 for more information
        let regex = pattern.Regex.replace(/\\([#&~])/g, '$1'); // TODO: add more characters to this list if necessary

        // We also handle special cases where the regex contains invalid (non-JS compatible) syntax.
        for (const [key, value] of PROBLEMATIC_REGEX_MAP) {
            regex = regex.replaceAll(key, value);
        }

        return new RegExp(regex, 'gu');

    } else if (pattern.String !== undefined) {
        const escaped = escapeRegExp(pattern.String);
        // NOTE: if invert is false, we wrap the pattern in a capturing group so that matches are kept when performing .split()
        return new RegExp(invert ? escaped : `(${escaped})`, 'gu');

    } else {
        console.warn('Unknown pattern type:', pattern);
        return null;
    }
}
```
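For the `String` branch, the capturing group matters because `String.prototype.split` includes captured text in its result. The sketch below mirrors that branch with a local `escapeRegExp` (an assumption: the helper `createPattern` calls lives elsewhere in the library and may differ) and the hypothetical name `stringPattern`, to show both settings of `invert`.

```javascript
// Hypothetical local helper: escape characters that are special in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Mirrors the String branch of createPattern: wrap in a group only when invert is false.
function stringPattern(literal, invert = true) {
  const escaped = escapeRegExp(literal);
  return new RegExp(invert ? escaped : `(${escaped})`, "gu");
}

const text = "a.b.c";
console.log(text.split(stringPattern("."))); // [ 'a', 'b', 'c' ]
console.log(text.split(stringPattern(".", false))); // [ 'a', '.', 'b', '.', 'c' ]
```

Escaping first ensures a literal such as `.` matches only a dot rather than any character.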

@Xenova I fixed the issue in tokenizers.js, and Llama 3.2 1B now works fine on my iPhone. Can I open a PR for that?

@lakpriya Yes please! :)

Felladrin changed discussion status to closed
