So it turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp.

It starts out true: imatrix runs the model against a corpus of text and tracks the activations to determine which weights are most important.
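To make that first step concrete, here's a minimal sketch. It assumes the statistic collected is a mean of squared activations per input channel, which is my understanding of what imatrix gathers. The function name and the list-of-lists input are hypothetical and are not llama.cpp's actual API.

```python
def accumulate_importance(activations):
    """Return one importance value per input channel.

    `activations` is a list of token rows, each a list of floats
    (one value per input channel). ASSUMPTION: importance is the mean
    squared activation seen on that channel over the whole corpus.
    """
    n_channels = len(activations[0])
    sums = [0.0] * n_channels
    for row in activations:
        for j, x in enumerate(row):
            sums[j] += x * x  # squaring makes large +/- activations count equally
    return [s / len(activations) for s in sums]


# Two "tokens", two input channels: channel 0 sees larger activations,
# so the weights that multiply it are treated as more important.
importance = accumulate_importance([[1.0, 0.0], [3.0, 2.0]])
```

Weights that sit in a column with a large value here are the ones whose rounding error matters most to the layer's output.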

However, what the quantization then does with that information is where I was wrong.

I think I accidentally conflated imatrix with ExLlamaV2's measurement pass, where ExLlamaV2 decides how many bits to assign to which weights depending on the target BPW (bits per weight).

Instead, what llama.cpp does with imatrix is try to select a scale for each quantization block that most accurately returns the important weights to their original values, i.e. it minimizes the dequantization error weighted by the importance derived from the activations.
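Written out, the idea looks roughly like the objective below. The symbols are my own shorthand, not something taken from the llama.cpp source: $w_i$ are the original weights in a block, $I_i$ their importance values, $s$ the block scale, and $q_i(s)$ the integer code each weight rounds to at that scale.

```latex
s^{*} = \arg\min_{s} \sum_{i \in \text{block}} I_i \,\bigl(w_i - s\, q_i(s)\bigr)^{2},
\qquad
q_i(s) = \operatorname{clamp}\!\bigl(\operatorname{round}(w_i / s),\; q_{\min},\; q_{\max}\bigr)
```

Without imatrix you can think of every $I_i$ as being chosen without any knowledge of the activations, so the calibration data only changes which errors the scale is allowed to trade away.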

The mildly surprising part is that it actually just does a relatively brute-force search: it picks a bunch of candidate scales, tries each one, and keeps whichever results in the minimum error for the weights deemed important in the block.
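The search described above can be sketched in a few lines. This is a toy version under stated assumptions, not llama.cpp's actual code: the function names, the signed 4-bit range, and the 19-point candidate grid around the naive scale are all hypothetical choices made for illustration.

```python
def quantize_block(weights, scale, qmax=7):
    """Round each weight to an integer code in [-qmax-1, qmax] at this scale."""
    return [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]


def weighted_error(weights, importance, scale, codes):
    """Importance-weighted squared dequantization error for one block."""
    return sum(i * (w - scale * q) ** 2
               for w, i, q in zip(weights, importance, codes))


def search_scale(weights, importance, qmax=7):
    """Brute-force search: try a grid of scales, keep the lowest weighted error."""
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 1.0, [0] * len(weights), 0.0  # all-zero block, any scale works
    best = None
    # ASSUMPTION: a small hypothetical grid around the naive scale amax / qmax.
    for step in range(-9, 10):
        scale = amax / (qmax + 0.1 * step)
        codes = quantize_block(weights, scale, qmax)
        err = weighted_error(weights, importance, scale, codes)
        if best is None or err < best[2]:
            best = (scale, codes, err)
    return best


# The third weight is marked as far more important than the rest,
# so the chosen scale favors reconstructing it accurately.
block = [0.9, -0.31, 0.12, 0.05]
imp = [1.0, 1.0, 100.0, 1.0]
scale, codes, err = search_scale(block, imp)
```

Note that the output is still just one scale plus one integer code per weight, exactly what the format stores either way. Only the choice of scale differs.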

But yeah, it turns out the quantization scheme is always the same; it's just that the scale selection has a bit more logic to it when you use imatrix.

Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up

reacted to as-cle-bert's post with 🚀 6 months ago


Hi HF Community!🤗

In the past few days, OpenAI announced their search engine, SearchGPT. Today, I'm glad to introduce you to SearchPhi, an AI-powered, open-source web search tool that aims to reproduce features similar to SearchGPT's, built upon microsoft/Phi-3-mini-4k-instruct, llama.cpp🦙 and Streamlit.
Although not as capable as SearchGPT, SearchPhi v0.0-beta.0 is a first step toward a fully functional and multimodal search engine :)
If you want to know more, head over to the GitHub repository (https://github.com/AstraBert/SearchPhi) and, to test it out, use this HF space: as-cle-bert/SearchPhi
Have fun!🐱
