Xuan Son NGUYEN (ngxson)
74 followers · 29 following
https://blog.ngxson.com
ngxson.hf.co
AI & ML interests
Doing AI for fun, not for profit
Recent Activity
Reacted to bartowski's post with 👀 (about 6 hours ago):
Looks like Q4_0_N_M file types are going away. Before you panic: there's a new "preferred" method, which is online (I prefer the term on-the-fly) repacking. If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable).

You can see the reference PR here: https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speeds (though it may currently be bugged on some platforms). As such, I'll stop making those newer model formats soon, probably by the end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

Also, IQ4_NL supports repacking, though not in as many shapes yet, but it should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon, since those use the GPU and don't benefit from the repacking of weights.
Reacted to bartowski's post with 👍 (about 6 hours ago): same post as above.
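The "repacking into interleaved rows" the post mentions can be sketched conceptually. A minimal Python illustration, not llama.cpp's actual code: `repack_interleaved`, the block labels, and the group size of 4 are illustrative assumptions standing in for Q4_0_4_4-style layouts, where quantized blocks from 4 consecutive rows are stored adjacently so a SIMD kernel can load them in one pass.

```python
# Conceptual sketch of interleaved-row repacking (assumption: simplified,
# not the real llama.cpp layout). Each row is a list of quantized blocks;
# repacking stores block i of rows r..r+group-1 next to each other.

def repack_interleaved(rows, group=4):
    """Interleave blocks from `group` consecutive rows into a flat list."""
    assert len(rows) % group == 0, "row count must be a multiple of group"
    out = []
    for base in range(0, len(rows), group):
        for i in range(len(rows[0])):            # block index within a row
            for r in range(base, base + group):  # take block i from each row
                out.append(rows[r][i])
    return out

# 4 rows of 3 blocks each; after repacking, the first 4 entries are
# block 0 of rows 0..3, laid out contiguously.
rows = [[f"r{r}b{i}" for i in range(3)] for r in range(4)]
packed = repack_interleaved(rows)
print(packed[:4])  # ['r0b0', 'r1b0', 'r2b0', 'r3b0']
```

The point of the layout is memory locality: a dot-product kernel that processes 4 rows at once reads one contiguous run instead of striding across 4 distant rows, which is what gives the ARM speedup the post describes.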
Updated a dataset (about 14 hours ago): ngxson/MiniThinky-dataset-v3
Articles
- Introducing GGUF-my-LoRA (Nov 1, 2024 • 13)
- Code a simple RAG from scratch (Oct 29, 2024 • 16)
- Introduction to ggml (Aug 13, 2024 • 125)
ngxson's activity
New activity in 5CD-AI/Viet-Doc-VQA-verIII (about 14 hours ago)
- 🚩 Report: Not working (3) · #1 opened 1 day ago by khang119966
New activity in ngxson/MiniThinky-dataset (3 days ago)
- Librarian Bot: Add language metadata for dataset · #2 opened 4 days ago by librarian-bot
New activity in ngxson/MiniThinky-1B-Llama-3.2 (3 days ago)
- Update README.md (1) · #2 opened 3 days ago by Xenova
New activity in bartowski/QVQ-72B-Preview-GGUF (4 days ago)
- Add system message (1) · #7 opened 4 days ago by ngxson
- Ollama upload please. (15) · #2 opened 17 days ago by AlgorithmicKing
New activity in ngxson/MiniThinky-v2-1B-Llama-3.2 (4 days ago)
- Upload folder using huggingface_hub (1) · #1 opened 4 days ago by Xenova
New activity in ngxson/MiniThinky-1B-Llama-3.2 (5 days ago)
- Upload folder using huggingface_hub · #1 opened 5 days ago by Xenova
New activity in ggml-org/gguf-my-repo (10 days ago)
- Update app.py (1) · #144 opened 12 days ago by gghfez
New activity in ggml-org/gguf-my-repo (about 1 month ago)
- Accessing own private repos (2) · #141 opened about 1 month ago by themex1380
- [Errno 2] No such file or directory: './llama.cpp/llama-quantize' (11) · #140 opened about 1 month ago by AlirezaF138
New activity in ggml-org/gguf-my-repo (about 2 months ago)
- Error quantizing: b'/bin/sh: 1: ./llama.cpp/llama-quantize: not found\n' (6) · #136 opened about 2 months ago by win10
- Better isolation + various improvements (3) · #133 opened 2 months ago by ngxson
New activity in ggml-org/gguf-my-repo (2 months ago)
- update readme for card generation (4) · #128 opened 3 months ago by ariG23498
- Error converting to fp16: b'INFO:hf-to-gguf:Loading model: qwen2.5-3b (1) · #135 opened 2 months ago by nanowell
- Qwen2.5-3B: [Errno 2] No such file or directory: 'downloads/tmpg0g5sjvl' (1) · #134 opened 2 months ago by nanowell
- add docker compose for dev locally (1) · #130 opened 2 months ago by ngxson
- Add F16 and BF16 quantization (1) · #129 opened 2 months ago by andito
- Update app.py (2) · #132 opened 2 months ago by velyan
New activity in HuggingFaceTB/SmolLM2-1.7B-Instruct (2 months ago)
- Update README.md (1) · #6 opened 2 months ago by rasmus1610
New activity in mlabonne/Llama-3.1-70B-Instruct-lorablated (2 months ago)
- LoRA-only GGUF (4) · #4 opened 5 months ago by ngxson