Xuan Son NGUYEN (ngxson)
74 followers · 29 following
https://blog.ngxson.com
ngxson.hf.co
AI & ML interests
Doing AI for fun, not for profit
Recent Activity
Reacted to bartowski's post with 👀 (about 6 hours ago):
Looks like Q4_0_N_M file types are going away. Before you panic: there's a new "preferred" method, which is online (I prefer the term on-the-fly) repacking. If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable).

You can see the reference PR here: https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speeds (though it may currently be bugged on some platforms). As such, I'll stop making those newer model formats soon, probably by the end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

Also, IQ4_NL supports repacking, though not in as many shapes yet, but it should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon, since those use the GPU and don't benefit from the repacking of weights.
Reacted to bartowski's post with 👍 (about 6 hours ago): same post as above.
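The "repacking into interleaved rows" the post mentions can be sketched conceptually. A minimal Python illustration, not llama.cpp's actual code: `repack_interleaved`, the block labels, and the group size of 4 are illustrative assumptions standing in for Q4_0_4_4-style layouts, where quantized blocks from 4 consecutive rows are stored adjacently so a SIMD kernel can load them in one pass.

```python
# Conceptual sketch of interleaved-row repacking (assumption: simplified,
# not the real llama.cpp layout). Each row is a list of quantized blocks;
# repacking stores block i of rows r..r+group-1 next to each other.

def repack_interleaved(rows, group=4):
    """Interleave blocks from `group` consecutive rows into a flat list."""
    assert len(rows) % group == 0, "row count must be a multiple of group"
    out = []
    for base in range(0, len(rows), group):
        for i in range(len(rows[0])):            # block index within a row
            for r in range(base, base + group):  # take block i from each row
                out.append(rows[r][i])
    return out

# 4 rows of 3 blocks each; after repacking, the first 4 entries are
# block 0 of rows 0..3, laid out contiguously.
rows = [[f"r{r}b{i}" for i in range(3)] for r in range(4)]
packed = repack_interleaved(rows)
print(packed[:4])  # ['r0b0', 'r1b0', 'r2b0', 'r3b0']
```

The point of the layout is memory locality: a dot-product kernel that processes 4 rows at once reads one contiguous run instead of striding across 4 distant rows, which is what gives the ARM speedup the post describes.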
Updated a dataset (about 14 hours ago): ngxson/MiniThinky-dataset-v3
Articles
- Introducing GGUF-my-LoRA (Nov 1, 2024 • 13)
- Code a simple RAG from scratch (Oct 29, 2024 • 16)
- Introduction to ggml (Aug 13, 2024 • 125)
ngxson's activity
New activity in 5CD-AI/Viet-Doc-VQA-verIII (about 14 hours ago)
- 🚩 Report: Not working (3) · #1 opened 1 day ago by khang119966
New activity in ngxson/MiniThinky-dataset (3 days ago)
- Librarian Bot: Add language metadata for dataset · #2 opened 4 days ago by librarian-bot
New activity in ngxson/MiniThinky-1B-Llama-3.2 (3 days ago)
- Update README.md (1) · #2 opened 3 days ago by Xenova
New activity in bartowski/QVQ-72B-Preview-GGUF (4 days ago)
- Add system message (1) · #7 opened 4 days ago by ngxson
- Ollama upload please. (15) · #2 opened 17 days ago by AlgorithmicKing
New activity in ngxson/MiniThinky-v2-1B-Llama-3.2 (4 days ago)
- Upload folder using huggingface_hub (1) · #1 opened 4 days ago by Xenova
New activity in ngxson/MiniThinky-1B-Llama-3.2 (5 days ago)
- Upload folder using huggingface_hub · #1 opened 5 days ago by Xenova
New activity in ggml-org/gguf-my-repo (10 days ago)
- Update app.py (1) · #144 opened 12 days ago by gghfez
New activity in ggml-org/gguf-my-repo (about 1 month ago)
- Accessing own private repos (2) · #141 opened about 1 month ago by themex1380
- [Errno 2] No such file or directory: './llama.cpp/llama-quantize' (11) · #140 opened about 1 month ago by AlirezaF138
New activity in ggml-org/gguf-my-repo (about 2 months ago)
- Error quantizing: b'/bin/sh: 1: ./llama.cpp/llama-quantize: not found\n' (6) · #136 opened about 2 months ago by win10
- Better isolation + various improvements (3) · #133 opened 2 months ago by ngxson
New activity in ggml-org/gguf-my-repo (2 months ago)
- update readme for card generation (4) · #128 opened 3 months ago by ariG23498
- Error converting to fp16: b'INFO:hf-to-gguf:Loading model: qwen2.5-3b (1) · #135 opened 2 months ago by nanowell
- Qwen2.5-3B: [Errno 2] No such file or directory: 'downloads/tmpg0g5sjvl' (1) · #134 opened 2 months ago by nanowell
- add docker compose for dev locally (1) · #130 opened 2 months ago by ngxson
- Add F16 and BF16 quantization (1) · #129 opened 2 months ago by andito
- Update app.py (2) · #132 opened 2 months ago by velyan
New activity in HuggingFaceTB/SmolLM2-1.7B-Instruct (2 months ago)
- Update README.md (1) · #6 opened 2 months ago by rasmus1610
New activity in mlabonne/Llama-3.1-70B-Instruct-lorablated (2 months ago)
- LoRA-only GGUF (4) · #4 opened 5 months ago by ngxson