Xuan Son NGUYEN
ngxson
74 followers · 29 following
https://blog.ngxson.com
ngxson.hf.co
AI & ML interests
Doing AI for fun, not for profit
Recent Activity
reacted to bartowski's post about 6 hours ago
Looks like Q4_0_N_M file types are going away.

Before you panic, there's a new "preferred" method, which is online (I prefer the term on-the-fly) repacking: if you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable).

You can see the reference PR here: https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speeds (though it may currently be bugged on some platforms).

As such, I'll stop making those newer model formats soon, probably at the end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

Also, IQ4_NL supports repacking, though not in as many shapes yet, but it should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon, since those use the GPU and don't benefit from the repacking of weights.
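To make "repacking the weights into interleaved rows" in the post above a little more concrete, here is a minimal Python sketch of the general idea. It is a simplified illustration only, not llama.cpp's actual memory layout or code: the function name `interleave_rows`, the group size of 4, and the representation of each quantized block as an opaque `bytes` object are all assumptions made for this example.

```python
from typing import List


def interleave_rows(rows: List[List[bytes]], rows_per_group: int = 4) -> List[bytes]:
    """Reorder blocks so that block j of every row in a group is contiguous.

    `rows` is a list of weight rows; each row is a list of opaque quantized
    blocks (hypothetical stand-ins for Q4_0 blocks). This is an illustrative
    layout, not the one llama.cpp uses internally.
    """
    if not rows:
        return []
    if len(rows) % rows_per_group != 0:
        raise ValueError("number of rows must be a multiple of rows_per_group")
    n_blocks = len(rows[0])
    if any(len(row) != n_blocks for row in rows):
        raise ValueError("all rows must contain the same number of blocks")

    packed: List[bytes] = []
    for start in range(0, len(rows), rows_per_group):
        group = rows[start:start + rows_per_group]
        # Walk the block index first, then the rows in the group, so a kernel
        # processing several rows at once reads its inputs sequentially.
        for j in range(n_blocks):
            for row in group:
                packed.append(row[j])
    return packed


# Four rows with two blocks each, labelled r<row>b<block> for readability.
rows = [[f"r{r}b{b}".encode() for b in range(2)] for r in range(4)]
packed = interleave_rows(rows)
```

Doing this reordering once at load time is what lets a single plain Q4_0 file serve both setups: the file stays in the ordinary row-major layout, and only runtimes that benefit from the interleaved order pay the one-off repacking cost.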
updated a dataset about 13 hours ago
ngxson/MiniThinky-dataset-v3
Articles
Introducing GGUF-my-LoRA • Nov 1, 2024 • 13
Code a simple RAG from scratch • Oct 29, 2024 • 16
Introduction to ggml • Aug 13, 2024 • 125
ngxson's activity
liked a dataset 6 days ago
itecgo/Topical-Chat-chatml • Viewer • Updated Dec 25, 2023 • 8.63k • 30 • 1
liked a dataset 7 days ago
vblagoje/cc_news • Viewer • Updated Jan 4, 2024 • 708k • 1.47k • 54
liked a model 24 days ago
Datou1111/shou_xin • Text-to-Image • Updated Dec 9, 2024 • 46.4k • 839
liked a Space 24 days ago
Anychat • Running on CPU Upgrade • 1.31k
liked a model 2 months ago
bartowski/Qwen2.5-Coder-14B-GGUF • Text Generation • Updated Nov 11, 2024 • 517 • 2
liked a Space 2 months ago
Qwen2.5 Coder Artifacts • Running • 1.26k
liked a dataset 2 months ago
qq8933/OpenLongCoT-Pretrain • Viewer • Updated Oct 28, 2024 • 103k • 61 • 87
liked a model 2 months ago
OuteAI/OuteTTS-0.1-350M-GGUF • Text-to-Speech • Updated Nov 27, 2024 • 201 • 34
liked a Space 2 months ago
GGUF My Lora • Running on CPU Upgrade • 23
Convert your PEFT LoRA into GGUF
liked a model 2 months ago
HuggingFaceTB/SmolLM2-1.7B-Instruct • Text Generation • Updated 4 days ago • 76.6k • 465
liked a Space 3 months ago
FLUX LoRa Lab • Runtime error • 449
liked 2 datasets 3 months ago
reach-vb/gguf-stats • Viewer • Updated Dec 2, 2024 • 60.5k • 23 • 16
huggingface/documentation-images • Viewer • Updated 1 day ago • 50 • 2.04M • 46
liked 2 models 3 months ago
rain1011/pyramid-flow-sd3 • Text-to-Video • Updated Oct 30, 2024 • 799
bartowski/Humanish-LLama3-8B-Instruct-GGUF • Text Generation • Updated Oct 7, 2024 • 354 • 2
liked a model 4 months ago
multimodalart/vintage-ads-flux • Text-to-Image • Updated Aug 26, 2024 • 5.64k • 79
liked a model 5 months ago
OuteAI/Lite-Mistral-150M-v2-Instruct-GGUF • Updated Aug 24, 2024 • 248 • 13
liked a model 6 months ago
reach-vb/Meta-Llama-3.1-8B-Instruct-Q6_K-GGUF • Text Generation • Updated Jul 30, 2024 • 111 • 8
liked a dataset 7 months ago
louisbrulenaudet/legalkit • Viewer • Updated Jun 26, 2024 • 53k • 147 • 29
liked a model 7 months ago
failspy/Llama-3-8B-Instruct-MopeyMule • Text Generation • Updated May 30, 2024 • 303 • 75