The small DeepSeek ones:
(tiny original)
https://huggingface.co./deepseek-ai/deepseek-vl2-tiny
(8bit)
https://huggingface.co./mlx-community/deepseek-vl2-tiny-8bit/tree/main
and
(small original)
https://huggingface.co./deepseek-ai/deepseek-vl2-small/tree/main
(8bit)
https://huggingface.co./mlx-community/deepseek-vl2-small-8bit/tree/main
MLX also provides 4-bit versions of both. There is also V3:
https://huggingface.co./mlx-community/DeepSeek-V3-4bit/tree/main
I think even the 4-bit V3 is still too big for normal users ^^
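For scale, a rough back-of-the-envelope (assuming the commonly cited 671B total parameters for DeepSeek-V3; real files are somewhat larger because of group-wise scales and unquantized layers):

```python
# Rough weight-size estimate for DeepSeek-V3 at 4-bit quantization.
params = 671e9          # commonly cited total parameter count (assumption)
bits_per_weight = 4
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB of weights alone")  # ~336 GB
```

So even at 4 bits, the weights alone land in the ~340 GB range, well beyond normal desktop hardware.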
I queued deepseek-ai/deepseek-vl2-tiny and deepseek-ai/deepseek-vl2-small. The others will not work, as you can't GGUF-quant an already quantized model.
You can check the progress on http://hf.tst.eu/status.html
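(The mlx-community repos record their quantization settings in config.json, which is one quick way to tell such a model apart from an original-precision one. A minimal sketch, assuming the usual quantization/quantization_config keys:)

```python
import json
from huggingface_hub import hf_hub_download

def is_already_quantized(repo_id: str) -> bool:
    """Heuristic: MLX and transformers quantized repos record their
    settings in config.json under one of these keys (assumption)."""
    path = hf_hub_download(repo_id, "config.json")
    with open(path) as f:
        cfg = json.load(f)
    return any(k in cfg for k in ("quantization", "quantization_config"))

print(is_already_quantized("mlx-community/deepseek-vl2-tiny-8bit"))  # True
print(is_already_quantized("deepseek-ai/deepseek-vl2-tiny"))         # False
```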
@mradermacher I don't think it actually got submitted. It mentioned something about "no architectures entry", but checking https://huggingface.co./deepseek-ai/deepseek-vl2-tiny/blob/main/config.json and https://huggingface.co./deepseek-ai/deepseek-vl2-small/blob/main/config.json shows the architecture is DeepseekV2ForCausalLM, which should be supported by llama.cpp if I remember correctly. I assume this happens because it is a vision model and so not supported by llama.cpp, despite the text part using DeepseekV2ForCausalLM.
nico1 ~# llmc add -2007 si https://huggingface.co./deepseek-ai/deepseek-vl2-tiny
submit tokens: ["-2007","static","imatrix","https://huggingface.co./deepseek-ai/deepseek-vl2-tiny"]
https://huggingface.co./deepseek-ai/deepseek-vl2-tiny
deepseek-ai/deepseek-vl2-tiny: no architectures entry ()
nico1 ~# llmc add -2007 si https://huggingface.co./deepseek-ai/deepseek-vl2-small
submit tokens: ["-2007","static","imatrix","https://huggingface.co./deepseek-ai/deepseek-vl2-small"]
https://huggingface.co./deepseek-ai/deepseek-vl2-small
deepseek-ai/deepseek-vl2-small: no architectures entry ()
Why are vision models (MLLMs) like this not supported by llama.cpp? Is that still in development? There are so many of them ^^
"no architectures entry'" literally means the architectures
key is missing form config.js. This can happen when huggingface fails to deliver the file (happens rfarely) or a network problem (happens very rarely). And llmc add should not run in the dsandbox (which has no network), or does it...
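If one wanted to rule out a transient delivery problem before resubmitting, a hedged sketch of such a check (hypothetical, not how the queue actually fetches):

```python
import json
import time
import urllib.request

def fetch_config(repo_id: str, retries: int = 3) -> dict:
    """Fetch config.json from the hub, retrying on transient network errors."""
    url = f"https://huggingface.co./{repo_id}/raw/main/config.json"
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as r:
                return json.load(r)
        except OSError:
            time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError(f"could not fetch config.json for {repo_id}")

print("architectures" in fetch_config("deepseek-ai/deepseek-vl2-tiny"))
```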
btw. the nice level should normally be exactly -2000 (== it is fine to experiment); otherwise it will get absolute priority over other user-requested models.
must have been a network/hf problem, because it seems to work now (the submission part)
Ok, not sure why the submission part worked (have to check that), but the config.json clearly has no architectures key:
model_architecture = hparams["architectures"][0]
~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'architectures'
It has a language_config or so key which has an architectures key (and contains something that looks like a transformers config.json). In any case, it's not supported by llama.cpp in this form and would need some form of doctoring. Maybe it's as easy as replacing config.json with the contents of the language_config...
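If it really were that easy, the doctoring might look like this (a sketch only; the language_config key name is as observed above, and the extra tensors below show it is not sufficient by itself):

```python
import json
import shutil

# Promote the nested language-model sub-config to be the whole config.json
# (keeping a backup), so the converter finds a top-level architectures key.
with open("config.json") as f:
    cfg = json.load(f)

lang = cfg["language_config"]           # nested transformers-style config
assert "architectures" in lang
shutil.copy("config.json", "config.json.orig")
with open("config.json", "w") as f:
    json.dump(lang, f, indent=2)
```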
ValueError: Can not map tensor 'image_newline'
No, it's not so easy. At the very least, somehow the extra tensors would have to be ignored/removed.
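Ignoring them might look roughly like this, per shard (a sketch with safetensors; apart from image_newline, which appears in the error above, the vision/projector prefixes are guesses that would need checking against the actual checkpoint):

```python
from safetensors.torch import load_file, save_file

# Keep only language-model weights; drop vision-side tensors.
tensors = load_file("model.safetensors")
keep = {
    name: t for name, t in tensors.items()
    if name != "image_newline"
    and not name.startswith(("vision", "projector"))  # guessed prefixes
}
save_file(keep, "model.language-only.safetensors")
```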
Seems that it works with MLX (never tried it):
https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md
as the creator of the Hugging Face repo did:
https://huggingface.co./mlx-community/deepseek-vl2-small-8bit/tree/main
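For reference, the text-generation flow from that README looks like this (untested here; a vision model like deepseek-vl2 presumably needs the separate mlx-vlm package rather than plain mlx-lm):

```python
from mlx_lm import load, generate

# Load an MLX-quantized model from the Hub and generate text.
model, tokenizer = load("mlx-community/deepseek-vl2-small-8bit")
print(generate(model, tokenizer, prompt="Hello", max_tokens=64))
```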