Download bug fixed?
@mradermacher
I tried reuploading the model and made sure all the files are correct this time. Let me know if the upload bug was avoided this time.
Thanks for your hard work.
download successful :)
Fixed the config file issue with quantization, should be fine now
You guessed correctly that I wanted to quantize this (statically only, 'cause it says it is "only" experimental, and I have my plate full), alas:
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'
Maybe I am doing it wrong - the previous deepseek model was converted by my team colleague
Yeah, it's experimental because I don't really know if the finetuning process worked correctly and didn't mess up the thinking traces.
Anyhow, try downloading the new config file. For some reason the OG DeepSeek model sets the default quantization to fp8 in the config file.
I removed it so anyone can use bitsandbytes.
Let me know if I can help.
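For anyone hitting the same thing, the fix described above amounts to deleting the `quantization_config` key from the model's config.json. A minimal sketch (the file path and helper name are illustrative, not from this repo):

```python
import json

def strip_quantization_config(path):
    """Remove the fp8 quantization_config entry from a model config.json
    so loaders such as bitsandbytes can apply their own quantization."""
    with open(path) as f:
        cfg = json.load(f)
    cfg.pop("quantization_config", None)  # drop the fp8 default if present
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

After this, loading with `load_in_4bit=True` (or any other bitsandbytes setting) should no longer conflict with a baked-in fp8 default.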
This is not related to any issue in the config file; it's a tensor that llama.cpp does not understand. I'll ask the person who originally converted it for me to see if there is a trick.
We decided to wait with quants until the improved DeepSeek support has been merged into llama.cpp.
I'm so impressed that you managed to finetune DeepSeek R1. I tried and failed, both using 6x H200 and 6x MI300X with axolotl. What hardware and framework did you use to finetune it? I'm looking forward to trying out your model and seeing if it beats the original.
As discussed in https://huggingface.co./mradermacher/BabyHercules-4x150M-GGUF/discussions/3#67bb32a36f77b7facdeca41c, the issue was likely due to mradermacher not converting the model from FP8 to BF16 before trying to run convert_hf_to_gguf.py. Running fp8_cast_bf16.py will probably remove the tensor that is causing the conversion issues while converting all the experts currently stored as FP8 into the required BF16 format. The BF16 conversion is scheduled to complete in around 4 hours, after which we can try converting the model into a GGUF again.
Doing static quants is likely worth it, but for imatrix quants we had better wait until https://github.com/ggml-org/llama.cpp/pull/11446 is merged. Once it is, we will have to requantize all DeepSeek V2/V3/R1 based models, and we would otherwise spend days of compute on something we would have to throw away again relatively soon.
@nicoboss I used an 8xH100 cluster and tried to mimic the unsloth framework, since it only works on a single GPU.
It's highly experimental and I really doubt it's better than R1 Zero. I tried to fix the issue with it getting stuck in infinite loops of thinking and saying "but wait, maybe the user wants".
@Daemontatox How comes https://huggingface.co./Daemontatox/Z1-Zero/blob/main/model-00001-of-000163.safetensors and https://huggingface.co./deepseek-ai/DeepSeek-R1/blob/main/model-00001-of-000163.safetensors have the exact same SHA256 hashes? Even if you look at your commit messages, they indicate that the model is a direct clone of https://huggingface.co./deepseek-ai/DeepSeek-R1 made using https://huggingface.co./spaces/huggingface-projects/repo_duplicator. Are you sure you have not accidentally published the wrong model?
All the SHA256 hashes of this model seem to be identical to DeepSeek-R1. Because of this, quantizing this model seems absolutely pointless, as it would result in quants identical to the ones we created for DeepSeek-R1.
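For reference, this kind of check is easy to reproduce locally on downloaded shards (an illustrative sketch, not necessarily the exact method used here):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 in 1 MiB chunks so multi-GB
    safetensors shards don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

If the digest of a local shard matches the SHA256 shown on the DeepSeek-R1 file page, the shard is a byte-identical copy.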
root@AI:/apool/BF16# CUDA_VISIBLE_DEVICES=1 venv/bin/python fp8_cast_bf16.py --input-fp8-hf-path /HDD/Z1-Zero --output-bf16-hf-path /HDD/Z1-Zero-BF16
12%|_________ | 19/163 [28:48<3:48:26, 95.19s/it]
I let the BF16 conversion run until 12% before aborting it.
@nicoboss That might be a possibility; I will double check when my cluster frees up.
@nicoboss Apologies for the mistake 😅, currently uploading the correct files.
Done
> @nicoboss Apologies for the mistake 😅, currently uploading the correct files.
Great! Awesome you still had the real files locally. I'm excited to quant and try them once uploaded.
> I used an 8xH100 cluster and tried to mimic the unsloth framework, since it only works on a single GPU.
So you finetuned it with only 640 GB of GPU memory; that is extremely impressive. I would highly appreciate it if you could share your training code with the AI community. I and many others have been desperately trying to finetune DeepSeek R1 since its release, without any success so far. Providing the ability to finetune DeepSeek R1 using such limited resources would probably be a much greater contribution than any finetune you might publish, earning you a massive amount of recognition. The best currently possible using LLaMA-Factory requires maybe around 1500 GB of GPU memory, which is above what can be rented on RunPod or similar services unless you go AMD, which is currently incompatible with DeepSeek R1.
> Done
Awesome let's check it out.
I am still working on it and optimizing it; I will share the full repo once it's done and I am satisfied with it.
Keep in mind it's still an experimental attempt, so the results might not be as expected.
Thanks for the hard work.
The model is currently downloading. Once downloaded, I will convert it from FP8 to BF16 and then from BF16 to GGUF, which we can then quantize. Due to the massive size of this model this will take a while, but no worries, I will keep you updated and inform you should any issue occur.
> I am still working on it and optimizing it; I will share the full repo once it's done and I am satisfied with it.
This is amazing to hear! It would be highly appreciated. I'm really looking forward to it. Please make sure to notify me when you publish the full repo so I don't miss it.
> Keep in mind it's still an experimental attempt, so the results might not be as expected.
No worries, that is totally expected. I would in fact be really surprised if the first R1 finetune were a success. A bad model can be almost as valuable as a good one, since both help you improve. We quant models no matter their quality, so lessons can be learned.
> Thanks for the hard work.
No problem. Thank you so much for publishing the first DeepSeek R1 finetune not made by a massive corporation.
@Daemontatox Static quants for this model are now available under https://huggingface.co./mradermacher/Zireal-0-GGUF or http://hf.tst.eu/model#Zireal-0-GGUF.
I also spent quite some time testing the model and it performed quite well. Here is an example:
Command:
./llama-cli -m /mradermacher/root/Zireal-0.IQ4_XS.gguf -p "What is the meaning of life?" -c 4000
Build:
build: 4831 (5e43f104) with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu (with CUDA)
Configurations:
sampler seed: 3882492999
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4000
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4000, n_batch = 2048, n_predict = -1, n_keep = 1
Performance:
llama_perf_sampler_print: sampling time = 124.31 ms / 1013 runs ( 0.12 ms per token, 8148.79 tokens per second)
llama_perf_context_print: load time = 200857.09 ms
llama_perf_context_print: prompt eval time = 7839.14 ms / 10 tokens ( 783.91 ms per token, 1.28 tokens per second)
llama_perf_context_print: eval time = 11092436.04 ms / 1002 runs (11070.30 ms per token, 0.09 tokens per second)
llama_perf_context_print: total time = 12257652.28 ms / 1012 tokens
=> Around 3.5 hours to answer a single question.
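That figure follows directly from the total wall time printed above:

```python
# Convert llama_perf_context_print's total time (ms) to hours.
total_ms = 12_257_652.28
hours = total_ms / 1000 / 3600
print(f"{hours:.2f} h")  # just under 3.5 hours for 1012 tokens
```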
Prompt:
What is the meaning of life?
Response:<think>
The question "What is the meaning of life?" is one of the most profound and enduring questions in philosophy, theology, and human existence. Different fields and perspectives offer various interpretations:
Philosophical Perspectives
Existentialism: Philosophers like Jean-Paul Sartre and Albert Camus argue that life has no inherent meaning, and that individuals must create their own purpose through actions and choices. According to Sartre, "existence precedes essence," which means that individuals first exist and then define their essence through their choices and actions.
Absurdism: Closely related to existentialism, absurdism, particularly as articulated by Albert Camus, posits that humans seek inherent value in a world that is inherently devoid of it, which leads to a conflict called "the absurd." Camus suggests that individuals should embrace the absurd and find a way to live passionately despite it.
Utilitarianism: Philosophers like John Stuart Mill suggest that the meaning of life might be found in maximizing happiness and well-being for the greatest number of people.
Hedonism: The hedonistic school of thought argues that the pursuit of pleasure and the avoidance of pain are the primary purposes of human life.
Aristotelian Eudaimonia: Aristotle introduced the concept of "eudaimonia," often translated as "flourishing" or "well-being," which he considered the highest human good. He believed that a meaningful life is one where one fulfills one’s potential and lives virtuously in accordance with reason.
Religious Perspectives
Christianity: For many Christians, the meaning of life is to love God, follow His commandments, and prepare for an eternal life with Him. The life of Jesus Christ serves as a model for how to live a meaningful life.
Islam: In Islam, the purpose of life is to worship and serve Allah, follow the teachings of the Quran and Prophet Muhammad, and strive for a place in paradise.
Buddhism: The goal in Buddhism is to achieve enlightenment (Nirvana) by following the Noble Eightfold Path and freeing oneself from the cycle of birth, death, and rebirth (samsara).
Hinduism: Similar to Buddhism in some respects, Hinduism emphasizes the pursuit of "dharma" (righteous living), "artha" (prosperity), "kama" (pleasure), and ultimately "moksha" (liberation from the cycle of rebirth).
Judaism: The meaning of life in Judaism is to serve God, live according to His laws as described in the Torah, and make the world a better place (Tikkun Olam).
Scientific Perspectives
- Biological: From a purely biological standpoint, the "meaning" of life could be seen as the drive for survival and reproduction, ensuring the continuation of one's genes.
Personal and Subjective Perspectives
Many believe that the meaning of life is a deeply personal question, and the "meaning" can vary from one person to another. Common personal meanings might include:
- Personal fulfillment: Finding what makes you happy and fulfilled, such as a career, a family, or a hobby.
- Relationships: Building deep and meaningful connections with family, friends, and loved ones.
- Contributions to society: Making a positive impact on the world through charity, innovation, or social change.
Conclusion
The question "What is the meaning of life?" is multifaceted and can be interpreted in countless ways. The "correct" answer often depends on one's personal beliefs, values, cultural background, and individual experiences. Many people spend their lives exploring and defining what makes their life meaningful for themselves.</think>
\boxed{The question "What is the meaning of life?" is a deeply philosophical question that varies widely based on individual beliefs and perspectives. Philosophically, some argue that life has no inherent meaning and that individuals must create their own purpose (existentialism, absurdism), while others suggest that the meaning of life is to maximize happiness (utilitarianism) or pursue pleasure (hedonism). Aristotle believed that a meaningful life is one where one fulfills one’s potential and lives virtuously (eudaimonia).
Religious perspectives also provide answers based on their doctrines. For instance, in Christianity, the meaning of life is often seen as loving and serving God and following the teachings of Jesus Christ. Islam emphasizes worshipping Allah and following the teachings of the Quran. Buddhism aims for enlightenment and liberation from the cycle of birth and death. Hinduism stresses living righteously (dharma) and seeking liberation (moksha).
From a biological standpoint, the "meaning" might be seen as survival and reproduction.
Ultimately, the meaning of life is often a deeply personal question where answers vary based on one’s beliefs, values, and individual experiences.}