Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training
I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.
-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I assume that is in GB and not GiB, in which case the 474 GiB might fit, as we have 503 GiB of RAM (after subtracting RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
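For reference, a quick sanity check of that conversion (assuming the 509G really is SI gigabytes; plain bash integer arithmetic):
echo $(( 509 * 1000**3 / 1024**3 ))   # 509 GB expressed in GiB -> 474, against 503 GiB of usable RAM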
Q6_K is fine for me. Q8_0 might not fit without offloading and it is unclear if offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally, you should give it a try and see if it fits, but if not, Q6_K is fine for me.
I just checked and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf, so please give it a try and see if it fits. I believe it should fit if nothing else is running, as the model has such a small number of layers. If it doesn't fit, use Q6_K instead.
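A possible pre-flight check before loading it (just a sketch; compares the file size against what the kernel reports as immediately available):
ls -l --block-size=G /tmp/snowflake-arctic-instruct.Q8_0.gguf
grep MemAvailable /proc/meminfo   # should exceed the model size plus whatever the context needs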
474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I'll try an offload of 1 and 0, then Q6. Hopefully it does not crash.
I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).
So, despite it listing both GPUs, it only allocated something on GPU 0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
457.4g after warming up.
So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)
llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1
and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?
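For context, the offload count here is just the -ngl value passed to the imatrix run; something along these lines (binary name and calibration file are placeholders, the exact invocation used on nico1 may differ). -ngl 1 offloads a single layer to the GPU, -ngl 0 keeps everything in system RAM:
llama-imatrix -m /tmp/snowflake-arctic-instruct.Q8_0.gguf -f calibration.txt -o snowflake.imatrix -ngl 1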
I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
I'm not so sure about that. Keep in mind that imatrix uses mmap memory that can be taken away by other processes like quantisation tasks that use reserved memory.
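One way to see that difference is to compare file-backed and anonymous RSS of the two kinds of processes; a sketch, assuming those process names (RssFile is the mmap'ed model the kernel may reclaim under memory pressure, RssAnon it cannot take away):
grep -E 'RssAnon|RssFile' /proc/$(pgrep -f llama-imatrix | head -n1)/status
grep -E 'RssAnon|RssFile' /proc/$(pgrep -f llama-quantize | head -n1)/status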
dstat shows a relatively high disk read rate so imatrix might now be streaming from SSD:
Yes it is clearly streaming from SSD now:
Once the quantisation tasks are interrupted it should work without SSD streaming again.
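For anyone following along, something like the following shows the disk read rate next to used/cached memory (dstat totals, refreshed every 5 seconds):
dstat -cdm 5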
Addendum: you can probably tell by now that I am a staunch anti-neoliberalist and work for a tiny, very personal company for a reason :) Don't worry, I am also a realist :)
@mradermacher The status page (http://hf.tst.eu/status.html) has been frozen since 2024-12-20 16:05:00+0100 and both nico1 and rich1 are idle. There no longer seem to be any models being uploaded, so I assume something critical broke and I don't think there is anything I can do to fix it.
I checked the kernel log on StormPeak and the time it broke seems to roughly align with the time my RTX 3080 GPU crashed, but that GPU is not used by nico1, as only the RTX 4090 GPUs are assigned to your LXC container, so it should not be related:
Dec 20 15:55:19 StormPeak kernel: NVRM: GPU at PCI:0000:c1:00: GPU-c8fe94f9-541b-e16b-da0f-b8d38ea5283e
Dec 20 15:55:19 StormPeak kernel: NVRM: Xid (PCI:0000:c1:00): 62, pid='<unknown>', name=<unknown>, 2027f626 2027f426 2027fcf4 20288f2a 20288e30 2021b5b8>
Dec 20 15:55:24 StormPeak kernel: NVRM: GPU 0000:c1:00.0: RmInitAdapter failed! (0x62:0x55:2477)
Dec 20 15:55:24 StormPeak kernel: NVRM: GPU 0000:c1:00.0: rm_init_adapter failed, device minor number 0
(...)
Dec 20 15:58:48 StormPeak kernel: INFO: task nv_open_q:2903 blocked for more than 122 seconds.
Dec 20 15:58:48 StormPeak kernel: Tainted: P O 6.8.12-5-pve #1
Dec 20 15:58:48 StormPeak kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 20 15:58:48 StormPeak kernel: task:nv_open_q state:D stack:0 pid:2903 tgid:2903 ppid:2 flags:0x00004000
(...)
Dec 20 15:58:48 StormPeak kernel: INFO: task nvidia-smi:2356875 blocked for more than 122 seconds.
Dec 20 15:58:48 StormPeak kernel: Tainted: P O 6.8.12-5-pve #1
Dec 20 15:58:48 StormPeak kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 20 15:58:48 StormPeak kernel: task:nvidia-smi state:D stack:0 pid:2356875 tgid:2356875 ppid:2341557 flags:0x00004006
(...)
Dec 20 16:00:50 StormPeak kernel: INFO: task nv_queue:2901 blocked for more than 245 seconds.
Dec 20 16:00:50 StormPeak kernel: Tainted: P O 6.8.12-5-pve #1
Dec 20 16:00:50 StormPeak kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 20 16:00:50 StormPeak kernel: task:nv_queue state:D stack:0 pid:2901 tgid:2901 ppid:2 flags:0x0000400
After more carefully reviewing the kernel log it indeed seems that nico1 was somehow affected by the issue with the RTX 3080 GPU:
Dec 20 15:58:48 StormPeak kernel: INFO: task llama-quantize:2364235 blocked for more than 122 seconds.
Dec 20 15:58:48 StormPeak kernel: Tainted: P O 6.8.12-5-pve #1
Dec 20 15:58:48 StormPeak kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 20 15:58:48 StormPeak kernel: task:llama-quantize state:D stack:0 pid:2364235 tgid:2364235 ppid:1469293 flags:0x0000000
llama-quantize should not use any GPU and the faulty GPU is not even attached to your LXC container, so it is really strange that this happened. There are tasks running, so I'm not sure if the system is in a state where it can tolerate a reboot of nico1, but it currently is not working at all, so it likely can't get any worse. It would be really interesting to know how a stuck quantize task on nico1 brought the entire system to a halt.
I disconnected nico1 from the internet but still kept it running. Let's see if that is enough for the system to fix itself. All other hosts should now detect nico1 as offline and hopefully manage to recover.
It didn't help. I will reboot StormPeak now, but it is unlikely that this fixes anything, as even without nico1 the system didn't recover.
I rebooted StormPeak, which fixed the RTX 3080 issue, and started nico1 again, but as expected this unfortunately didn't fix whatever issue brought the entire system to a halt.
Good morning. I don't know what happened. A stuck llama-quantize should only hang its own job, but maybe something else also went wrong. The connection timeout (once established) is currently 3600 seconds, but that either didn't trigger or it somehow persisted across multiple runs of the scheduler. rich1 is also gone at the moment, which might play a role as well.
I also disabled the local scheduler a week or so ago because there is some weird bug where static jobs finish successfully within 10 seconds without doing anything, meaning static quants are not generated at all, so that didn't help either.
Obviously, there is a bug somewhere.
Since I am still not in great shape, I opted to kill all processes holding locks and this got it going again, but without a post-mortem. We'll have to do this a few more times, I guess, to find the issue ...
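For the next occurrence, the lock holders could be listed before killing them; a sketch, with a hypothetical lock file path standing in for whatever the scheduler actually uses:
lsof /path/to/scheduler.lock   # or: fuser -v /path/to/scheduler.lock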
Don't know if I can do it, but I plan to queue more models before the queue dries out - otherwise, I'll have to tell Richard that soon his box will be idle and needs to be taken over, and then a short time later, I will beg to get exclusive access again :)
In other news, my main home server (that I need for breathing and basic survival, figuratively speaking :) is restored to a state where I can actually use it in read-write mode again. Doesn't mean much to you, but the last few weeks were... unpleasant; I practically couldn't do anything.
And if we don't talk to each other much, merry christmas and a happy new year :)