Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training
I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.
-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I assume that is in GB and not GiB, in which case 474 GiB might fit, as we have 503 GiB of RAM (after subtracting the RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
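For reference, a quick sketch of the arithmetic behind those numbers (both figures are the ones quoted in this thread, assuming, as above, that ls reported decimal gigabytes):

```python
# "509G" interpreted as decimal gigabytes; RAM is quoted in GiB.
file_bytes = 509 * 10**9
file_gib = file_bytes / 2**30              # ~474 GiB
usable_ram_gib = 503                       # RAM left after hardware reservations
headroom_gib = usable_ram_gib - file_gib   # ~29 GiB left for context, buffers, etc.
print(f"{file_gib:.0f} GiB model, {headroom_gib:.0f} GiB headroom")
```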
I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.
Q6_K is fine for me. Q8_0 might not fit without offloading, and it is unclear if offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally you should give it a try and see if it fits, but if not, Q6_K is fine for me.
I just checked and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf, so please give it a try and see if it fits. I believe it should fit if nothing else is running, as the model has such a small number of layers. If it doesn't fit, use Q6_K instead.
474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf
I'll try an offload of 1 and 0, then Q6. Hopefully it does not crash.
I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).
So, despite it listing both GPUs, it only allocated something on GPU0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
457.4g after warming up.
So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)
llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1, and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?
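To illustrate the per-layer granularity (a rough sketch; the layer count and per-layer size below are made-up numbers, not Arctic's actual shape):

```python
def split_by_layers(layer_sizes_gib: list[float], ngl: int) -> tuple[float, float]:
    """Whole layers only: the first `ngl` entries stand for whichever layers get offloaded."""
    gpu = sum(layer_sizes_gib[:ngl])
    cpu = sum(layer_sizes_gib[ngl:])
    return gpu, cpu

layers = [13.5] * 35  # hypothetical: 35 equally sized layers of ~13.5 GiB each
gpu_gib, cpu_gib = split_by_layers(layers, ngl=1)
print(f"GPU: {gpu_gib:.1f} GiB, CPU: {cpu_gib:.1f} GiB")
```

With -ngl 1, exactly one layer's worth of weights lands on a GPU and everything else stays in system RAM; a single layer cannot be split further.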
I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.
I'm not so sure about that. Keep in mind that imatrix uses mmap memory that can be taken away by other processes like quantisation tasks that use reserved memory.
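To make the difference concrete, a minimal sketch of the two kinds of memory involved (not the actual imatrix code; the path is just the example file from above):

```python
import mmap
import os

# File-backed, read-only mapping (what llama.cpp does for model weights when
# mmap is enabled): the pages live in the page cache, and the kernel may evict
# them under memory pressure and re-read them from the SSD later.
path = "/tmp/snowflake-arctic-instruct.Q8_0.gguf"  # example file from above
fd = os.open(path, os.O_RDONLY)
weights = mmap.mmap(fd, 0, prot=mmap.PROT_READ)    # length 0 = map the whole file

# Anonymous allocation (roughly what a quantisation task's working buffers look
# like): once touched, this memory is "reserved" and cannot simply be dropped,
# so it competes with - and can push out - the mapped weights.
scratch = bytearray(1 << 30)  # ~1 GiB here; real quant tasks use far more
```

When the reserved side wins, the evicted weight pages have to be streamed back from disk, which is exactly the kind of read rate dstat shows below.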
dstat shows a relatively high disk read rate so imatrix might now be streaming from SSD:
Yes it is clearly streaming from SSD now:
Once the quantisation tasks are interrupted it should work without SSD streaming again.
@RichardErkhov about 40000 of them or so. you can see for yourself soon, for real.
And totally off-topic: I just looked at the safetensors format, and I quite like it (other than being somewhat sloppily defined). But there seems to be no concern for alignment of any kind - according to the spec, there is no way to guarantee that tensors are aligned to their data size.
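To make that concrete, here is a small sketch (my own reading of the spec, not any of our tooling) that walks a .safetensors header and reports tensors whose absolute file offset is not a multiple of their element size:

```python
import json
import struct

# Partial dtype -> element size table (enough for the common cases).
DTYPE_SIZE = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
              "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1}

def report_misaligned(path):
    """Print tensors whose data does not start on an element-size boundary."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # u64 LE: size of the JSON header
        header = json.loads(f.read(header_len))
    data_start = 8 + header_len  # data_offsets are relative to this position
    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, _end = info["data_offsets"]
        elem = DTYPE_SIZE.get(info["dtype"], 1)
        if (data_start + begin) % elem:
            print(f"{name}: offset {data_start + begin} not {elem}-byte aligned")
```

Nothing in the format forces the header length or the per-tensor offsets onto any particular boundary, so whether anything ends up aligned depends entirely on the writer.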
In case you wonder what caused the 9-hour internet outage this morning: Threadripper is the node hosting the OpenWrt router. Threadripper's PSU caught fire while I was asleep and burned until it shorted the high-voltage part, which tripped the breakers. This morning the entire house was filled with the smoke of burned electronics and plastic, and it still smells absolutely terrible. That wasn't a cheap PSU at all. It was a be quiet! Dark Power Pro which back then cost around $500, so for sure a high-end PSU, and it lasted 8 years of 24/7 use before it catastrophically failed.
Regarding the damages: The PSU which started the fire is completely dead, as multiple components inside melted together. The RAM and the RTX 2070 GPU might be dead, as Threadripper no longer boots if any of them are plugged in, but further evaluation is required. The mainboard might be partially damaged: it only works when the PC is lying on the floor, constantly gets stuck either detecting RAM or during hardware initialization, and only boots in safe mode. When I checked journalctl during the PSU fire event I saw multiple HDDs reporting that they overheated despite being idle, so I will have to check whether they are fine as well.
Threadripper has now been online again for 10 hours and looks somewhat stable. I temporarily gave it the PSU from StormPeak, some older RAM (2x16 GB) and an old GTX 770 GPU, and have it lying flat on the floor, which somehow got it to boot, but no idea how long that will last. Tomorrow evening I will start investigating what hardware is still alive. I will likely do so by putting the parts in a different PC, so disruptions to nico1 should be minimal. I also enabled replication of the OpenWrt router VM to all the other nodes, so should it break completely I could use CastlePeak as the new router.
uhm, oh, wow. I assumed you wanted to switch gpus or so (because nico1 was paused), but the duration indeed made me wonder.
be quiet are not cheap, but very low quality, IMnsHO. Shouldn't catch fire, though. But it is just china crap. That's mostly my unadulterated opinion - I lost all four be quiet PSUs I ever had with (small) fireworks, but no fire. It all made sense when I saw reviews opening them up and finding boards marked 150W in 300W psus. Not that this is any use to you... it can happen with any psu.
However, I can understand your situation - I once lost my apartment because an sgi indy caught fire due to a faulty PSU - a few months before sgi published an advisory on that issue. Despite being off! With the rocker switch! Destroyed my whole library (hundreds of books). mm-thick soot on all walls. And of course my beloved HP workstation, my monitor, and my PC. It was quite traumatic. Not as traumatic as thieves then stealing most of what survived from the open apartment, and the police refusing to watch the security cameras for who did it. That was absolutely traumatic. Had to move out for months. Fortunately, my cat wasn't there at the time... Oh, and my data survived. But I wasn't back online as quickly as you were....
All considered, while this sucks, you should consider yourself super lucky - much worse has happened.
As for HDDs, I have more than a dozen SMART-failing hard disks that had considerable overtemperature at some point. The ones that fail, however, never announce it before it happens. But maybe they have been cooked at >>100°C... they will not like that. I'd be more concerned about any soot, for non-helium drives. That can cause problems to develop over weeks or months, from my own experience. But there is little you can do other than replace or back up.
Also, your priorities... nico1 was back up in record time, and I trust you have your priorities set correctly. Still, quanting isn't as important as, you know, your home...
@RichardErkhov I finally have a list for you.
One reason it took so long is that I lost confidence in the meaning of the list, and the selection criteria - it sounded so good when I proposed it, but now I am not so sure it is of much use. And it was rather more work to extract than I thought.
But it is on rich1:/202402-202501.txt
It's a list of urls, sorted by, uhm, importance (more "important" creators are listed first).
It is essentially the list of models I have been shown, minus the ones I tried to quantize. Many of them do not exist anymore, some have been renamed and/or recreated, in which case both urls are listed.
It is highly filtered, according to way too many criteria to list, but the important ones are:
- models should have a chance of being complete (e.g. they should have a config.json file and some tensor files)
- they are not by "obvious" datahoarder uploaders, or known assholes
- they are (on paper) supported by convert_hf_to_gguf.py
- they do not look like obvious quantized models (but still many are)
The list covers the 11 months from Feb 2024 till the end of 2024 and contains 22235 urls.
This should give you a list with a high concentration of models that are not obvious crap but that were skipped by me, mostly based on not having a nice enough name.
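Purely to illustrate the kind of filtering meant above - the placeholder sets and the Repo shape here are hypothetical, not the actual tooling used to build the list:

```python
from dataclasses import dataclass, field

KNOWN_DATAHOARDERS = {"some-bulk-reuploader"}                 # placeholder, not the real list
CONVERTIBLE_ARCHS = {"LlamaForCausalLM", "Qwen2ForCausalLM"}  # placeholder subset

@dataclass
class Repo:  # hypothetical stand-in for the real repo metadata
    author: str
    name: str
    architecture: str
    file_names: list[str] = field(default_factory=list)

def keep(repo: Repo) -> bool:
    files = set(repo.file_names)
    if "config.json" not in files:                        # has a chance of being complete
        return False
    if not any(f.endswith((".safetensors", ".bin")) for f in files):
        return False                                      # needs some tensor files
    if repo.author in KNOWN_DATAHOARDERS:                 # skip bulk re-uploaders
        return False
    if repo.architecture not in CONVERTIBLE_ARCHS:        # convert_hf_to_gguf.py support (on paper)
        return False
    if any(t in repo.name.lower() for t in ("gguf", "gptq", "awq", "exl2")):
        return False                                      # looks like an already-quantized model
    return True
```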
Feedback appreciated.
Oh wow, thank you @mradermacher ! I will check it later, I have school right now. Scary and sorry to hear about your library.
@RichardErkhov my library burned 20 years ago. nico's house almost burned yesterday!
I assumed you wanted to switch gpus or so (because nico1 was paused)
I only paused the GPUs because I ran reasoning finetunes overnight; I did not pause the nico1 host. Shortly after losing Threadripper the quant tasks unfortunately stopped due to a lack of internet, despite there being enough storage - I assume because local scheduling is currently disabled. But it might be better that way, as even with this small backlog, once the internet was restored it opened so many connections that it maxed out the 10 Gbit/s connection for 10 minutes, during which the internet was unusably slow for everyone else.
be quiet are not cheap, but very low quality
It all made sense when I saw reviews opening them up and finding boards marked 150W in 300W psus.
What shady business practices. Do you have any recommendations for a good PSU that is at least 1200 Watt? I now obviously need a new one.
All considered, while this sucks, you should consider yourself super lucky - much worse has happened.
I for sure was extremely lucky. The smoke was quite dangerous, and the entire house could have burned down.
Despite being off! With the rocker switch!
How is that even possible? I thought the switch physically disconnects the power.
Also, your priorities... nico1 was back up in record time, and I trust you have your priorities set correctly. Still, quanting isn't as important as, you know, your home...
Getting Threadripper working again was not just a priority because of nico1. It hosts my router, and having no internet is quite a big issue considering that I'm working from home most days. But I agree it probably wasn't the smartest idea to carry it out of a smoke-filled basement and spend the entire day working on it outdoors. But realistically there was not much else I could have done in the meantime anyway while waiting for the smoke to get blown out of the basement windows.