model breaks down at high context

#1
by FlareRebellion - opened

tested the Q4_K_S at 24k context and it's entirely incoherent.

No such problems with https://huggingface.co./NeverSleep/Lumimaid-v0.2-70B-GGUF

Well, that URL does not even have a Q4_K_S, so I'm not sure what you are comparing against. But that sounds more like download corruption or the LLM having a bad day/bad settings rather than an issue with these quants.

mradermacher changed discussion status to closed

@mradermacher Are you sure this is not related to the Llama 3.1 rope scaling issue? The rope scaling issue especially affects context windows above 8192. Based on the upload time it seems as if this Meta-Llama-3.1-70B-Instruct based model was quantized with the old llama.cpp version and needs to be requantized with the new one for the rope scaling issue to be fixed.

Are you talking about this repo or another one? The files here are clearly more recent than in the linked repo, so I don't understand what you are referring to. AFAIK, I have not uploaded any 3.1 quants before the 3.1 rope scaling was implemented in llama.cpp, unless explicitly requested by the author.

Sorry for the confusion. I thought you started using the new llama.cpp version on 27th July 2024 at 22:35 GMT, when you mentioned that you (re-)queued everything, but I overlooked that in the same comment you also mentioned that you had already converted a few models first for testing. I assume this must be one of those test models. The new llama.cpp version got released on 27th July 2024 at 14:07 GMT, while you started with Lumimaid-v0.2-70B on 27th July 2024 at 14:42 GMT, so assuming you updated immediately, this model is perfectly fine.

Ah, that explains it. Lumimaid was indeed the first model (models, really) I queued. You can see whether the new rope implementation is in effect by clicking on a quant on Hugging Face (right side panel) and looking for the rope_freqs tensor - if it's there, it was converted with the new code.
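
If you'd rather check programmatically, here is a minimal sketch using the `gguf` Python package from llama.cpp's gguf-py; the file name below is just a placeholder, and the tensor may appear in the listing as `rope_freqs.weight`:

```python
# Minimal sketch: list the tensors in a local GGUF file and check for rope_freqs,
# which indicates the file was converted with the newer llama.cpp code that
# implements Llama 3.1 rope scaling.
# Requires: pip install gguf
from gguf import GGUFReader

reader = GGUFReader("Lumimaid-v0.2-70B.Q4_K_S.gguf")  # placeholder file name
tensor_names = [t.name for t in reader.tensors]
has_rope_freqs = any(name.startswith("rope_freqs") for name in tensor_names)
print("rope_freqs tensor present:", has_rope_freqs)
```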

For what it's worth, I downloaded the IQ3_XXS, and even at 29k tokens, it was completely coherent. Well, as coherent as with smaller contexts.
