wrong number of tensors; expected 292, got 291
Hi,
I'm trying to use your models in ollama.
The model creation from the gguf is OK.
But I have the following error on run:
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
Do you have any idea of the origin of this issue?
Thanks.
You'll have to wait for an update to Ollama; this model uses the rope fixes from the llama.cpp master branch, which break backwards compatibility.
LM Studio has released 0.2.29, which supports the new models.
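In the meantime you can check your installed version and re-run the official installer once a release with the fix is out; a minimal sketch (which release first bundles the fix is an assumption here):
$ ollama --version                                # needs a build that includes the llama.cpp rope fix
$ curl -fsSL https://ollama.com/install.sh | sh   # the official installer also upgrades an existing Linux install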
Thanks for your answer.
Do you plan to release the 70b too?
I don't think the author made a 70b
I've had the same error message using llama3.1 from unsloth. I was trying to implement the example from the official unsloth repo:
https://github.com/unslothai/unsloth -> https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing
and the code from the YouTuber Mervin: https://www.youtube.com/@MervinPraison -> https://mer.vin/2024/07/llama-3-1-fine-tune/
unsloth completed the conversion, and neither notebook raised an error while creating the gguf file.
I then tried both Mervin's code and the official code to load the gguf from unsloth into ollama, and both failed with the same error:
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
Since unsloth automates the llama.cpp setup through its own functions, I had no idea which version it had checked out.
So I went into the llama.cpp directory (I'm on Linux, so it was "cd llama.cpp"; look for the llama.cpp folder in your own project, of course)
and then I executed: sudo git reset --hard 46e12c4692a37bdd31a0432fc5153d7d22bc7f72
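If you use the compiled llama.cpp tools (rather than only the Python conversion script), a rebuild after the reset keeps the binaries in sync with the pinned commit; a sketch, assuming the Makefile-based build of that era:
$ make clean && make     # rebuild so e.g. the quantize binary matches the checked-out commit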
And yes, I asked ChatGPT to help me with that problem. I'm very happy that it's working now, but development in this field doesn't look like it will be stable for the next few years. I hope it works on your system as well!
Best regards,
Matthias
Do you mean we just have to wait, then?
ValueError: Error raised by inference API HTTP code: 500, {"error":"llama runner process has terminated: signal: aborted (core dumped) error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291"}
I'm also running into this error.
Hi, has this issue been solved? I am trying to run Llama 3.1 8b using llama_cpp. I downloaded a number of models (e.g. Meta-Llama-3.1-8B-Instruct-Q6_K.gguf), but keep getting:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: failed to load model
And in the terminal: AttributeError: 'Llama' object has no attribute '_lora_adapter'
Thanks in advance for any help!
Never seen that LoRA adapter issue...
Can you share your exact commands?
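In the meantime: the AttributeError is most likely just fallout from the failed load (the Llama object's destructor runs before _lora_adapter is ever assigned), so the tensor-count error is the real problem. Upgrading llama-cpp-python so it bundles a llama.cpp build with the rope fix may clear both; a sketch:
$ pip install --upgrade llama-cpp-python   # pulls in a newer bundled llama.cpp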
# ollama run hillct/dolphin-llama-3.1
pulling manifest
pulling b4cc1324cbb5... 100% ▕████████████████▏ 8.5 GB
pulling 62fbfd9ed093... 100% ▕████████████████▏ 182 B
pulling 9640c2212a51... 100% ▕████████████████▏ 41 B
pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
pulling f02dd72bb242... 100% ▕████████████████▏ 59 B
pulling 67c41d573b3c... 100% ▕████████████████▏ 559 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
# ollama --version
ollama version is 0.3.3
# ollama run CognitiveComputations/dolphin-llama3.1:8b-v2.9.4-Q3_K_L
pulling manifest
pulling 33acc6f7959f... 100% ▕████████████████▏ 4.3 GB
pulling 13584952422b... 100% ▕████████████████▏ 131 B
pulling 7d9b917757c7... 100% ▕████████████████▏ 76 B
pulling 94e5d463b8ac... 100% ▕████████████████▏ 413 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
# ollama run CognitiveComputations/dolphin-llama3.1:8b-v2.9.4-Q8_0
pulling manifest
pulling b4cc1324cbb5... 100% ▕████████████████▏ 8.5 GB
pulling 13584952422b... 100% ▕████████████████▏ 131 B
pulling 7d9b917757c7... 100% ▕████████████████▏ 76 B
pulling 24f881ee6123... 100% ▕████████████████▏ 411 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
This is fixed in the latest ollama.
ollama run hillct/dolphin-llama-3.1
pulling manifest
pulling b4cc1324cbb5... 100% ▕████████████████▏ 8.5 GB
pulling 62fbfd9ed093... 100% ▕████████████████▏ 182 B
pulling 9640c2212a51... 100% ▕████████████████▏ 41 B
pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
pulling f02dd72bb242... 100% ▕████████████████▏ 59 B
pulling 67c41d573b3c... 100% ▕████████████████▏ 559 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
ollama run CognitiveComputations/dolphin-llama3.1
pulling manifest
pulling c4e04968e3ca... 100% ▕████████████████▏ 4.7 GB
pulling 13584952422b... 100% ▕████████████████▏ 131 B
pulling 66112031815b... 100% ▕████████████████▏ 159 B
pulling e3bd59e71f09... 100% ▕████████████████▏ 411 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
ollama --version
ollama version is 0.3.6
$ wget -c https://huggingface.co./bartowski/Llama-3.1-8B-Lexi-Uncensored-GGUF/resolve/main/Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
$ bat ModelFile
───────┬────────────────────────────────────────────────────────────
       │ File: ModelFile
───────┼────────────────────────────────────────────────────────────
   1   │ FROM ./Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
   2   │
   3   │ TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
   4   │
   5   │ {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
   6   │
   7   │ {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
   8   │
   9   │ {{ .Response }}<|eot_id|>"""
  10   │
  11   │ PARAMETER num_ctx 16000
  12   │ PARAMETER stop "<|eot_id|>"
  13   │ PARAMETER stop "<|start_header_id|>"
  14   │ PARAMETER stop "<|end_header_id|>"
  15   │ PARAMETER top_k 1
───────┴────────────────────────────────────────────────────────────
$ ollama create llama3.1-lexi -f ModelFile
transferring model data 100%
writing manifest
success
$ ollama run llama3.1-lexi
>>> Who is yavin?
Yavin is a reference to the planet Yavin 4, which is a key location in the original Star Wars film (Episode IV: A New Hope).
However, I'm assuming you might be asking about Yavin as a character. There are actually two characters named Yavin in the Star Wars universe:
1. **Yavin V**: He was a Jedi Master who lived during the time of the Old Republic. Unfortunately, I couldn't find much information on him.
2. **Yavin IV**: This is likely not what you're looking for, as it's just another name for the planet Yavin 4.
However, there is one more possibility:
**Yavin (also known as Yavin the Hutt)**: He was a Hutt crime lord who appeared in the Star Wars Legends universe.
>>> /exit
$ ollama --version
ollama version is 0.3.6
Oh, but you are talking about a different model, @Yavin5...
My answer was about the OP's issue; I realize a lot of answers in this thread are completely unrelated.
# wget -c https://huggingface.co./bartowski/Llama-3.1-8B-Lexi-Uncensored-GGUF/resolve/main/Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
Connecting to cdn-lfs-us-1.huggingface.co (cdn-lfs-us-1.huggingface.co)|2600:9000:236b:1e00:17:9a40:4dc0:93a1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4781626464 (4.5G) [binary/octet-stream]
Saving to: ‘Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf’
Llama-3.1-8B-Lexi-U 100%[===================>] 4.45G 33.4MB/s in 2m 30s
2024-08-21 14:09:26 (30.4 MB/s) - ‘Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf’ saved [4781626464/4781626464]
# ollama create llama3.1-lexi -f modelfile-melmass
transferring model data 100%
using existing layer sha256:f823cc9ddc2c5c9953a7f1cd171a710128741b15d8023c5ecc2e3808859a27c5
creating new layer sha256:8ab4849b038cf0abc5b1c9b8ee1443dca6b93a045c2272180d985126eb40bf6f
creating new layer sha256:6774f82e80c4d5ffeab1dafd3a4dd0e843ba529edc74273811a567af32402b68
creating new layer sha256:48714da7a6f14bfb596fea79157ed9406e1a51b49792c97bb9519bf7deaa5739
writing manifest
success
# ollama run llama3.1-lexi
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
# ollama --version
ollama version is 0.3.6
You're still claiming the problem is fixed in the latest version when it apparently isn't.
The "regular" ollama image named "llama3.1:latest" apparently works on this version however:
# ollama run llama3.1:latest
>>> who is melmass?
I couldn't find any notable or well-known person named Melmass. It's
possible that you may have misspelled the name, or it could be a less
common or private individual.
If you could provide more context or information about who Melmass is or
what they are known for, I'd be happy to try and help you further!
This seems impossible: you're both running the same ollama version and downloaded the same file, yet somehow you're getting different results??
I assume that creating the modelfile in a different way isn't affecting it?
# cat modelfile-melmass
FROM ./Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER num_ctx 16000
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER top_k 1
# ls -la Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
-rw-r--r--. 1 root root 4781626464 Jul 28 00:25 Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
Anything else you'd like me to try / output?
I mean, just for the hell of it, a sha256sum would be nice, but I can't imagine how you're downloading it fresh and ending up with an old file...
I had an older version of ollama initially and faced the OP's issue. While googling I found this issue and a few others on the ollama repo itself, updated ollama, and did what I outlined earlier, which fixed it.
I also don't get how Yavin still has the issue; the only thing I can think of is that since the sha of the model itself didn't change, ollama might be reusing an older "compiled" model (from an earlier ollama create).
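If that's what's happening, removing the stale tag and recreating it should force ollama to rebuild the non-blob layers; a quick sketch (tag and modelfile names are the ones used above):
$ ollama rm llama3.1-lexi                          # drop the old manifest and its layers
$ ollama create llama3.1-lexi -f modelfile-melmass
$ ollama run llama3.1-lexi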
# sha256sum Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
f823cc9ddc2c5c9953a7f1cd171a710128741b15d8023c5ecc2e3808859a27c5 Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
# ls -l Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
-rw-r--r--. 1 root root 4781626464 Jul 28 00:25 Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf