HuggingChat: Input validation error: `inputs` tokens + `max_new_tokens` must be..
I use the meta-llama/Meta-Llama-3-70B-Instruct model. After a certain number of messages, the AI stops responding and gives an error: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6391 `inputs` tokens and 2047 `max_new_tokens`". Is this a bug or some new limitation? I honestly still don't understand it and I hope I can get an answer here. I'm new to this site.
Same issue all of a sudden today
Can you see if this still happens? Should be fixed now.
I keep getting this error as well. Using CohereForAI
Same error, "Meta-Llama-3-70B-Instruct" model.
I have also been running into this error. Is there a workaround or solution at all?
"Input validation error: inputs
tokens + max_new_tokens
must be <= 8192. Given: 6474 inputs
tokens and 2047 max_new_tokens
"
Using the meta-llama/Meta-Llama-3-70B-Instruct model.
Thanks a lot for the detailed how-to guide, JulienGuy. Appreciate it!
Hi everyone! I'm getting this message: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 16384. Given: 16392 `inputs` tokens and 0 `max_new_tokens`"
Do you know what's going on? I've been using both "Qwen/Qwen2.5-72B-Instruct" and "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B".
Can you please tell me what's going on?
Hi everyone! I'm getting this message too: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 16000. Given: 14698 `inputs` tokens and 3072 `max_new_tokens`"
I've been using Qwen/Qwen2.5-Coder-32B-Instruct.
How can I fix it?
@datoreviol @bocahpekael99 if one of you could share one of the conversations where this happens, that would help us a lot with debugging!
Hi guys,
LLMs have a limited context window, that is, a limited amount of text they can process at once. If this limit is exceeded, you typically get the error you are seeing. The limit in your case is around 16k tokens.
What counts towards this limit is the input text PLUS the output text. The input text is your prompt, which may contain a lot of tokens if you are doing RAG (almost 15k in your case). The output text is what you are asking the LLM to generate as an answer, which is 3072 tokens in your case. So you are basically asking the model to process more text at once than it can handle: 14698 input tokens + 3072 output tokens = 17770, which is over the 16000-token limit.
To fix the error, you have to reduce the amount of text you are asking the LLM to process. You can use any of these approaches:
- reduce the input size (write a shorter prompt, return fewer chunks from your database if you are doing RAG, have smaller chunks to begin with)
- reduce the size of the answer you want the LLM to write. 3072 tokens is kind of a lot for a chatbot or a RAG pipeline, do you really need that much? Try 1024 or 512.
- when calling an LLM through some kind of free API from HuggingFace, it seems that the max context window is set to a lower value than what the model can actually deal with. If this is how you are using DeepSeek, consider creating a (paid) dedicated endpoint instead, which would allow you to use a bigger context window (Qwen 32B should support 128k).
Hope this helps.
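To make the first two fixes concrete, here is a minimal Python sketch, assuming you are calling the model yourself through `huggingface_hub`'s `InferenceClient` rather than through the HuggingChat UI. The function name `safe_generate` and the constants are illustrative, taken from the numbers in the error above:

```python
# Minimal sketch: measure the prompt's token count and cap max_new_tokens so
# that input + output stays within the model's context window.
from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"
CONTEXT_WINDOW = 16000          # limit reported in the error message
DESIRED_MAX_NEW_TOKENS = 3072   # how much you would like the model to write

tokenizer = AutoTokenizer.from_pretrained(MODEL)
client = InferenceClient(MODEL)

def safe_generate(prompt: str) -> str:
    n_input = len(tokenizer.encode(prompt))
    budget = CONTEXT_WINDOW - n_input  # tokens left for the answer
    if budget <= 0:
        raise ValueError(
            f"Prompt already uses {n_input} tokens; shorten it or return fewer RAG chunks."
        )
    max_new = min(DESIRED_MAX_NEW_TOKENS, budget)
    return client.text_generation(prompt, max_new_tokens=max_new)
```

The client-side token count can differ slightly from what the server computes (chat templates add a few special tokens), so leaving a safety margin of a few dozen tokens doesn't hurt.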
TBH this shouldn't be happening: the backend should automatically truncate the input if you exceed the context window. That's why I wanted a conversation, to see where the issue is.
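In the meantime, for anyone hitting this through the Inference API rather than the HuggingChat UI, a rough client-side version of that truncation could look like the sketch below. It is illustrative only, assumes you can download the model's tokenizer (gated models need `huggingface-cli login` first), and reuses the numbers from the errors above:

```python
# Rough sketch of client-side truncation: drop the oldest prompt tokens so that
# input + max_new_tokens fits inside the context window.
from transformers import AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-70B-Instruct"
CONTEXT_WINDOW = 8192
MAX_NEW_TOKENS = 2047

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def truncate_prompt(prompt: str) -> str:
    budget = CONTEXT_WINDOW - MAX_NEW_TOKENS  # room left for the input
    ids = tokenizer.encode(prompt)
    if len(ids) > budget:
        ids = ids[-budget:]  # keep only the most recent tokens
    return tokenizer.decode(ids, skip_special_tokens=True)
```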
What I don't understand is the following:
When the input limit is reached, how long do we have to wait before we can continue asking the model / agent questions?
Still happening to me