HuggingChat: Input validation error: `inputs` tokens + `max_new_tokens` must be..
I use the meta-llama/Meta-Llama-3-70B-Instruct model. After a certain number of messages, the AI stops responding and gives an error: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6391 `inputs` tokens and 2047 `max_new_tokens`". Is this a bug or some new limitation? I honestly still don't understand it and I hope I can get an answer here. I'm new to this site.
Same issue all of a sudden today
Can you see if this still happens? Should be fixed now.
I keep getting this error as well. Using CohereForAI
Same error, "Meta-Llama-3-70B-Instruct" model.
I have also been running into this error. Is there a workaround or solution at all?
"Input validation error: inputs
tokens + max_new_tokens
must be <= 8192. Given: 6474 inputs
tokens and 2047 max_new_tokens
"
Using the meta-llama/Meta-Llama-3-70B-Instruct model.
Thanks a lot for the detailed how-to guide, JulienGuy. Appreciate it!
Hi everyone! I'm getting this message: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 16384. Given: 16392 `inputs` tokens and 0 `max_new_tokens`"
Do you know what's going on? I've been using both "Qwen/Qwen2.5-72B-Instruct" and "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B".
Can you please tell me what's going on?
Hi everyone! I'm getting this message too: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 16000. Given: 14698 `inputs` tokens and 3072 `max_new_tokens`"
I've been using Qwen/Qwen2.5-Coder-32B-Instruct.
How can I fix it?
@datoreviol @bocahpekael99 if one of you could share one of the conversations where this happens, that would help us a lot with debugging!
Hi guys,
LLMs have a limited context window, that is, a limited amount of text they can process at once. If this limit is exceeded, you typically get the error you are seeing. The limit in your case is around 16k tokens.
What counts towards this limit is the input text PLUS the output text. The input text is your prompt, which may contain a lot of tokens if you are doing RAG (almost 15k in your case). The output text is what you are asking the LLM to generate as an answer, which is 3072 tokens in your case. So you are basically asking the model to process more text at once than it can handle: 14698 input tokens + 3072 output tokens = 17770, which is over the 16000-token limit.
To fix the error, you have to reduce the amount of text you are asking the LLM to process. You can use any of these approaches:
- reduce the input size (write a shorter prompt, return fewer chunks from your database if you are doing RAG, have smaller chunks to begin with)
- reduce the size of the answer you want the LLM to write. 3072 tokens is kind of a lot for a chatbot or a RAG pipeline, do you really need that much? Try 1024 or 512.
- when calling an LLM through some kind of free API from HuggingFace, it seems that the max context window is set to a lower value than what the model can actually deal with. If this is how you are using DeepSeek, consider creating a (paid) dedicated endpoint instead, which would allow you to use a bigger context window (Qwen 32B should support 128k).
Hope this helps.
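To make the first two fixes concrete, here is a minimal Python sketch, assuming you are calling the model yourself through `huggingface_hub`'s `InferenceClient` rather than through the HuggingChat UI. The function name `safe_generate` and the constants are illustrative, taken from the numbers in the error above:

```python
# Minimal sketch: measure the prompt's token count and cap max_new_tokens so
# that input + output stays within the model's context window.
from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"
CONTEXT_WINDOW = 16000          # limit reported in the error message
DESIRED_MAX_NEW_TOKENS = 3072   # how much you would like the model to write

tokenizer = AutoTokenizer.from_pretrained(MODEL)
client = InferenceClient(MODEL)

def safe_generate(prompt: str) -> str:
    n_input = len(tokenizer.encode(prompt))
    budget = CONTEXT_WINDOW - n_input  # tokens left for the answer
    if budget <= 0:
        raise ValueError(
            f"Prompt already uses {n_input} tokens; shorten it or return fewer RAG chunks."
        )
    max_new = min(DESIRED_MAX_NEW_TOKENS, budget)
    return client.text_generation(prompt, max_new_tokens=max_new)
```

The client-side token count can differ slightly from what the server computes (chat templates add a few special tokens), so leaving a safety margin of a few dozen tokens doesn't hurt.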
TBH this shouldn't be happening: the backend should automatically truncate the input if you exceed the context window. That's why I wanted a conversation, to see where the issue is.
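In the meantime, for anyone hitting this through the Inference API rather than the HuggingChat UI, a rough client-side version of that truncation could look like the sketch below. It is illustrative only, assumes you can download the model's tokenizer (gated models need `huggingface-cli login` first), and reuses the numbers from the errors above:

```python
# Rough sketch of client-side truncation: drop the oldest prompt tokens so that
# input + max_new_tokens fits inside the context window.
from transformers import AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-70B-Instruct"
CONTEXT_WINDOW = 8192
MAX_NEW_TOKENS = 2047

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def truncate_prompt(prompt: str) -> str:
    budget = CONTEXT_WINDOW - MAX_NEW_TOKENS  # room left for the input
    ids = tokenizer.encode(prompt)
    if len(ids) > budget:
        ids = ids[-budget:]  # keep only the most recent tokens
    return tokenizer.decode(ids, skip_special_tokens=True)
```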
What I don't understand is the following:
When the input limit is reached, how long do we have to wait before we can continue asking the model / agent questions?
Still happening to me