Input validation error: `max_new_tokens` must be <= 1. Given: 20
#12 opened by reubenlee3
I'm using an Inference Endpoint to quickly get a model up and running so I can test it. Below is my container configuration, but I can't make heads or tails of this error: `Input validation error: max_new_tokens must be <= 1. Given: 20`. My instance type is GPU medium (Nvidia A10G).
Any thoughts?
Hi @reubenlee3, I think you have to set the "Max Number of Tokens (per Query)" to Max Input Length (per Query) + `max_new_tokens` -- let me know if that solves the issue!
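To see why the error says `<= 1`: the server caps `max_new_tokens` at the total token budget minus the prompt length, so if "Max Number of Tokens (per Query)" is only 1 more than the input length, only 1 new token is allowed. A minimal sketch of that check (the function name and signature here are illustrative, not the actual TGI code):

```python
def validate_request(input_length: int, max_new_tokens: int,
                     max_input_length: int, max_total_tokens: int) -> None:
    """Hypothetical sketch of the token-budget validation an endpoint
    performs before generating (names are illustrative)."""
    if input_length > max_input_length:
        raise ValueError(
            f"Input validation error: `inputs` tokens must be <= "
            f"{max_input_length}. Given: {input_length}"
        )
    # New tokens can only fill whatever budget remains after the prompt.
    allowed_new = max_total_tokens - input_length
    if max_new_tokens > allowed_new:
        raise ValueError(
            f"Input validation error: `max_new_tokens` must be <= "
            f"{allowed_new}. Given: {max_new_tokens}"
        )


# A prompt of 99 tokens against a 100-token total budget leaves room
# for only 1 new token, so requesting 20 reproduces the error above.
try:
    validate_request(input_length=99, max_new_tokens=20,
                     max_input_length=1024, max_total_tokens=100)
except ValueError as e:
    print(e)
```

So bumping the total-tokens setting to input length + `max_new_tokens` (or more) makes the remaining budget large enough for the request.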