I'm trying to run this in Inference Endpoints but keep getting this error...
#8 by GollyJer - opened
What is the correct setup in the Inference Endpoints UI? Thanks!
```
message: "Server error: The size of tensor a (16) must match the size of tensor b (32) at non-singleton dimension 0"
target: "text_generation_router_v3::client"
filename: "backends/v3/src/client/mod.rs"
line_number: 45
span: {"name":"warmup"}
spans: [{"max_batch_size":"None","max_input_length":"None","max_prefill_tokens":4096,"max_total_tokens":"None","name":"warmup"},{"name":"warmup"}]
```
I found some other discussion about this... https://github.com/huggingface/text-generation-inference/issues/2879
Setting `CUDA_GRAPHS` to `0` did the trick.
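For reference, in the Inference Endpoints UI this is set as an environment variable on the endpoint (under the advanced/container settings). If you're running text-generation-inference yourself with Docker instead, the equivalent is passing the variable with `-e`. A minimal sketch, assuming the official TGI image and a placeholder model ID:

```shell
# Disable CUDA graph capture, which was causing the tensor-size
# mismatch during warmup in this thread.
# <your-model-id> is a placeholder -- substitute your own model.
docker run --gpus all -p 8080:80 \
  -e CUDA_GRAPHS=0 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id <your-model-id>
```

Setting `CUDA_GRAPHS=0` disables CUDA graph capture entirely, trading some throughput for compatibility with models whose shapes don't match the captured graph sizes.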
GollyJer changed discussion status to closed