I'm trying to run this in Inference Endpoints but keep getting this error...

#8
by GollyJer - opened

What is the correct setup in the Inference Endpoints UI? Thanks!

message: "Server error: The size of tensor a (16) must match the size of tensor b (32) at non-singleton dimension 0"
target: "text_generation_router_v3::client"
filename: "backends/v3/src/client/mod.rs"
line_number: 45
span: {"name":"warmup"}
spans: [{"max_batch_size":"None","max_input_length":"None","max_prefill_tokens":4096,"max_total_tokens":"None","name":"warmup"},{"name":"warmup"}]

I found another discussion of this error: https://github.com/huggingface/text-generation-inference/issues/2879

Setting the CUDA_GRAPHS environment variable to 0 did the trick. That disables TGI's CUDA graph capture, which happens during warmup (where this error is thrown); the 16 and 32 in the message line up with two of TGI's default CUDA graph batch sizes (1, 2, 4, 8, 16, 32).
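
If you'd rather create the endpoint programmatically than through the UI, here's a minimal sketch using huggingface_hub's create_inference_endpoint, passing the variable through the custom container's env mapping. The endpoint name, model repo, instance settings, and image tag below are placeholders, not values from this thread.

```python
# Minimal sketch: deploy a TGI endpoint with CUDA graphs disabled.
# Assumes huggingface_hub is installed and you are logged in (or pass token=...).
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-endpoint",                   # placeholder endpoint name
    repository="my-org/my-model",    # placeholder model repo
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            "MODEL_ID": "/repository",
            "CUDA_GRAPHS": "0",      # the fix: disable CUDA graph capture
        },
    },
)
endpoint.wait()  # block until the endpoint reports running
print(endpoint.url)
```

In the UI, the equivalent is adding CUDA_GRAPHS=0 as an environment variable in the endpoint's container configuration.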


GollyJer changed discussion status to closed
