Eval bug: asymmetric layer splitting of reduced models on multiple CUDA GPUs

#3
by recallmenot - opened

I'd like to request your assistance with using this model in llama.cpp.
It appears that when using either your version or tensorblock's, most of the model is loaded onto CUDA0 instead of being split evenly between CUDA0 and CUDA1:
https://github.com/ggerganov/llama.cpp/issues/11132
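For anyone else hitting this: llama.cpp lets you override the automatic split with the `--split-mode` and `--tensor-split` flags. A minimal sketch (the model path and the 1:1 ratio are illustrative, not the specific fix from the linked issue):

```sh
# Force an explicit layer split across two GPUs.
# -ngl 99     offload all layers to the GPUs
# -sm layer   split by whole layers (the default split mode)
# -ts 1,1     distribute layers to CUDA0 and CUDA1 in a 1:1 ratio
./llama-cli -m model.gguf -ngl 99 -sm layer -ts 1,1 -p "Hello"
```

Adjusting the `-ts` ratios lets you compensate when the automatic split puts too much of the model on one device.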

It seems slaren has a good response for you in that thread; it makes sense to me given the oddities of this model!

Yes, his answer solved it, thank you!

recallmenot changed discussion status to closed
