Eval bug: asymmetric layer splitting of reduced models on multiple CUDA GPUs

#3
by recallmenot - opened

I'd like to request your assistance with using this model in llama.cpp.
It appears that when using either your version or tensorblock's, most of the model is loaded onto CUDA0 instead of being split evenly between CUDA0 and CUDA1:
https://github.com/ggerganov/llama.cpp/issues/11132
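For anyone else hitting this: llama.cpp lets you override the automatic split with the `--split-mode` and `--tensor-split` flags. A minimal sketch (the model path and the 1:1 ratio are illustrative, not the specific fix from the linked issue):

```sh
# Force an explicit layer split across two GPUs.
# -ngl 99     offload all layers to the GPUs
# -sm layer   split by whole layers (the default split mode)
# -ts 1,1     distribute layers to CUDA0 and CUDA1 in a 1:1 ratio
./llama-cli -m model.gguf -ngl 99 -sm layer -ts 1,1 -p "Hello"
```

Adjusting the `-ts` ratios lets you compensate when the automatic split puts too much of the model on one device.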

It seems slaren has a good response for you in that thread; it makes sense to me given the oddities of this model!

Yes, his answer solved it, thank you!

recallmenot changed discussion status to closed
