Fast inference engine

#2 by SinanAkkoyun

Hello,
I understand why you can't use Llama, but when you drop a new architecture, please also work on a vLLM PR, as DeepSeek does.
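
In the meantime, vLLM does support registering out-of-tree architectures without waiting for an upstream PR to be merged. A minimal sketch, assuming a hypothetical `MyModelForCausalLM` implementation and module name:

```python
# Minimal sketch: registering an out-of-tree architecture with vLLM
# so a new model can run before an upstream PR lands.
# "my_model" and "MyModelForCausalLM" are hypothetical placeholders.
from vllm import ModelRegistry

from my_model import MyModelForCausalLM  # your custom implementation

# Map the architecture name from the checkpoint's config.json
# to the implementing class.
ModelRegistry.register_model("MyModelForCausalLM", MyModelForCausalLM)
```

This is only a stopgap; an official vLLM integration from the model authors would still be much better for everyone.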

Thank you
