Unable to load the model
I followed the steps below (as specified in the documentation) to load the model, but it fails with this error: `stanford-oval/Llama-2-7b-WikiChat does not appear to have a file named pytorch_model-00001-of-00002.bin.` I'd appreciate any input on this.
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stanford-oval/Llama-2-7b-WikiChat")
model = AutoModelForCausalLM.from_pretrained("stanford-oval/Llama-2-7b-WikiChat")
```
I tested this model with both HuggingFace's TGI (https://github.com/huggingface/text-generation-inference) and vLLM (https://github.com/vllm-project/vllm) and it works just fine. I'm not sure why it doesn't work directly with `transformers`; we normally don't test with it because it is much slower at inference.
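For reference, this is roughly how it can be loaded with vLLM (a minimal sketch; the prompt and sampling settings are just illustrative):

```python
# Minimal vLLM sketch: serve the model offline and run one generation.
# Assumes vLLM is installed; prompt and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="stanford-oval/Llama-2-7b-WikiChat")
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Who directed the film Oppenheimer?"], params)
print(outputs[0].outputs[0].text)
```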
OK, there seems to have been an issue when converting the model weights to the `.safetensors` format. Apparently, TGI and vLLM don't rely on `model.safetensors.index.json`, but the `transformers` library does.

I've fixed both models and they should work with `transformers` now.
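If you want to double-check on your end, here is a quick sketch (assuming `huggingface_hub` is installed) that verifies every shard referenced by the index file actually exists in the repo, which is exactly what the original error was about:

```python
# Sketch: confirm that model.safetensors.index.json only references
# weight shards that are actually present in the repo.
import json
from huggingface_hub import hf_hub_download, list_repo_files

repo = "stanford-oval/Llama-2-7b-WikiChat"
files = set(list_repo_files(repo))

# transformers resolves sharded checkpoints through this index file.
index_path = hf_hub_download(repo, "model.safetensors.index.json")
with open(index_path) as f:
    index = json.load(f)

shards = set(index["weight_map"].values())
missing = shards - files
print("missing shards:", missing or "none")
```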