Maybe some tokenizer files are missing?
I downloaded the model and followed the instructions at https://github.com/lm-sys/FastChat. Everything worked with lmsys/vicuna-13b-delta-v0, but lmsys/vicuna-13b-delta-v1.1 would not load until I added the following files from lmsys/vicuna-13b-delta-v0:
special_tokens_map.json
tokenizer.model
tokenizer_config.json
After that, the model produced screens of garbled output. I guess the three correct corresponding tokenizer files are missing?
Yes, I have the same problem.
OSError: Can't load tokenizer for '/home/lianpengcheng/models/source_models/vicuna-13b-delta-v1.1/'
Hi, the tokenizer files are omitted on purpose because we didn't change the tokenizer. It is the same as LLaMA's tokenizer.
For your problems, please install the latest version of FastChat and apply the delta again. There should be no errors.
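If the output directory still lacks tokenizer files after applying the delta, the workaround described in this thread is to copy them in from a directory that has them. Here is a minimal sketch of that copy step, using only the Python standard library. The function name and the `overwrite` flag are hypothetical; the three file names are the ones listed above. This is not part of FastChat.

```python
import shutil
from pathlib import Path

# File names taken from the list earlier in this thread.
TOKENIZER_FILES = (
    "special_tokens_map.json",
    "tokenizer.model",
    "tokenizer_config.json",
)


def copy_tokenizer_files(src_dir, dst_dir, overwrite=False):
    """Copy tokenizer files from src_dir into dst_dir.

    Returns the list of file names that were actually copied.
    Hypothetical helper for illustration only.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in TOKENIZER_FILES:
        source, target = src / name, dst / name
        if not source.is_file():
            continue  # this file is not present in the source directory
        if target.exists() and not overwrite:
            continue  # keep the file that is already in the destination
        shutil.copy2(source, target)
        copied.append(name)
    return copied
```

Calling `copy_tokenizer_files("path/to/llama-13b", "path/to/vicuna-13b-v1.1")` (both paths are placeholders) would leave existing files untouched unless `overwrite=True` is passed.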
I used LLaMA's tokenizer, but I still get an error.
...
param.data += delta.state_dict()[name]
...
RuntimeError: The size of tensor a (32001) must match the size of tensor b (32000) at non-singleton dimension 0
Use the latest apply_delta.py from the FastChat repo.
Thanks a lot, it works.
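For anyone else hitting the 32001 vs 32000 error above: the in-place addition fails because one tensor in the target model has a different number of rows than the matching tensor in the delta. A rough way to find out which tensors disagree before adding anything is to compare shapes by name. The sketch below is a hypothetical diagnostic, not what apply_delta.py does; the tensor names and the 5120 width are illustrative values only.

```python
def find_shape_mismatches(target_shapes, delta_shapes):
    """Return {name: (target_shape, delta_shape)} for tensors whose shapes differ.

    Both arguments map parameter names to shape tuples.
    Names that appear in only one mapping are ignored.
    """
    mismatches = {}
    for name, target_shape in target_shapes.items():
        delta_shape = delta_shapes.get(name)
        if delta_shape is None:
            continue  # no counterpart in the delta
        if tuple(target_shape) != tuple(delta_shape):
            mismatches[name] = (tuple(target_shape), tuple(delta_shape))
    return mismatches


# Illustrative shapes that mirror the error message in this thread.
target = {"embed_tokens.weight": (32001, 5120), "norm.weight": (5120,)}
delta = {"embed_tokens.weight": (32000, 5120), "norm.weight": (5120,)}
print(find_shape_mismatches(target, delta))
```

With PyTorch models, the shape mappings can be built with `{k: tuple(v.shape) for k, v in model.state_dict().items()}`.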
Thanks a lot. The tokenizer files from lmsys/vicuna-13b-delta-v0 work fine and can be used directly.
In the end I found the mistake was mine: I overlooked the hint "NOTE: This "delta model" cannot be used directly.".
My problem was resolved by using the model from https://huggingface.co./eachadea/vicuna-13b-1.1 . That model can be used directly.
Can you provide the merged version (with the LLaMA weights applied) instead of just the incremental (delta) version?
They can't provide a merged version due to the Llama license terms. But I've merged it and it's available here: https://huggingface.co./TheBloke/vicuna-13B-1.1-HF
Help..
return self._apply(lambda t: t.cuda(device))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB (GPU 0; 6.00 GiB total capacity; 5.27 GiB already allocated; 0 bytes free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
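A back-of-the-envelope estimate suggests why a 6 GiB GPU runs out of memory here. The sketch below assumes roughly 13 billion parameters (inferred from the "13b" in the model name) and counts the weights only; activations and other runtime buffers would add to it. The function name is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 13e9      # assumption: about 13 billion parameters
GPU_CAPACITY_GIB = 6   # total capacity reported in the error message

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 2 bytes per parameter in float16
int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # 1 byte per parameter in 8-bit

print(f"float16 weights: about {fp16_gib:.1f} GiB")
print(f"8-bit weights:   about {int8_gib:.1f} GiB")
print(f"fits in {GPU_CAPACITY_GIB} GiB at float16: {fp16_gib <= GPU_CAPACITY_GIB}")
```

Under these assumptions the weights alone exceed the card's capacity several times over in float16, so tuning max_split_size_mb is unlikely to be enough on its own.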