Issue with starting "Start-WebUI" bat.

#24
by Miltank - opened

Hi! I was trying out this new AI. I had been starting it the whole time with the "start-webui-vicuna-gpu" bat, and it worked, though it had a "CUDA out of memory" problem that I managed to fix somehow. Then I decided to start the "Start-WebUI" bat instead, and this happened:

Starting the web UI...
Warning: --cai-chat is deprecated. Use --chat instead.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: CUDA runtime path found: C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll...
Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
Auto-assiging --gpu-memory 7 for your GPU to try to prevent out-of-memory errors.
You can manually set other values.
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 442, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\serialization.py", line 252, in init
super().init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\pytorch_model-00001-of-00003.bin'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\anton\Desktop\AI\oobabooga-windows\text-generation-webui\server.py", line 347, in
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\text-generation-webui\modules\models.py", line 171, in load_model
model = AutoModelForCausalLM.from_pretrained(checkpoint, **params)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 2736, in from_pretrained
) = cls._load_pretrained_model(
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 3050, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 445, in load_state_dict
with open(checkpoint_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\pytorch_model-00001-of-00003.bin'
To continue press any key...

Can someone help me fix this? I downloaded the custom model using the "download-model" bat file.
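For reference, the "download-model" bat in the one-click installer drives the download-model.py script that ships with text-generation-webui; fetching this particular model by hand would look roughly like the line below (run from inside the text-generation-webui folder, with the Hugging Face path taken from the model name in the log above, so treat it as a sketch rather than the exact command the bat runs):

python download-model.py anon8231489123/vicuna-13b-GPTQ-4bit-128g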

In your start-webui.bat, try changing this line:

call python server.py --auto-devices --cai-chat

To:

call python server.py --auto-devices --chat --wbits 4 --groupsize 128

--cai-chat is deprecated
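For context, the tail of start-webui.bat would then look roughly like the sketch below. The exact surrounding lines differ between installer versions, so everything except the final line is a placeholder:

rem ... environment activation lines generated by the one-click installer ...
cd text-generation-webui
rem --chat replaces the deprecated --cai-chat; --wbits 4 --groupsize 128 match the
rem 4-bit, group-size-128 GPTQ checkpoint named in the model folder above
call python server.py --auto-devices --chat --wbits 4 --groupsize 128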

Hi! It started, yes. But now it has a "CUDA out of memory" problem, like before. I remember that the "Start-WebUI" bat used to say "Auto-assiging --gpu-memory 7 for your GPU to try to prevent out-of-memory errors." before I changed this setting. What should I do now?
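One detail worth spelling out: the --gpu-memory cap that the auto-assign message refers to can also be passed explicitly on the command line. The value 7 below simply mirrors the figure that was auto-assigned for this 8 GiB card; it is an illustration, not a setting taken from the reply above:

rem manually cap the VRAM the model may use, in GiB
call python server.py --auto-devices --chat --wbits 4 --groupsize 128 --gpu-memory 7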

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: CUDA runtime path found: C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll...
Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
Found the following quantized model: models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
Loading model ...
C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\safetensors\torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
Done.
Loaded the model in 11.14 seconds.
Loading the extension "gallery"... Ok.
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Traceback (most recent call last):
File "C:\Users\anton\Desktop\AI\oobabooga-windows\text-generation-webui\modules\callbacks.py", line 66, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\text-generation-webui\modules\text_generation.py", line 228, in generate_with_callback
shared.model.generate(**kwargs)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
return self.sample(
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
outputs = self(
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
outputs = self.model(
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 289, in forward
hidden_states = self.input_layernorm(hidden_states)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\anton\Desktop\AI\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 84, in forward
variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.08 GiB already allocated; 0 bytes free; 7.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Output generated in 0.89 seconds (0.00 tokens/s, 0 tokens, context 983, seed 1850395684)

Can someone help me with this? I really want to test this out on an RTX 3070 Ti, but it's too slow or not working sometimes.
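The out-of-memory trace itself points at one more knob: PYTORCH_CUDA_ALLOC_CONF. Setting it in the bat before server.py starts is a common way to reduce allocator fragmentation; the 128 MiB split size below is an arbitrary example value, not a figure recommended anywhere in this thread:

rem allocator tuning suggested by the CUDA out-of-memory message above
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
call python server.py --auto-devices --chat --wbits 4 --groupsize 128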
