torch.cuda.OutOfMemoryError
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 23.65 GiB total capacity; 5.93 GiB already allocated; 122.56 MiB free; 5.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
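For reference, the allocator hint in the error message can be set before the first CUDA allocation, either as a shell environment variable or from Python. A minimal sketch (the value here is only illustrative; this mitigates fragmentation but does not reduce the memory the model actually needs):

```python
# Sketch only: set the allocator option suggested by the error message before
# torch initializes CUDA. This mitigates fragmentation; it does not shrink the
# model's real memory footprint.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # illustrative value

import torch  # import (and any CUDA work) must come after the variable is set
```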
Hi @shiwanglai, thanks for the issue! Can you share the snippet you're using?
@ybelkada
~/nlp/lm-evaluation-harness$ python lm_eval/main.py --model=hf --model_args pretrained=google/gemma-2b,load_in_4bit=True --tasks wikitext --batch_size 1
is going OOM; I'm not sure what's going on.
The same happens without load_in_4bit:
~/nlp/lm-evaluation-harness$ python lm_eval/main.py --model=hf --model_args pretrained=google/gemma-2b --tasks wikitext --batch_size 1
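As far as I can tell, load_in_4bit=True in --model_args ends up in transformers' 4-bit (bitsandbytes) loading path, so a quick standalone check (a sketch of mine, not harness code) can tell whether the quantized weights alone fit on the GPU before any evaluation runs:

```python
# Standalone sanity check (my sketch, not harness code): load gemma-2b in 4-bit
# the same way the quantized path would and report how much GPU memory it takes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # needs accelerate installed
)
print(f"weights:   {model.get_memory_footprint() / 2**30:.2f} GiB")
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
```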
The same with gemma-7b:
File "/home/vincent/miniconda3/envs/pt2.1.0/lib/python3.11/site-packages/transformers/models/gemma/modeling_gemma.py", line 1088, in forward
logits = logits.float()
^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.81 GiB. GPU 0 has a total capacity of 23.67 GiB of which 6.05 GiB is free. Including non-PyTorch memory, this process has 15.88 GiB memory in use. Of the allocated memory 13.06 GiB is allocated by PyTorch, and 2.52 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
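That allocation size is consistent with the float32 upcast of the logits alone (a back-of-the-envelope check of mine, assuming the harness is using Gemma's full 8192-token context as max_length):

```python
# Rough arithmetic (mine, not from the traceback): the float32 upcast of the
# logits tensor alone accounts for the 7.81 GiB allocation.
vocab_size = 256_000   # Gemma vocabulary size
seq_len    = 8_192     # Gemma's default max context, assumed as the harness max_length
batch_size = 1
fp32_bytes = 4

logits_bytes = batch_size * seq_len * vocab_size * fp32_bytes
print(f"{logits_bytes / 2**30:.2f} GiB")   # -> 7.81 GiB, matching the error above
```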
I reduced the max_length, but there are still issues with gemma-7b (and gemma-2b's perplexity is much higher than phi-2's).
hf (pretrained=google/gemma-7b,max_length=512), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| wikitext | 2 | none | None | word_perplexity | 42455038.3994 | ± | N/A |
| | | none | None | byte_perplexity | 26.6969 | ± | N/A |
| | | none | None | bits_per_byte | 4.7386 | ± | N/A |
hf (pretrained=google/gemma-7b-it,max_length=512), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| wikitext | 2 | none | None | word_perplexity | 1795.5652 | ± | N/A |
| | | none | None | byte_perplexity | 4.0602 | ± | N/A |
| | | none | None | bits_per_byte | 2.0216 | ± | N/A |
hf (pretrained=google/gemma-7b,max_length=256), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| wikitext | 2 | none | None | word_perplexity | 41037962.2523 | ± | N/A |
| | | none | None | byte_perplexity | 26.5280 | ± | N/A |
| | | none | None | bits_per_byte | 4.7294 | ± | N/A |
hf (pretrained=google/gemma-2b,max_length=512), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| wikitext | 2 | none | None | word_perplexity | 55.9289 | ± | N/A |
| | | none | None | byte_perplexity | 2.1223 | ± | N/A |
| | | none | None | bits_per_byte | 1.0857 | ± | N/A |
hf (pretrained=google/gemma-2b-it,max_length=512), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| wikitext | 2 | none | None | word_perplexity | 242.5852 | ± | N/A |
| | | none | None | byte_perplexity | 2.7924 | ± | N/A |
| | | none | None | bits_per_byte | 1.4815 | ± | N/A |
Actually, I ran into an OOM problem when using the DPO trainer to fine-tune Gemma-2-2b-it on a 40 GB GPU with batch_size=2. Interesting.
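For what it's worth, DPO keeps a reference model (or an extra forward pass) around on top of the policy model, so it is easy to hit OOM even on a 40 GB card. A minimal memory-saving sketch, assuming a recent trl release where DPOConfig exists; the path and the concrete values below are placeholders, not a verified recipe:

```python
# Memory-saving sketch for a DPO run (assumptions: recent trl with DPOConfig;
# the concrete values are illustrative, not a verified recipe).
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="gemma-2-2b-it-dpo",      # placeholder path
    per_device_train_batch_size=1,       # drop the per-device batch from 2 to 1 ...
    gradient_accumulation_steps=2,       # ... and keep the effective batch size at 2
    gradient_checkpointing=True,         # recompute activations instead of storing them
    bf16=True,                           # train in bfloat16
    max_length=1024,                     # cap prompt + completion tokens
    max_prompt_length=512,               # (field names may differ in older trl versions)
)
# training_args is then passed as `args=` to DPOTrainer along with the model,
# tokenizer, and preference dataset.
```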