LongT5 fails in FP16 mode #1
opened by ArthurCamara
Basically what the title says.
If I instantiate a LongT5 model and run it on CPU or CUDA, it works as intended:
from transformers import AutoTokenizer, LongT5EncoderModel
model = LongT5EncoderModel.from_pretrained("google/long-t5-tglobal-base")
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
inputs = tokenizer("<extra_id_0> Hello, my dog is cute", return_tensors="pt")
model(**inputs)
Out[19]:
BaseModelOutputWithPastAndCrossAttentions(last_hidden_state=tensor([[[-0.1624, -0.1439, 0.0011, ..., -0.2340, 0.2113, 0.1893],
[ 0.0917, 0.0566, 0.0013, ..., 0.0890, 0.2563, -0.1880],
[ 0.0172, -0.0204, 0.0013, ..., 0.0048, -0.0575, -0.0638],
...,
[-0.0219, -0.0702, 0.0009, ..., -0.0568, -0.0474, 0.0188],
[-0.0876, 0.0266, 0.0008, ..., 0.0385, 0.0675, 0.2390],
[-0.0128, -0.0052, -0.0009, ..., -0.0212, 0.0151, -0.0093]]],
grad_fn=<MulBackward0>), past_key_values=None, hidden_states=None, attentions=None, cross_attentions=None)
That's as expected. The same behaviour happens on a GPU:
# On GPU
model = model.to("cuda")
for k, v in inputs.items():
    inputs[k] = v.to("cuda")
model(**inputs)
Out[22]:
BaseModelOutputWithPastAndCrossAttentions(last_hidden_state=tensor([[[-0.1624, -0.1439, 0.0011, ..., -0.2340, 0.2113, 0.1893],
[ 0.0917, 0.0566, 0.0013, ..., 0.0890, 0.2563, -0.1880],
[ 0.0172, -0.0204, 0.0013, ..., 0.0048, -0.0575, -0.0638],
...,
[-0.0219, -0.0702, 0.0009, ..., -0.0568, -0.0474, 0.0188],
[-0.0876, 0.0266, 0.0008, ..., 0.0385, 0.0675, 0.2390],
[-0.0128, -0.0052, -0.0009, ..., -0.0212, 0.0151, -0.0093]]],
device='cuda:0', grad_fn=<MulBackward0>), past_key_values=None, hidden_states=None, attentions=None, cross_attentions=None)
But if I call half() on the model, it only returns nans:
model = model.half()
model(**inputs)
Out[24]:
BaseModelOutputWithPastAndCrossAttentions(last_hidden_state=tensor([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], device='cuda:0',
dtype=torch.float16, grad_fn=<MulBackward0>), past_key_values=None, hidden_states=None, attentions=None, cross_attentions=None)
Removing the <extra_id_0> token doesn't really help either.
Any ideas on what is causing this?
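In case it helps anyone debugging this, here is a minimal sketch (not from this thread) that registers forward hooks and reports the first submodule whose FP16 output goes non-finite. T5-family models are known to produce activations that exceed the FP16 range inside the feed-forward blocks, which typically shows up as inf/nan; find_first_overflow is just an illustrative helper name.

import torch

def find_first_overflow(model, inputs):
    # Collect names of modules whose output contains inf/nan.
    offenders = []

    def make_hook(name):
        def hook(module, args, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                offenders.append(name)
        return hook

    handles = [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]
    try:
        with torch.no_grad():
            model(**inputs)
    finally:
        for h in handles:
            h.remove()
    # First offender in forward order, or None if everything stayed finite.
    return offenders[0] if offenders else None

# Assumes model and inputs are on the same device, as in the snippets above.
print(find_first_overflow(model.half(), inputs))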
Never mind. Switching to main instead of the release version solves this.
EDIT: No, it didn't =(
Using BF16 solved it.
ArthurCamara changed discussion status to closed
Yup. BF16 solved it.
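For reference, a minimal sketch of the BF16 workaround (assuming a GPU with bfloat16 support, e.g. Ampere or newer), loading the model directly in bfloat16 via torch_dtype instead of calling half() afterwards:

import torch
from transformers import AutoTokenizer, LongT5EncoderModel

# Load the encoder in bfloat16 rather than casting to float16 later.
model = LongT5EncoderModel.from_pretrained(
    "google/long-t5-tglobal-base", torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")

inputs = tokenizer("<extra_id_0> Hello, my dog is cute", return_tensors="pt").to("cuda")
outputs = model(**inputs)
print(outputs.last_hidden_state.dtype)  # torch.bfloat16, no nans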