No response from model inference
I ran the sample code you provided.
However, the results do not look normal.
My Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))
My Result
[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]Explain who you are
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]λ[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]λ[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
What should I check?
Have you checked the version of transformers? EXAONE-3.0-7.8B-Instruct requires transformers v4.41 or later.
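You can check the installed version with, for example:

import transformers
print(transformers.__version__)  # should print 4.41.0 or later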
The result doesn't match your code. The prompt "너의 소원을 말해봐" is used in the code, but "Explain who you are" is printed in the result.
Sorry, the content I shared was wrong.
- transformers version
(Whisper) B220256@AI-ML-02:~/LLM/Whisper$ python
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import transformers
>>> transformers.__version__
'4.43.3'
My transformers library version is 4.43.3, which satisfies the v4.41 or later requirement.
- The prompt "Tell me your wish" ("너의 소원을 말해봐") is correct. I am sharing the correct code and result again.
My Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))
My Result
[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]너의 소원을 말해봐
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]ETS[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
Appendix
- If I use "Explain who you are" as the prompt, the result is the same.
- If I use "hello" or "안녕" (Korean for "hello"), the result is the same.
Would you try to add trust_remote_code=True in AutoTokenizer.from_pretrained() as follows?
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)
Could you tell me the versions of the following (a quick way to print them is shown after this list):
- torch
- flash-attn
- accelerate
- cuda
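For example (the flash_attn import is guarded in case it is not installed):

import torch
import accelerate

print("torch:", torch.__version__)
print("accelerate:", accelerate.__version__)
print("CUDA (torch build):", torch.version.cuda)

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn: not installed")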
- It's still the same symptom.
My Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))
My Result
(Whisper) B220256@AI-ML-02:~/LLM/Whisper$ python lg.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 7/7 [02:04<00:00, 17.76s/it]
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]너의 소원을 말해봐
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]ETS[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
- These are my library versions:
>>> import torch
>>> import accelerate
>>> import flash_attn
>>> torch.__version__, accelerate.__version__, flash_attn.__version__
('2.3.1+cu121', '0.33.0', '2.6.3')
Because I don't have an RTX 4000 series GPU, I tested the code on an AWS EC2 g6.2xlarge instance, and it worked normally.
- AWS g6.2xlarge, which has an NVIDIA L4 GPU (Ada Lovelace architecture)
- Ubuntu 22.04.4 LTS
- Python 3.10.12
- torch 2.3.1+cu121
- transformers 4.43.3
- accelerate 0.33.0
- no flash-attn, because an installation error occurred
Thank you for your help.
I ran it in an ipynb notebook, where I couldn't see the detailed log.
When I ran it as a .py script, I saw the log below.
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
It worked normally when I ran it on the CPU.
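For reference, the CPU run was essentially the same script with the device-related parts changed, roughly like this (a sketch; the exact dtype and settings I used may have differed slightly):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.float32,   # float32 instead of bfloat16 for the CPU run (assumption)
    trust_remote_code=True,
    device_map="cpu"             # keep everything on the CPU instead of device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": "너의 소원을 말해봐"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids,                   # no .to("cuda") when running on the CPU
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))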
Let's upgrade the GPU driver.
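To confirm the installed driver version before and after updating, something like this should work (assumes nvidia-smi is on PATH; running nvidia-smi directly in a shell shows the same information):

import subprocess

# Query only the driver version from nvidia-smi
print(subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True
).strip())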