No response from model inference
I ran the sample code you provided.
However, the results do not look normal.
My Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))
My Result
[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]Explain who you are
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]λ[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]λ[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
What should I check?
Have you checked the version of transformers? EXAONE-3.0-7.8B-Instruct requires transformers v4.41 or later.
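You can check the installed version with, for example:

import transformers
print(transformers.__version__)  # should print 4.41.0 or later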
The result doesn't match your code. The prompt "너의 소원을 말해봐" is used in the code, but "Explain who you are" is printed in the result.
Sorry, the content I shared was wrong.
- transformers version
(Whisper) B220256@AI-ML-02:~/LLM/Whisper$ python
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import transformers
>>> transformers.__version__
'4.43.3'
My transformers library version is 4.43.3, which satisfies the v4.41 or later requirement.
- The prompt "Tell me your wish" ("너의 소원을 말해봐") is correct. I am sharing the correct code and result again.
My Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))
My Result
[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]너의 소원을 말해봐
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]ETS[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
Appendix
- If I use "Explain who you are" as the prompt, the result is the same.
- If I use "hello" or "안녕" (Korean for "hello"), the result is the same.
Would you try to add trust_remote_code=True in AutoTokenizer.from_pretrained() as follows?
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)
Could you tell me the versions of the following (a quick way to print them is shown after this list):
- torch
- flash-attn
- accelerate
- cuda
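For example (the flash_attn import is guarded in case it is not installed):

import torch
import accelerate

print("torch:", torch.__version__)
print("accelerate:", accelerate.__version__)
print("CUDA (torch build):", torch.version.cuda)

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn: not installed")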
- It's still the same symptom.
My Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))
My Result
(Whisper) B220256@AI-ML-02:~/LLM/Whisper$ python lg.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 7/7 [02:04<00:00, 17.76s/it]
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]너의 소원을 말해봐
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]ETS[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
- These are my library versions:
>>> import torch
>>> import accelerate
>>> import flash_attn
>>> torch.__version__, accelerate.__version__, flash_attn.__version__
('2.3.1+cu121', '0.33.0', '2.6.3')
Because I don't have an RTX 4000 series GPU, I tested the code on an AWS EC2 g6.2xlarge instance, and it worked normally.
- AWS g6.2xlarge, which has an NVIDIA L4 GPU (Ada Lovelace architecture)
- Ubuntu 22.04.4 LTS
- Python 3.10.12
- torch 2.3.1+cu121
- transformers 4.43.3
- accelerate 0.33.0
- no flash-attn, because an installation error occurred
Thank you for your help.
I ran it in an ipynb notebook, where I couldn't see the detailed log.
When I ran it as a .py script, I saw the log below.
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
It worked normally when I ran it on the CPU.
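For reference, the CPU run was essentially the same script with the device-related parts changed, roughly like this (a sketch; the exact dtype and settings I used may have differed slightly):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.float32,   # float32 instead of bfloat16 for the CPU run (assumption)
    trust_remote_code=True,
    device_map="cpu"             # keep everything on the CPU instead of device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": "너의 소원을 말해봐"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids,                   # no .to("cuda") when running on the CPU
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)
print(tokenizer.decode(output[0]))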
Let's upgrade the GPU driver.
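To confirm the installed driver version before and after updating, something like this should work (assumes nvidia-smi is on PATH; running nvidia-smi directly in a shell shows the same information):

import subprocess

# Query only the driver version from nvidia-smi
print(subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True
).strip())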