No response as a result of model inference

#10
by whwnsgh - opened

I ran the sample code you provided.
However, the output does not look right.

My Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)

print(tokenizer.decode(output[0]))

My Result

[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]Explain who you are
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]더[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]더[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]

What should I check?

LG AI Research org
edited Aug 13
  1. Have you checked the version of transformers? EXAONE-3.0-7.8B-Instruct requires transformers v4.41 or later.

  2. The result doesn't match your code. The code uses the prompt "너의 소원을 말해봐", but the result shows "Explain who you are".
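For readers following along, the version requirement in item 1 can be checked programmatically. This is a minimal sketch using only the standard library; the helper names `version_tuple` and `meets_minimum` are hypothetical and not part of transformers, and the parsing only looks at the leading numeric components of a version string.

```python
from importlib import metadata
from itertools import takewhile


def version_tuple(version: str, width: int = 3) -> tuple:
    """Turn '4.43.3' or '2.3.1+cu121' into a fixed-width tuple of integers."""
    parts = []
    # Drop a local suffix such as '+cu121', then read the leading digits of each piece.
    for piece in version.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    parts = parts[:width]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(package: str, minimum: str) -> bool:
    """Return True if the installed distribution is at least `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # v4.41 is the minimum stated in this thread for EXAONE-3.0-7.8B-Instruct.
    print("transformers >= 4.41:", meets_minimum("transformers", "4.41"))
```

Reading the version through `importlib.metadata` avoids importing transformers itself, so the check also works in an environment where the import would fail.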

@yireun

Sorry, the content I shared was incorrect.

  1. Transformers version
    (Whisper) B220256@AI-ML-02:~/LLM/Whisper$ python
    Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import transformers
    >>> transformers.__version__
    '4.43.3'

My transformers version is 4.43.3, which satisfies the requirement of v4.41 or later.

  2. The prompt "Tell me your wish" ("너의 소원을 말해봐") is the correct one. I am sharing the correct code and result again.

My Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)

print(tokenizer.decode(output[0]))

My Result

[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]너의 소원을 말해봐
[|assistant|][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]ETS[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]

Appendix

  1. If I use "explain who you are" as the prompt, I get the same result.
  2. If I use "hello" or "안녕", I also get the same result.

LG AI Research org
  1. Could you try adding trust_remote_code=True to AutoTokenizer.from_pretrained() as follows?

    tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)
  2. Could you tell me the versions of the following:
    - torch
    - flash-attn
    - accelerate
    - cuda
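One way to gather all of these at once is sketched below. It is a hedged example rather than an official diagnostic: `collect_versions` and `cuda_info` are hypothetical helper names, the distribution names are assumed to match the pip package names listed above, and `torch.version.cuda` reports the CUDA version torch was built against, not the installed driver version.

```python
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def cuda_info():
    """Report the CUDA build version of torch and the visible GPU names, if any."""
    try:
        import torch
    except ImportError:
        return {"cuda": "torch not importable"}
    info = {"cuda": str(torch.version.cuda)}  # None on CPU-only builds
    if torch.cuda.is_available():
        info["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for name, version in collect_versions(
        ["torch", "transformers", "accelerate", "flash-attn"]
    ).items():
        print(f"{name}: {version}")
    print(cuda_info())
```

Pasting the printed output into a report gives the maintainers the same information requested in item 2 in a single step.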

  1. It's still the same symptom.

My Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

prompt = "너의 소원을 말해봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
)

print(tokenizer.decode(output[0]))

My Result

(Whisper) B220256@AI-ML-02:~/LLM/Whisper$ python lg.py
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 7/7 [02:04<00:00, 17.76s/it]
We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.
[|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
[|user|]너의 소원을 말해봐
[|assistant|

  2. These are my library versions:
    >>> import torch
    >>> import accelerate
    >>> import flash_attn
    >>> torch.__version__, accelerate.__version__, flash_attn.__version__
    ('2.3.1+cu121', '0.33.0', '2.6.3')

LG AI Research org

I don't have an RTX 4000 series GPU, so I tested the code on an AWS EC2 g6.2xlarge instance, where it worked normally.

  • AWS g6.2xlarge, which has a NVIDIA L4 GPU with Ada Lovelace Architecture
    • Ubuntu 22.04.4 LTS
    • Python 3.10.12
    • torch 2.3.1+cu121
    • transformers 4.43.3
    • accelerate 0.33.0
    • flash-attn not installed, because its installation failed with an error

Thank you for your help.

I had been running it in a notebook (.ipynb), where I couldn't see the detailed log.

When I ran it as a .py script, I saw the following warning:

We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map.Please make sure to update your driver to the latest version which resolves this.

It worked normally when I ran it on the CPU.

I will upgrade the GPU driver.
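Until the driver is updated, one thing that may be worth trying is exposing only a single GPU to the process, since the warning concerns P2P transfers during multi-GPU inference. The sketch below is an assumption-laden illustration, not a confirmed fix from this thread: it assumes the machine has more than one GPU, that the model fits on one of them in bfloat16, and `pin_to_single_gpu` is a hypothetical helper name.

```python
import os


def pin_to_single_gpu(index: int = 0) -> str:
    """Expose only one GPU to this process.

    Must be called before torch initializes CUDA, so call it before
    importing torch or transformers.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


pin_to_single_gpu(0)

# After this point, load the model exactly as in the script above.
# With a single visible GPU, device_map="auto" has no second device to
# shard across, so no GPU-to-GPU copies should be needed (assumption).
#
# The CPU run reported above can be reproduced by loading with
# device_map="cpu" and passing input_ids without moving them to "cuda".
```

If generation is correct with one visible GPU but not with several, that points toward the P2P issue named in the warning rather than toward the model or tokenizer.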
