How to use this weight file with pure `transformers` code?

#2 opened by Seungyoun

I manually fixed the issue so that the weights can be loaded as a Hugging Face transformers model
@csegalin

https://huggingface.co./Seungyoun/llava-llama-3-8b-hf

It would be great if you could also provide the chat template you used to train the model. Thank you for your wonderful work @LZHgrla
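For reference, a minimal sketch of loading that converted checkpoint with plain transformers. The prompt format here is an assumption based on Llama-3's chat template (confirmed later in this thread), and `example.jpg` is a placeholder:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "Seungyoun/llava-llama-3-8b-hf"  # the converted checkpoint above
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Prompt format assumed from Llama-3's chat template; image path is a placeholder.
prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "<image>\nWhat is shown in this image?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
image = Image.open("example.jpg")

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```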

xtuner org

Hi @Seungyoun
This model follows the format of the official LLaVA-v1.5/v1.6 and is not in the LlavaForConditionalGeneration format.

We will provide a conversion script in about 1-2 days to convert this model to a LlavaForConditionalGeneration model.
Before that, please use the CLI or LMDeploy, as described at https://huggingface.co./xtuner/llava-llama-3-8b-v1_1-hf#quickstart
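For the LMDeploy route, a minimal sketch along the lines of that quickstart (the image path here is a placeholder):

```python
# Minimal LMDeploy sketch for the llava-format checkpoint, following the
# quickstart linked above. "example.jpg" is a placeholder image path.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline("xtuner/llava-llama-3-8b-v1_1-hf")
image = load_image("example.jpg")  # accepts a local path or a URL
response = pipe(("Describe this image", image))
print(response.text)
```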

Thank you for your prompt response. @LZHgrla

> This model follows the format of the official LLaVA-v1.5/v1.6 and is not in the LlavaForConditionalGeneration format.
>
> We will provide a conversion script in about 1-2 days to convert this model to a LlavaForConditionalGeneration model.

I need to clarify something: in your message, you mention that this model follows the format of the official LLaVA-v1.5/v1.6 and is not directly in the LlavaForConditionalGeneration format. However, my understanding is that LLaVA-v1.5/v1.6 corresponds to what is known as LlavaNextForConditionalGeneration. Could you confirm if this is the case?

xtuner org

Good question! There are many formats for LLaVA models.
Here are two examples:
https://huggingface.co./liuhaotian/llava-v1.5-7b/tree/main is in llava format
https://huggingface.co./llava-hf/llava-1.5-7b-hf/tree/main is in hf format

This model is in llava format, although it has a -hf suffix.
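One quick way to tell the two apart (a sketch, not an official tool) is to inspect the weight names in the checkpoint's model.safetensors.index.json. The key prefixes below reflect the usual conventions and are assumptions, not guarantees for every checkpoint:

```python
# Distinguish llava-format from hf-format checkpoints by weight names.
import json

with open("model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# llava format keeps LLM weights at the top level, e.g.
#   model.layers.0.self_attn.q_proj.weight, model.mm_projector.0.weight
# hf format namespaces the sub-models, e.g.
#   language_model.model.layers.0..., multi_modal_projector.linear_1.weight
is_hf = any(k.startswith("language_model.") for k in weight_map)
print("hf format" if is_hf else "llava format")
```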

@LZHgrla
I am trying to manually fix the model weight index mapping to the proper structure and config.json.

Does your model also follow this added_tokens.json?

{
  "<image>": 32000,
  "<pad>": 32001
}
xtuner org
β€’
edited Apr 23

@Seungyoun

The original vocab size is 128256

So, I think the correct token ids should be

{
  "<image>": 128257,
  "<pad>": 128258
}
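A sketch of how those ids come about (not xtuner's exact conversion code; the base tokenizer id below is an assumption): added tokens are appended after the existing vocab, and the language model's embedding matrix must then be resized to match.

```python
# Added tokens receive ids after the base vocab; the exact values depend
# on how many tokens the base tokenizer already contains.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tok.add_tokens(["<image>", "<pad>"], special_tokens=True)
print(tok.convert_tokens_to_ids("<image>"), tok.convert_tokens_to_ids("<pad>"))

# After adding tokens, resize the model's embeddings to match:
#   model.resize_token_embeddings(len(tok))
```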

Were you able to make it work?
Using the CLI version, I get an issue with transformers 4.37 (required by LLaVA), while all these new models need at least 4.39.

> I manually fixed the issue so that the weights can be loaded as a Hugging Face transformers model
> @csegalin
>
> https://huggingface.co./Seungyoun/llava-llama-3-8b-hf
>
> It would be great if you could also provide the chat template you used to train the model. Thank you for your wonderful work @LZHgrla

I added an issue in your repo.

@LZHgrla any update on the conversion script?

xtuner org

@csegalin
We will release pure transformers and GGUF versions of the models, along with the corresponding conversion scripts, within a few days.

Before that, you are welcome to try our newly released llava-phi-3-mini model, which supports multiple formats, including the official llava format, pure transformers format, and GGUF format.
https://huggingface.co./xtuner/llava-phi-3-mini-hf

xtuner org
β€’
edited Apr 25

> I manually fixed the issue so that the weights can be loaded as a Hugging Face transformers model
> @csegalin
>
> https://huggingface.co./Seungyoun/llava-llama-3-8b-hf
>
> It would be great if you could also provide the chat template you used to train the model. Thank you for your wonderful work @LZHgrla

Thanks! We use Llama-3's chat template to train the llava-llama-3-8b models.

That is,

"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",

An image-text example is

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<image>What is it?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

AAAAAAAAA<|eot_id|>
<|start_header_id|>user<|end_header_id|>

Do you like it?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
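That rendered string can be reproduced by applying the chat template with the tokenizer (a sketch; the model id here is an assumption, and any tokenizer carrying the chat_template above works):

```python
# Render the conversation above through the chat template. This particular
# template always appends the assistant header at the end, so no
# add_generation_prompt flag is needed.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Seungyoun/llava-llama-3-8b-hf")
messages = [
    {"role": "user", "content": "<image>What is it?"},
    {"role": "assistant", "content": "AAAAAAAAA"},
    {"role": "user", "content": "Do you like it?"},
]
print(tok.apply_chat_template(messages, tokenize=False))
```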

Hi, thanks!
I tried it yesterday and, not sure why, performance is worse than the first version: a lot of repeated words, and it is less accurate even when using the same generation parameters.
