Convert liuhaotian/llava-v1.5-7b to the llava-hf/llava-1.5-7b-hf format

#26
by deleted - opened
deleted

Thank you for your outstanding work. I recently fine-tuned LLaVA starting from the liuhaotian/llava-v1.5-7b model, and I now want to serve it with the vLLM framework to speed up inference. vLLM expects checkpoints in the llava-v1.5-7b-hf format: if I load my fine-tuned llava-v1.5-7b model directly, vLLM fails with "Model architectures ['LlavaLlamaForCausalLM'] are not supported for now". So I need to convert it. How is the llava-v1.5-7b-hf format obtained, and how can I convert my fine-tuned model to it?
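For context, the error comes from the `architectures` field in the checkpoint's `config.json`: the original LLaVA repository saves `LlavaLlamaForCausalLM`, while the converted `-hf` checkpoints declare `LlavaForConditionalGeneration`, the Transformers class that vLLM supports for LLaVA. A minimal sketch for checking which format a local checkpoint is in, assuming a standard `config.json` in the model directory (the helper names are hypothetical, not part of any library):

```python
import json
from pathlib import Path

# Architecture names as they appear in the "architectures" field of config.json.
ORIGINAL_ARCH = "LlavaLlamaForCausalLM"    # checkpoints saved by the original LLaVA repo
HF_ARCH = "LlavaForConditionalGeneration"  # converted "-hf" checkpoints (Transformers class)


def checkpoint_format(config: dict) -> str:
    """Return "hf", "original" or "unknown" based on config["architectures"]."""
    archs = config.get("architectures") or []
    if HF_ARCH in archs:
        return "hf"
    if ORIGINAL_ARCH in archs:
        return "original"
    return "unknown"


def checkpoint_format_from_dir(model_dir: str) -> str:
    """Load <model_dir>/config.json and classify the checkpoint it describes."""
    config = json.loads(Path(model_dir, "config.json").read_text())
    return checkpoint_format(config)
```

For example, `checkpoint_format_from_dir("path/to/your-finetuned-llava")` (placeholder path) should report `"original"` before conversion and `"hf"` afterwards.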

Llava Hugging Face org

Hi,

We recommend using the conversion script found here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py.
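Roughly speaking, the core of that conversion is renaming the original state-dict keys to the module names used by the Transformers `LlavaForConditionalGeneration` class; the script also builds the config, tokenizer and processor. In the version of the script I looked at, the original weights are read from a Hub repo passed via `--old_state_dict_id`, as a single file named `model_state_dict.bin`. The sketch below imitates only the renaming step, with an ordered substring-replacement table modeled on the one in the script at the time of this thread. Treat the exact mapping as an assumption and compare it against the script version you actually run, since newer Transformers releases have restructured the module names.

```python
# Substring replacements applied in order (order matters: later rules fix up
# the output of earlier ones). Modeled on the mapping in
# convert_llava_weights_to_hf.py at the time of this thread; verify against your version.
KEYS_TO_MODIFY_MAPPING = {
    "model.vision_tower.": "",
    "model.mm_projector": "multi_modal_projector",
    "model": "model.model",
    "vision_model.model": "vision_model",
    "lm_head": "language_model.lm_head",
    "model.model": "language_model.model",
    "multi_modal_projector.0": "multi_modal_projector.linear_1",
    "multi_modal_projector.2": "multi_modal_projector.linear_2",
}


def convert_state_dict_keys(state_dict):
    """Rename original LLaVA state-dict keys to the older Transformers layout."""
    new_state_dict = {}
    for key, value in state_dict.items():
        # Rotary-embedding inv_freq buffers are not needed in the converted checkpoint.
        if key.endswith(".inv_freq"):
            continue
        for old, new in KEYS_TO_MODIFY_MAPPING.items():
            if old in key:
                key = key.replace(old, new)
        new_state_dict[key] = value
    return new_state_dict
```

For example, `model.layers.0.self_attn.q_proj.weight` becomes `language_model.model.layers.0.self_attn.q_proj.weight` and `model.mm_projector.0.weight` becomes `multi_modal_projector.linear_1.weight`.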

However, I also recommend verifying the logits after conversion on the same inputs. I noticed that the original LLaVa model pads images, whereas the image processor in Transformers doesn't do this yet.
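A minimal sketch of such a check, assuming both models have already been run on identical `input_ids` and `pixel_values` and their last-position logits flattened to Python lists (for PyTorch tensors, something like `logits[0, -1].tolist()`). It reports the largest absolute difference and whether the greedy next token agrees. The `compare_logits` helper and its default tolerance are arbitrary choices for illustration, not values prescribed by Transformers.

```python
import math


def compare_logits(ref, converted, atol=1e-3):
    """Compare two equal-length lists of logits (original vs. converted model)."""
    if not ref or len(ref) != len(converted):
        raise ValueError(f"bad lengths: {len(ref)} vs {len(converted)}")
    max_abs_diff = max(abs(a - b) for a, b in zip(ref, converted))
    # Index of the highest logit, i.e. the greedy next-token prediction.
    top_ref = max(range(len(ref)), key=ref.__getitem__)
    top_conv = max(range(len(converted)), key=converted.__getitem__)
    return {
        "max_abs_diff": max_abs_diff,
        "same_top_token": top_ref == top_conv,
        # math.isclose returns False for NaN, so NaN logits never count as close.
        "close": all(
            math.isclose(a, b, rel_tol=0.0, abs_tol=atol)
            for a, b in zip(ref, converted)
        ),
    }
```

If you are working with PyTorch tensors directly, `torch.testing.assert_close` performs a similar element-wise check without converting to lists.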

deleted

Thank you for your reply. I'll give it a try later, and if it works, I'll post the instructions here.

Llava Hugging Face org

Hello, have you succeeded? If so, could you briefly explain what to do? Thank you.

deleted

Yes, following the instructions at nielsr's link works. The steps there are very detailed.

deleted

Btw, I just uploaded a fine-tuning notebook for LLaVa with Transformers here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVa/Fine_tune_LLaVa_on_a_custom_dataset_(with_PyTorch_Lightning).ipynb

The link you provided works, thank you. Also, do you happen to know how LLaVA-NeXT is fine-tuned? The official repository does not provide specific fine-tuning code (https://github.com/LLaVA-VL/LLaVA-NeXT/).

Llava Hugging Face org

LLaVa-NeXT is very similar to LLaVa and can be fine-tuned with the same script after a few changes.

I adapted the provided notebook for LLaVa-NeXT: Colab Notebook

deleted

Great, thank you for your work. However, I am actually more interested in fine-tuning LLaVA-NeXT-Video. Do you have any suggestions, or could you create a similar Jupyter notebook for it?

Llava Hugging Face org

We haven't added LLaVa-NeXT-Video to Transformers yet.

Among video LLMs, Transformers does have Video-LLaVa, and I am working on a fine-tuning script for it. I will let you know here when it's ready.

Llava Hugging Face org

@Dengxiaoyu, I added a tutorial on fine-tuning Video-LLaVa in this Colab notebook.

deleted

Thank you for your generous help. If possible, I would also appreciate fine-tuning code for LLaVA-NeXT-Video.

Llava Hugging Face org

It has not been added to Transformers yet. We are planning to add LLaVA-NeXT-Video and create notebooks for it next month.

May I ask whether there are plans for Transformers to support LLaVA-NeXT-Video?

Llava Hugging Face org

According to our last conversation with the authors, they want to release a better version before it is added to Transformers. You can track the issue here.

After conversion, I found that the output logits differ from those of the original model. What might be causing this?

Llava Hugging Face org

It might be due to the image preprocessing settings. Make sure to double-check that you are forwarding exactly the same pixel values and input IDs through both models.

The original implementation pads the images, which the Transformers library does not do.
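For reference, the padding in the original LLaVA code (`expand2square` in `llava/mm_utils.py`, used when `image_aspect_ratio` is `"pad"`, as in LLaVA-1.5) pastes the image centered on a square canvas filled with the image processor's mean color, before the usual resize and normalization. The minimal re-implementation below assumes Pillow is installed for the image step and uses the CLIP mean values of `openai/clip-vit-large-patch14-336`; the helper names other than `expand2square` are hypothetical. Applying it to the image before calling the Transformers processor should make both pipelines see the same pixels.

```python
# CLIP normalization mean (as used by openai/clip-vit-large-patch14-336).
CLIP_IMAGE_MEAN = (0.48145466, 0.4578275, 0.40821073)


def background_color(image_mean=CLIP_IMAGE_MEAN):
    """Mean color as 0-255 ints, truncated with int(x * 255) like the original code."""
    return tuple(int(x * 255) for x in image_mean)


def square_canvas(width, height):
    """Return (side, paste_x, paste_y) for centering a width x height image."""
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2


def expand2square(pil_img, color=None):
    """Pad a PIL image to a centered square, mirroring the original LLaVA padding."""
    color = background_color() if color is None else color
    width, height = pil_img.size
    if width == height:
        return pil_img
    side, x, y = square_canvas(width, height)
    from PIL import Image  # imported lazily so the helpers above work without Pillow
    result = Image.new(pil_img.mode, (side, side), color)
    result.paste(pil_img, (x, y))
    return result
```

Usage would look something like `processor(text=prompt, images=expand2square(image), return_tensors="pt")`, where `processor` is the converted model's `LlavaProcessor` and `image` is an RGB PIL image.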

Yes, I just confirmed that this is the cause. Anyone else facing this problem should look into the image padding.

Llava Hugging Face org

Could you open an issue on the Transformers repository? I had an implementation that matches the original exactly, so we could update the image processor.

Can you provide the unconverted Llama-part weights used for qwen-interleave-0.5B, for single-image or multi-image inputs?
https://huggingface.co./llava-hf/llava-1.5-7b-hf/discussions/26#66436cdfbf8f506d97a36a41

Llava Hugging Face org

Done, please see https://github.com/huggingface/transformers/issues/33175.

@JackBAI, what should I do to get the same image preprocessing as LLaVA when using the Transformers library? It seems a do_pad parameter was added to control how images are processed, but I can't find the corresponding code in the main branch of Transformers.
