Unable to load finetuned model after saving

#57 by Charlington

Hi,

I have been able to finetune this model and subsequently save the finetuned model with:
moondream.save_pretrained("checkpoints/moondream-ft")

But, when I later tried to load the finetuned model for evaluation with:
config = AutoConfig.from_pretrained("checkpoints/moondream-ft", trust_remote_code=True)
moondream = AutoModelForCausalLM.from_pretrained("checkpoints/moondream-ft", config=config, trust_remote_code=True, device_map={"": DEVICE})

I got the following error:
AttributeError: module 'transformers_modules.vikhyatk.moondream2.fb2293ab2450beb1dae536c056f5976becd58e4c.moondream' has no attribute 'Moondream'

I am not sure how to go about loading and using the finetuned model from here.
Any ideas?

For anyone else who encounters this same issue, I have found a solution.

Based on this conversation:
https://stackoverflow.com/questions/79354534/how-to-load-a-finetuned-vision-llm-model-moondream-model-case
I have written myself a guide for how to finetune Moondream2 and later load the finetuned model.

My guide is as follows:

Python version: 3.10
CUDA version: 12.4

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install pillow transformers bitsandbytes accelerate wandb einops pyvips

  1. Go into the saved model's config.json

  2. Under "auto_map", set "AutoConfig" and "AutoModelForCausalLM" to the following:

    "AutoConfig": "configuration_moondream.MoondreamConfig",
    "AutoModelForCausalLM": "moondream.Moondream"

  3. Go to the repo revision on Hugging Face that matches the one used for finetuning (or a slightly earlier one) and copy the following files into the finetuned model's folder (mine is in "checkpoints\moondream-ft", so I put the files in the "moondream-ft" subfolder):

  • "configuration_moondream.py"
  • "moondream.py"
  • "modeling_phi.py"
  • "vision_encoder.py"

In this case, I finetuned my model with MD_REVISION = "2024-05-20", so I downloaded the files listed above from
https://huggingface.co./vikhyatk/moondream2/tree/48be9138e0faaec8802519b1b828350e33525d46
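
If you'd rather script the download, something like this sketch with huggingface_hub should work (the repo id, revision hash, and checkpoint path are the ones from my setup; adjust them to match yours):

from huggingface_hub import hf_hub_download

# Pin the revision that matches the code used for finetuning.
REVISION = "48be9138e0faaec8802519b1b828350e33525d46"
FILES = [
    "configuration_moondream.py",
    "moondream.py",
    "modeling_phi.py",
    "vision_encoder.py",
]

for fname in FILES:
    # Download each remote-code file straight into the checkpoint folder.
    hf_hub_download(
        repo_id="vikhyatk/moondream2",
        filename=fname,
        revision=REVISION,
        local_dir="checkpoints/moondream-ft",
    )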

  4. With the four files in place, you should be able to load and run the model with "AutoModelForCausalLM.from_pretrained" from the transformers library.
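
With the files in place, loading plus a quick smoke test looks roughly like this (a sketch: DEVICE and the image path are placeholders, and encode_image/answer_question follow the moondream2 remote-code API at this revision):

from PIL import Image
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

DEVICE = "cuda"  # placeholder; use "cpu" if no GPU is available

config = AutoConfig.from_pretrained("checkpoints/moondream-ft", trust_remote_code=True)
moondream = AutoModelForCausalLM.from_pretrained(
    "checkpoints/moondream-ft",
    config=config,
    trust_remote_code=True,
    device_map={"": DEVICE},
)

# If the tokenizer wasn't saved alongside the model, load it from
# "vikhyatk/moondream2" at the matching revision instead.
tokenizer = AutoTokenizer.from_pretrained("checkpoints/moondream-ft")

# Sanity check that the finetuned weights still answer questions.
image = Image.open("sample.jpg")  # placeholder path
enc_image = moondream.encode_image(image)
print(moondream.answer_question(enc_image, "Describe this image.", tokenizer))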

@Charlington thanks for this.

BTW, how did you do the data collation for the model? Did you find/develop scripts for that? Thanks

I think I had some help from ChatGPT writing the data collation function.

Here's the function I used that worked with my custom Dataset class.

import torch

# Assumes `moondream`, `tokenizer`, IMG_TOKENS, and ANSWER_EOS are defined
# in the surrounding training script.
def collate_fn(batch):
    # Preprocess each PIL image with the model's vision encoder.
    images = [sample['image'] for sample in batch]
    images = [moondream.vision_encoder.preprocess(image) for image in images]

    labels_acc = []
    tokens_acc = []

    for sample in batch:
        # Start with BOS; mask the image-embedding slots (plus BOS) with -100
        # so the loss ignores them.
        toks = [tokenizer.bos_token_id]
        labs = [-100] * (IMG_TOKENS + 1)

        for qa in sample['qa']:
            # Question tokens are masked out of the loss with -100...
            q_t = tokenizer(
                f"\n\nQuestion: {qa['question']}\n\nAnswer:",
                add_special_tokens=False
            ).input_ids
            toks.extend(q_t)
            labs.extend([-100] * len(q_t))

            # ...while answer tokens (including the EOS marker) are supervised.
            a_t = tokenizer(
                f" {qa['answer']}{ANSWER_EOS}",
                add_special_tokens=False
            ).input_ids
            toks.extend(a_t)
            labs.extend(a_t)

        tokens_acc.append(toks)
        labels_acc.append(labs)

    # Pad every sequence in the batch to the longest length.
    max_len = max(len(labels) for labels in labels_acc)

    attn_mask_acc = []

    for i in range(len(batch)):
        len_i = len(labels_acc[i])
        pad_i = max_len - len_i

        # Right-pad labels with -100 (ignored), tokens with EOS, and mark
        # padded positions as 0 in the attention mask.
        labels_acc[i].extend([-100] * pad_i)
        tokens_acc[i].extend([tokenizer.eos_token_id] * pad_i)
        attn_mask_acc.append([1] * len_i + [0] * pad_i)

    # Return preprocessed images plus padded token, label, and attention-mask
    # tensors, ready to be fed to the training step.
    return (
        images,
        torch.stack([torch.tensor(t, dtype=torch.long) for t in tokens_acc]),
        torch.stack([torch.tensor(l, dtype=torch.long) for l in labels_acc]),
        torch.stack([torch.tensor(a, dtype=torch.bool) for a in attn_mask_acc]),
    )
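
For context, here's a minimal sketch of how the collator plugs into a DataLoader. The IMG_TOKENS and ANSWER_EOS values are assumptions taken from the official moondream2 finetuning notebook, and train_dataset stands in for your own Dataset class:

from torch.utils.data import DataLoader

# Assumed constants from the finetuning notebook: moondream2 prepends 729
# image-embedding positions, and each answer ends with this EOS string.
IMG_TOKENS = 729
ANSWER_EOS = "<|endoftext|>"

train_loader = DataLoader(
    train_dataset,          # yields samples shaped like {'image': ..., 'qa': [...]}
    batch_size=8,
    shuffle=True,
    collate_fn=collate_fn,
)

# Each batch is (images, tokens, labels, attention_mask).
images, tokens, labels, attn_mask = next(iter(train_loader))
print(tokens.shape, labels.shape, attn_mask.shape)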
