Can't load model
When I try to download the model, it gives the following error.

Example code:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zeta-alpha-ai/Zeta-Alpha-E5-Mistral")
model = AutoModel.from_pretrained("zeta-alpha-ai/Zeta-Alpha-E5-Mistral")
```

Error:

```
OSError: Could not locate config.json inside intfloat/e5-mistral-7b-instruct.
```

Environment: transformers==4.47.1
Hi there,
I'm looking into it now. I can load it just fine using sentence_transformers. I think the issue is that the loader is, for some reason, defaulting to fetching the PEFT adapters instead of the full model. I'll take a look and get back to you in a bit.
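For reference, this is roughly how loading through Sentence-Transformers works for me (a minimal sketch, using the repo name from this thread):

```python
from sentence_transformers import SentenceTransformer

# Loading via Sentence-Transformers resolves the repo without the config.json error
model = SentenceTransformer("zeta-alpha-ai/Zeta-Alpha-E5-Mistral")
embeddings = model.encode(["example query"])
```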
@Samoed
I think I've solved the problem. MTEB was having issues loading the PEFT adapters through Sentence-Transformers. I'm testing the fix now (essentially moving the adapters to another folder) and will let you know when it's live.
I've also found an issue where the non-PEFT model (i.e., the safetensors weights) was pointing to the non-fine-tuned model, so I will fix that in a bit as well.
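In case it's useful, here's a rough sketch of how merged weights can be produced with PEFT; the adapter path below is hypothetical, not the repo's actual layout:

```python
import torch
from peft import PeftModel
from transformers import AutoModel

# Load the base model and apply the fine-tuned adapters on top of it
base = AutoModel.from_pretrained(
    "intfloat/e5-mistral-7b-instruct", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "path/to/adapters")  # hypothetical adapter location

# Fold the adapter weights into the base weights and save full safetensors
merged = model.merge_and_unload()
merged.save_pretrained("Zeta-Alpha-E5-Mistral-merged", safe_serialization=True)
```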
Awesome! Thank you
I've tested the "fixed" version, and the performance is a bit lower than expected. I think the issue is due to BF16 vs. TF32, but I'm not sure. @Samoed, do you want me to upload the not-so-fixed version ASAP? I will check what is happening and upload the correct version once I can fix it.
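For context, these are the two precision settings in question; a minimal sketch, assuming a sentence-transformers version that accepts `model_kwargs`:

```python
import torch
from sentence_transformers import SentenceTransformer

# Option 1: load the weights in bfloat16
model = SentenceTransformer(
    "zeta-alpha-ai/Zeta-Alpha-E5-Mistral",
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# Option 2: keep float32 weights but allow TF32 matmuls on Ampere+ GPUs
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```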
No, I can wait!
@Samoed
Sorry for the delay. I was a bit sick over the last couple of days. I've just pushed the fixed models here.
I've found some small (but not very significant) differences in the results when running the Sentence-Transformers NanoBEIR evaluator compared to our original report, but I couldn't track down the exact reasons. It's probably down to small differences in the prompts we used or in how pooling is implemented. Specifically, the average nDCG@10 went from 0.686 up to 0.6900.
Another difference shows up between running the adapters-only model on top of E5-Mistral and using the merged model (i.e., loading directly from the repo). The merged model performs slightly better, going from 0.6900 to 0.6924.
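For anyone who wants to reproduce these numbers, a rough sketch of running the NanoBEIR evaluator (assuming a sentence-transformers version that ships `NanoBEIREvaluator`):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("zeta-alpha-ai/Zeta-Alpha-E5-Mistral")

# Evaluates on the NanoBEIR datasets and aggregates the retrieval metrics
evaluator = NanoBEIREvaluator()
results = evaluator(model)
print(results[evaluator.primary_metric])  # mean nDCG@10 across datasets
```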
I will add these numbers to the readme later.
Hope you're feeling better now! Thank you for your awesome work!