Model evaluation failed after 2 days

#622
by migtissera - opened
Open LLM Leaderboard org

Hi,
There was a problem when downloading one of the shards of your model (10), so I moved it back to pending and it should be reevaluated soon!

clefourrier changed discussion status to closed

Hey @clefourrier, I don't see it in the list. Not even in pending.

Open LLM Leaderboard org

Yep, it failed again 18h ago, once more with a network problem (but at a different step of the model download). Since we had some network issues yesterday, I'm rescheduling it again.

Hey @clefourrier, seems like it failed again.

It failed again.

Unfortunately yes, seems like it failed. @clefourrier

Hey @clefourrier any update on this?

Open LLM Leaderboard org

Hi! Your model failed to download again, so I resubmitted it.

Open LLM Leaderboard org
edited Mar 11

Hi!
If this download fails yet another time, we'll have to consider the fact that there is a problem with your weights, as we usually don't have systematic problems downloading models like this, and we have relaunched your model 4 times (I had relaunched it after your Thursday message).

"we'll have to consider the fact that there is a problem with your weights"

There could also be an issue with transformers, just like in Cluj-Napoca's case.

On Saturday there were ~50 models in pending, and now the pending queue is cleared (I don't know whether they all failed or actually finished successfully).

I know this because I've been waiting ~1 week for my model to finish running. :)

I don't see how there could be an issue with the model weights (when downloading). There are many users who have downloaded this model. For example, this derivative model by Mihaiii is using Tess-70B-v1.6: https://huggingface.co/Mihaiii/Covasna-0.1
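One way to rule out the weights themselves would be to check that every shard named in the checkpoint index is actually present in the repo. Below is a minimal sketch, assuming the repo uses the usual sharded layout with an index file containing a `weight_map`; the function name `missing_shards` and the sample data are hypothetical and only for illustration:

```python
import json
from typing import List


def missing_shards(index_json: str, repo_files: List[str]) -> List[str]:
    """Return shard filenames referenced by the index but absent from repo_files."""
    index = json.loads(index_json)
    # "weight_map" maps each tensor name to the shard file that stores it.
    expected = set(index["weight_map"].values())
    return sorted(expected - set(repo_files))


# Hypothetical, heavily shortened index (not the real one for any model).
sample_index = json.dumps({
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
})

# Only the first shard is "present", so the second one is reported as missing.
print(missing_shards(sample_index, ["model-00001-of-00002.safetensors"]))
```

In practice the file list could probably be obtained with `huggingface_hub.list_repo_files(repo_id)`, and an empty result would suggest the repo is complete and the failures are on the download side.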

Hi @SaylorTwift,
The model has been running evals for 6 days now -- any way to check whether everything is good?

Open LLM Leaderboard org

Hi! If it's been displayed as RUNNING for longer than a day, the most likely explanation is that researchers have been launching training jobs (which have higher priority than evaluation jobs) and that your model got rescheduled.

Hmm, yeah, something somewhere is probably off.

My model upload - which is basically Tess 1.6 with some layers removed - got status Failed today. It's the first failure, so I'm waiting to see what happens with Tess 1.6 before making a resubmit request.

Seems like this failed again @SaylorTwift @clefourrier

@migtissera wooohoo, it's on the leaderboard now!

Finally!
