How to evaluate HellaSwag with an LLM?
#66 by ryusangwon - opened
Hi,
I'm trying to evaluate HellaSwag with an LLM.
But I have a question about using an autoregressive model for this dataset.
The HellaSwag task is to find the proper ending for each ctx.
My idea was to feed ctx to the language model as input, generate the next sentence, compare the generated sentence with each of the candidate endings by similarity, and pick the most similar ending.
But this approach didn't work well (very low accuracy).
Does anyone have an idea how to evaluate the HellaSwag dataset with a language model?
Hi!
In general, you can use the EleutherAI LM Evaluation Harness almost plug-and-play to evaluate any HF model on the Hub, and I suggest you look at how they handle evaluation to better understand how this works 🤗
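To give a rough idea of the principle: as far as I understand it, the harness does not generate text for this task. It treats HellaSwag as multiple choice, scores each candidate ending by the log-likelihood the model assigns to it after ctx, and picks the highest-scoring one. Below is a minimal sketch of that idea, not the harness's actual code. The `token_logprobs` callable is a hypothetical interface; the toy scorer only stands in for a real causal LM so the example runs without downloading a model; the `label` field name and the per-token length normalization are assumptions on my side.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface (not a real library API): given a context and an
# ending, return log P(token | context, earlier ending tokens) for every
# token of the ending. With a real causal LM, these values would come from
# a forward pass over context + ending, read at the ending positions.
TokenLogProbs = Callable[[str, str], Sequence[float]]


def pick_ending(ctx: str, endings: List[str], token_logprobs: TokenLogProbs,
                length_normalize: bool = True) -> int:
    """Return the index of the ending with the highest log-likelihood."""
    scores = []
    for ending in endings:
        logps = token_logprobs(ctx, ending)
        score = sum(logps)
        if length_normalize:
            # Assumption: average per token so longer endings are not
            # penalized just for having more tokens.
            score /= max(len(logps), 1)
        scores.append(score)
    return max(range(len(endings)), key=lambda i: scores[i])


def accuracy(examples: List[Dict], token_logprobs: TokenLogProbs) -> float:
    """Fraction of examples where the picked ending matches the label."""
    correct = sum(
        pick_ending(ex["ctx"], ex["endings"], token_logprobs) == int(ex["label"])
        for ex in examples
    )
    return correct / len(examples)


def toy_token_logprobs(ctx: str, ending: str) -> List[float]:
    """Toy stand-in for an LM: words already seen in ctx are 'likely'."""
    seen = {w.strip(".,").lower() for w in ctx.split()}
    return [math.log(0.5) if w.strip(".,").lower() in seen else math.log(0.01)
            for w in ending.split()]


# Made-up examples in a HellaSwag-like shape (ctx, endings, label).
examples = [
    {"ctx": "A man pours water into a glass. He",
     "endings": ["drinks the water", "flies a purple kite"],
     "label": "0"},
    {"ctx": "The dog sees a ball. The dog",
     "endings": ["reads a newspaper quietly", "chases the ball"],
     "label": "1"},
]

if __name__ == "__main__":
    print("accuracy:", accuracy(examples, toy_token_logprobs))
```

This also suggests why the generate-then-compare approach scores low: a free generation can be a perfectly plausible continuation and still look nothing like any of the four endings, whereas likelihood scoring only asks the model to rank the given candidates.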
clefourrier changed discussion status to closed