Spaces: open-llm-leaderboard / open_llm_leaderboard

Is GSM8K evaluated few-shot (no CoT)?

#365

by imone - opened Nov 10, 2023

Discussion

imone

Nov 10, 2023

Why are the leaderboard's GSM8K scores lower than the results reported in the papers? For example, Llama 2 70B scores 56.8 in its paper vs. 33.9 on the leaderboard, as reported here. Is GSM8K evaluated in a few-shot setting without chain-of-thought (CoT), whereas it is typically run with few-shot or zero-shot CoT?

imone

Nov 10, 2023

clefourrier

Open LLM Leaderboard org Nov 10, 2023

Hi! I'm closing this issue since it has already been discussed in the other one you pointed out; let's centralize discussions there :)

clefourrier changed discussion status to closed Nov 10, 2023