OpenEvals

community

AI & ML interests

LLM evaluation

Recent Activity

clefourrier updated a collection about 6 hours ago

Leaderboards related tools

Organization Card

Hi! Welcome to the org page of the Evaluation team at HuggingFace. We want to support the community in building and sharing quality evaluations, for reproducible and fair model comparisons, to cut through the hype of releases and better understand actual model capabilities.

We're behind the:

Open LLM Leaderboard (over 11K models evaluated since 2023)
lighteval LLM evaluation suite, fast and filled with the SOTA benchmarks you might want
evaluation guidebook, your reference for LLM evals
leaderboards on the hub initiative, to encourage people to build more leaderboards in the open for more reproducible evaluation. You'll find some documentation here to build your own!

Collections 4

Open-LLM performances are plateauing, let’s make the leaderboard steep again

Open LLM Leaderboard

open-llm-leaderboard/contents

open-llm-leaderboard/results

GAIA: a benchmark for General AI Assistants

Zephyr: Direct Distillation of LM Alignment

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

models

datasets 1

OpenEvals/find-a-leaderboard

Team members 5
