---
title: BenchBench Leaderboard
emoji: 🏋️‍♂️
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.36.0
app_file: app.py
pinned: true
license: apache-2.0
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
```bibtex
@misc{perlitz2024benchmarkagreementtestingright,
      title={Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation},
      author={Yotam Perlitz and Ariel Gera and Ofir Arviv and Asaf Yehudai and Elron Bandel and Eyal Shnarch and Michal Shmueli-Scheuer and Leshem Choshen},
      year={2024},
      eprint={2407.13696},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.13696},
}
```