Spaces:

Duplicated from benediktstroebl/hal

agent-evals/core_leaderboard

core_leaderboard / utils

3 contributors

History: 11 commits

benediktstroebl

added failure report and two new swebench variants

5a7e21a 7 months ago

data.py        9.47 kB   format update and added monitor llm client backend (7 months ago)
pareto.py      1.34 kB   big update with raw predictions section and dropdowns that dynamically parse agents of current leaderboard (7 months ago)
processing.py  6.27 kB   added failure report and two new swebench variants (7 months ago)
viz.py         10.3 kB   added failure report and two new swebench variants (7 months ago)