Commit History

added failure report and two new swebench variants
5a7e21a

benediktstroebl committed on

update to avoid automatic processing
4822e7e

benediktstroebl committed on

added try-catch loop to analyze agent steps call
f98f521

benediktstroebl committed on

for dev we are only monitoring 5 tasks
74258c3

benediktstroebl committed on

format update and added monitor llm client backend
cd69490

benediktstroebl committed on

refactoring and USACO as default front page
221fb8a

benediktstroebl committed on

big update with raw predictions section and dropdowns that dynamically parse agents of current leaderboard
ca89148

benediktstroebl committed on