Commit History

Upload requirements.txt
b56511a
verified

benediktstroebl committed on

Upload preprocessed_traces.db
7db4465
verified

benediktstroebl committed on

Upload preprocessed_traces.db
bce89cb
verified

benediktstroebl committed on

modified heading and added about tab text
c50a008

benediktstroebl committed on

added one line descriptions to each benchmark with acknowledgements and modified headline
4e68e9f

benediktstroebl committed on

Upload preprocessed_traces.db
c4276af
verified

benediktstroebl committed on

Delete preprocessed_traces.db
040eed7
verified

benediktstroebl committed on

added verified agents management and column and fixed widths
b7d1f08

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard
9d2915b

benediktstroebl committed on

Upload preprocessed_traces.db
338177f
verified

benediktstroebl committed on

Upload preprocessed_traces.db
77c3be7
verified

benediktstroebl committed on

Upload swebench_verified_Agentless_gpt-4o-mini-2024-07-18_50_Instances_1723916965.json
01fb261
verified

benediktstroebl committed on

Delete evals_live/swebench_verified_Agentless_gpt-4o-2024-07-18_50_Instances_1723916965.json
e23eddc
verified

benediktstroebl committed on

Upload swebench_verified_Agentless_gpt-4o-2024-07-18_50_Instances_1723916965.json
a2d5cb2
verified

benediktstroebl committed on

added timestamp to task summary prompt for failure report and fixed failure report gradio issue
19bb306

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard
3427022

benediktstroebl committed on

added failure report and two new swebench variants
5a7e21a

benediktstroebl committed on

Added default to only restructure and not run llm task monitor inference calls
cb163b3

benediktstroebl committed on

Upload usaco_USACO_Reflexion__Episodic__Semantic_gpt-4o-mini-2024-07-18_1723558382.json
974935f
verified

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard
f9c6a2b

benediktstroebl committed on

update to avoid automatic processing
4822e7e

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard
b585234

benediktstroebl committed on