Commit History

Upload preprocessed_traces.db
77c3be7
verified

benediktstroebl committed on

Upload swebench_verified_Agentless_gpt-4o-mini-2024-07-18_50_Instances_1723916965.json
01fb261
verified

benediktstroebl committed on

Delete evals_live/swebench_verified_Agentless_gpt-4o-2024-07-18_50_Instances_1723916965.json
e23eddc
verified

benediktstroebl committed on

Upload swebench_verified_Agentless_gpt-4o-2024-07-18_50_Instances_1723916965.json
a2d5cb2
verified

benediktstroebl committed on

added timestamp to task summary prompt for failure report and fixed failure report gradio issue
19bb306

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co/spaces/agent-evals/leaderboard
3427022

benediktstroebl committed on

added failure report and two new swebench variants
5a7e21a

benediktstroebl committed on

Added default to only restructure and not run llm task monitor inference calls
cb163b3

benediktstroebl committed on

Upload usaco_USACO_Reflexion__Episodic__Semantic_gpt-4o-mini-2024-07-18_1723558382.json
974935f
verified

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co/spaces/agent-evals/leaderboard
f9c6a2b

benediktstroebl committed on

update to avoid automatic processing
4822e7e

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co/spaces/agent-evals/leaderboard
b585234

benediktstroebl committed on

fixed defaults for type error task summary
df5cda0

benediktstroebl committed on

added try catch loop to analyze agent steps call
f98f521

benediktstroebl committed on

Upload usaco_USACO_Episodic_gpt-4o-mini-2024-07-18_1723429624.json
19f1cd0
unverified

benediktstroebl committed on

Delete evals_live/usaco_USACO_Episodic_gpt-4o-mini-2024-07-18_1723429624.json
4cf2b30
unverified

benediktstroebl committed on

Upload usaco_USACO_Semantic_gpt-4o-mini-2024-07-18_1723431631.json
7380536
unverified

benediktstroebl committed on

Delete evals_live/usaco_usaco_test_172306727812321123.json
d3e9bdb
unverified

benediktstroebl committed on

Delete evals_live/usaco_usaco_example_agent_1722871527.json
73428db
unverified

benediktstroebl committed on

Delete evals_live/usaco_usaco_example_agent_1722871.json
317b884
unverified

benediktstroebl committed on

Upload usaco_USACO_Episodic_gpt-4o-mini-2024-07-18_1723429624.json
3ee1461
unverified

benediktstroebl committed on

Upload usaco_USACO_Zero-shot_gpt-4o-mini-2024-07-18_1723417375.json
b0b576a
unverified

benediktstroebl committed on

for dev we are only monitoring 5 tasks
74258c3

benediktstroebl committed on

fixed step headline not showing
8946d7b

benediktstroebl committed on

format update and added monitor llm client backend
cd69490

benediktstroebl committed on

refactoring and USACO as default front page
221fb8a

benediktstroebl committed on