added one line descriptions to each benchmark with acknowledgements and modified headline 4e68e9f benediktstroebl committed on Aug 23, 2024
added verified agents management and column and fixed widths b7d1f08 benediktstroebl committed on Aug 20, 2024
Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard 9d2915b benediktstroebl committed on Aug 20, 2024
Upload swebench_verified_Agentless_gpt-4o-mini-2024-07-18_50_Instances_1723916965.json 01fb261 verified benediktstroebl committed on Aug 18, 2024
Delete evals_live/swebench_verified_Agentless_gpt-4o-2024-07-18_50_Instances_1723916965.json e23eddc verified benediktstroebl committed on Aug 18, 2024
Upload swebench_verified_Agentless_gpt-4o-2024-07-18_50_Instances_1723916965.json a2d5cb2 verified benediktstroebl committed on Aug 17, 2024
added timestamp to task summary prompt for failure report and fixed failure report gradio issue 19bb306 benediktstroebl committed on Aug 17, 2024
Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard 3427022 benediktstroebl committed on Aug 17, 2024
Added default to only restructure and not run llm task monitor inference calls cb163b3 benediktstroebl committed on Aug 17, 2024
Upload usaco_USACO_Reflexion__Episodic__Semantic_gpt-4o-mini-2024-07-18_1723558382.json 974935f verified benediktstroebl committed on Aug 13, 2024
Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard f9c6a2b benediktstroebl committed on Aug 13, 2024
Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard b585234 benediktstroebl committed on Aug 12, 2024