Commit History

Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard
b585234

benediktstroebl committed on

fixed defaults for type error task summary
df5cda0

benediktstroebl committed on

added try catch loop to analyze agent steps call
f98f521

benediktstroebl committed on

Upload usaco_USACO_Episodic_gpt-4o-mini-2024-07-18_1723429624.json
19f1cd0
unverified

benediktstroebl committed on

Delete evals_live/usaco_USACO_Episodic_gpt-4o-mini-2024-07-18_1723429624.json
4cf2b30
unverified

benediktstroebl committed on

Upload usaco_USACO_Semantic_gpt-4o-mini-2024-07-18_1723431631.json
7380536
unverified

benediktstroebl committed on

Delete evals_live/usaco_usaco_test_172306727812321123.json
d3e9bdb
unverified

benediktstroebl committed on

Delete evals_live/usaco_usaco_example_agent_1722871527.json
73428db
unverified

benediktstroebl committed on

Delete evals_live/usaco_usaco_example_agent_1722871.json
317b884
unverified

benediktstroebl committed on

Upload usaco_USACO_Episodic_gpt-4o-mini-2024-07-18_1723429624.json
3ee1461
unverified

benediktstroebl committed on

Upload usaco_USACO_Zero-shot_gpt-4o-mini-2024-07-18_1723417375.json
b0b576a
unverified

benediktstroebl committed on

for dev we are only monitoring 5 tasks
74258c3

benediktstroebl committed on

fixed step headline not showing
8946d7b

benediktstroebl committed on

format update and added monitor llm client backend
cd69490

benediktstroebl committed on

refactoring and USACO as default front page
221fb8a

benediktstroebl committed on

new data structure with global dict for faster processing
f9140ad

benediktstroebl committed on

big update with raw predictions section and dropdowns that dynamically parse agents of current leaderboard
ca89148

benediktstroebl committed on

added legend visibility dashboard
a30f956

benediktstroebl committed on

added initial version of visibility feature and fixed automatic update of results every hour
0b3117f

benediktstroebl committed on

fixed sorting. Modified axis labels
bf0e375

benediktstroebl committed on

added auto update every 1 h of HF space
5f9c44d

benediktstroebl committed on

Merge branch 'main' of https://huggingface.co./spaces/agent-evals/leaderboard
b0d26e5

benediktstroebl committed on