To submit **a new benchmark** to the library:
1. Implement the benchmark in a standard format (such as the [METR Task Standard](https://github.com/METR/task-standard)). This includes specifying the exact instructions for each task, as well as the task environment provided inside the container the agent runs in.
2. We encourage developers to support running their tasks on separate VMs and to specify the exact hardware requirements for each task in the task environment.
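
As a rough illustration of step 1, the sketch below shows what a small task definition might look like. It assumes the `TaskFamily` layout described in the METR Task Standard (a `standard_version` string plus static `get_tasks`, `get_instructions`, and `score` methods); check the linked repository for the exact interface and current version. The string-reversal task, the `word` field, and the `cpus` / `memory_gb` fields are hypothetical and only show where per-task instructions and hardware requirements could be recorded. The standard may provide its own mechanism for declaring resources.

```python
from typing import TypedDict


class Task(TypedDict):
    # All fields here are hypothetical, chosen only for illustration.
    word: str        # input the agent must process
    cpus: int        # assumed hardware requirement for this task
    memory_gb: int   # assumed hardware requirement for this task


class TaskFamily:
    # Placeholder version string; use the version of the standard you target.
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> dict[str, Task]:
        # Map each task name to the data that defines it.
        return {
            "short": {"word": "abc", "cpus": 2, "memory_gb": 4},
            "long": {"word": "benchmark", "cpus": 4, "memory_gb": 8},
        }

    @staticmethod
    def get_instructions(t: Task) -> str:
        # The exact instructions shown to the agent for this task.
        return f"Reverse the string '{t['word']}' and submit the result."

    @staticmethod
    def score(t: Task, submission: str) -> float:
        # 1.0 for an exact match with the reversed word, otherwise 0.0.
        return 1.0 if submission.strip() == t["word"][::-1] else 0.0
```

Keeping the instructions and the resource fields next to each task definition makes it easier for a reviewer to see exactly what each task asks for and what it needs to run.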