core_leaderboard

To submit a new benchmark to the library:

1. Implement the benchmark in a standard format, such as the [METR Task Standard](https://github.com/METR/task-standard). This includes specifying the exact instructions for each task, as well as the task environment provided inside the container in which the agent runs.

2. We encourage developers to support running their tasks on separate VMs and to specify the exact hardware requirements for each task in the task environment.
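The following is a rough illustration of the two steps above. It is a minimal sketch of a benchmark written in the style of the METR Task Standard's `TaskFamily` interface (`get_tasks`, `get_instructions`, `score`), as we understand it. Check the standard itself for the exact required methods, signatures, and current version. Several parts are hypothetical and included only for illustration:

- the arithmetic tasks
- the `standard_version` string
- the `HardwareSpec` type, used here to record each task's hardware needs

The container and environment setup, which the standard handles separately, is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, TypedDict


@dataclass(frozen=True)
class HardwareSpec:
    """Hypothetical per-task hardware requirements (not taken from the METR Task Standard)."""
    cpus: int
    ram_gib: int
    gpus: int = 0


class Task(TypedDict):
    problem: str
    answer: str
    hardware: HardwareSpec  # hypothetical field recording the VM this task needs


class TaskFamily:
    # Assumed version string; set it to the Task Standard version your tasks target.
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> Dict[str, Task]:
        # Illustrative tasks only; a real benchmark defines its own.
        return {
            "add": {
                "problem": "What is 2 + 3?",
                "answer": "5",
                "hardware": HardwareSpec(cpus=1, ram_gib=2),
            },
            "multiply": {
                "problem": "What is 6 * 7?",
                "answer": "42",
                "hardware": HardwareSpec(cpus=2, ram_gib=4),
            },
        }

    @staticmethod
    def get_instructions(t: Task) -> str:
        # The exact text the agent receives for this task.
        return (
            f"Solve the following problem: {t['problem']}\n"
            "Submit only the final answer as a string."
        )

    @staticmethod
    def score(t: Task, submission: str) -> Optional[float]:
        # 1.0 for an exact match after stripping surrounding whitespace, else 0.0.
        return 1.0 if submission.strip() == t["answer"] else 0.0
```

Keeping the hardware requirement next to each task lets a harness choose a VM size per task. Before adding a custom field like this, check whether the standard you use already defines one.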