metadata
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
Official checkpoint for Tool-Augmented Reward Modeling (ICLR 2024 spotlight).
Model Description
Themis is a tool-augmented preference model that addresses limitations of conventional reward models (RMs) by giving them access to external environments, including calculators and search engines. It was introduced in the ICLR 2024 paper "Tool-Augmented Reward Modeling" and first released in this repository. Themis-7b is trained on the TARA dataset and achieves an overall improvement of 17.7% across eight tasks in preference ranking.
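To make the idea of a tool-augmented preference score concrete, here is a toy sketch in plain Python. It is not the Themis implementation and does not use the model weights: `calculator`, `toy_reward`, and `rank_pair` are hypothetical names introduced only for illustration, and the 0/1 score is an assumed stand-in for a learned reward. It only shows how consulting an external tool (here, a small arithmetic evaluator) can decide which of two candidate answers should be preferred.

```python
import ast
import operator

# Hypothetical calculator "tool": safely evaluates +, -, *, / expressions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def toy_reward(expression: str, claimed_answer: float) -> float:
    """Assumed toy score: 1.0 if the tool confirms the claimed answer, else 0.0."""
    return 1.0 if abs(calculator(expression) - claimed_answer) < 1e-9 else 0.0


def rank_pair(expression: str, answer_a: float, answer_b: float) -> str:
    """Return 'a', 'b', or 'tie' by comparing the two toy rewards."""
    reward_a = toy_reward(expression, answer_a)
    reward_b = toy_reward(expression, answer_b)
    if reward_a == reward_b:
        return "tie"
    return "a" if reward_a > reward_b else "b"


if __name__ == "__main__":
    # Candidate "a" states the correct result, candidate "b" does not.
    print(rank_pair("12 * 7 + 5", 89, 79))
```

In this sketch the tool result fully determines the score; a learned reward model would instead combine tool observations with its own judgment of the response.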
News
- 9 February, 2024: We release the official codebase and model weights of baidu/Themis-7b. Stay tuned!
- 16 January, 2024: Our work has been accepted to ICLR 2024 spotlight!
Citation
@inproceedings{tarm-2024-ernie,
author = {Lei Li and
Yekun Chai and
Shuohuan Wang and
Yu Sun and
Hao Tian and
Ningyu Zhang and
Hua Wu},
title = {Tool-Augmented Reward Modeling},
booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
year = {2024},
url = {https://openreview.net/forum?id=d94x0gWTUX},
}