---
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
---
<a href="https://iclr.cc/Conferences/2024" target="_blank">
<img alt="ICLR 2024" src="https://img.shields.io/badge/Proceedings-ICLR2024-red" />
</a>
Official checkpoint for [Tool-Augmented Reward Modeling (ICLR 2024 spotlight)](https://openreview.net/pdf?id=d94x0gWTUX).
# Model Description
Themis is a tool-augmented preference model that addresses the limitations of conventional reward models (RMs) by granting them access to external environments, including calculators and search engines.
It was introduced in the [ICLR 2024 paper](https://arxiv.org/pdf/2310.01045.pdf) and first released in this [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model).
Themis-7b is trained with [TARA](https://huggingface.co/datasets/baidu/TARA), achieving a noteworthy overall improvement of 17.7% across eight tasks in preference ranking.
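# Usage
Below is a minimal, illustrative sketch of loading the checkpoint with 🤗 Transformers. The only identifier taken from this card is `baidu/Themis-7b`; the use of the generic `AutoModel`/`AutoTokenizer` classes and the pair-scoring call are assumptions, and the exact tool-invocation and reward read-out are defined in the [official repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model).
```python
# Minimal loading sketch. Assumptions (not confirmed by this card): the
# checkpoint loads with the generic Auto classes, and the reward read-out
# follows the inference script in the official repository.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "baidu/Themis-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

# Score a question/answer pair; a tool-augmented RM like Themis can
# consult external tools (e.g. a calculator) before judging the answer.
question = "What is 37 * 48?"
answer = "37 * 48 = 1776."
inputs = tokenizer(question, answer, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# How the scalar preference score is extracted from `outputs` depends on
# the released model class; see the official repository for the exact
# scoring and tool-invocation interface.
```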
## 🔥 News
* **9 February, 2024:** 🎉 We release the official codebase and model weights of [`baidu/Themis-7b`](https://huggingface.co/baidu/Themis-7b). Stay tuned! 🔥
* **16 January, 2024:** 🎉 Our work has been accepted to [ICLR 2024](https://iclr.cc/Conferences/2024) as a **spotlight**! ✨
# Citation
```bibtex
@inproceedings{tarm-2024-ernie,
author = {Lei Li and
Yekun Chai and
Shuohuan Wang and
Yu Sun and
Hao Tian and
Ningyu Zhang and
Hua Wu},
title = {Tool-Augmented Reward Modeling},
booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
year = {2024},
url = {https://openreview.net/forum?id=d94x0gWTUX},
}
```