---
library_name: transformers
tags: []
---
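This is a Mistral-7B reward model trained on [reciprocate/tinygsm_dpo](https://huggingface.co/datasets/reciprocate/tinygsm_dpo). It assigns a scalar score to a chat made of a user prompt and an assistant response. Example usage: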
```python
from transformers import pipeline

reward_fn = pipeline(
    "text-classification",
    model="reciprocate/mistral-7b-gsm8k-code-rm",
    truncation=True,
    max_length=4096,
    function_to_apply="none",  # return the raw score (logit), with no sigmoid/softmax applied
)
prompt = """\
Consider the following grade-school math problem: Megan has read 32 books this year. Kelcie has read 1/4 the amount of books that Megan has read. Greg has read 9 more than twice the number of books that Kelcie has read. How many books total have Megan, Kelcie, and Greg read?
Solve this problem using code.
- Give the complete solution to solve the problem written in Python.
- The program should contain multiple lines of code and end with 'result = XXX'.
- Use markdown to format your response starting with '```python' and ending with '```'.
"""
output = """\
Let's solve this problem using Python code.
```python
books_megan = 32
books_kelcie = books_megan / 4
books_kelcie = int(books_kelcie)
books_greg = 2 * books_kelcie + 9
total_books = books_megan + books_kelcie + books_greg
result = total_books```
"""
chats = [[
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": output},
]]
# Render each chat with the model's chat template, then score the rendered strings
inputs = [reward_fn.tokenizer.apply_chat_template(chat, tokenize=False) for chat in chats]
results = reward_fn(inputs)
scores = [x["score"] for x in results]
print(scores)
```
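A common way to use a reward model is to score several candidate responses to the same prompt and keep the best one (best-of-n selection). The sketch below wraps the scoring steps from the example above into a hypothetical helper, `best_of_n`. It assumes that `reward_fn` is the pipeline built above and that a higher score means a more preferred response; the model card does not state the score's direction, so treat that as an assumption.

```python
from collections.abc import Sequence


def best_of_n(reward_fn, prompt: str, responses: Sequence[str]) -> tuple[str, list[float]]:
    """Hypothetical helper: score each candidate response and return the top one.

    Assumes `reward_fn` behaves like the text-classification pipeline above:
    it exposes `.tokenizer.apply_chat_template` and returns one
    {"label": ..., "score": ...} dict per input string.
    """
    # Build one user/assistant chat per candidate response
    chats = [
        [{"role": "user", "content": prompt}, {"role": "assistant", "content": r}]
        for r in responses
    ]
    # Render the chats exactly as in the usage example, then score them in one batch
    inputs = [reward_fn.tokenizer.apply_chat_template(c, tokenize=False) for c in chats]
    scores = [x["score"] for x in reward_fn(inputs)]
    # Assumption: a higher raw score means a better response
    best_index = max(range(len(responses)), key=scores.__getitem__)
    return responses[best_index], scores
```

For example, `best, scores = best_of_n(reward_fn, prompt, [response_a, response_b])`, where `response_a` and `response_b` are two candidate solution strings you want to compare.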