metadata
library_name: transformers
base_model: meta-llama/Llama-3.1-70B-Instruct
datasets:
- infly/INF-ORM-Preference-Magnitude-80K
pipeline_tag: text-classification
INF Outcome Reward Model
Introduction
INF-ORM-Llama3.1-70B is the outcome reward model roughly built on the Llama-3.1-70B-Instruct architecture and trained with the dataset INF-ORM-Preference-Magnitude-80K.
Note: Train Details are coming soon!
RewardBench Leaderboard
We evaluate our model on RewardBench using the official test script locally. As of December 2024, INF-ORM-Llama3.1-70B ranks first on the RewardBench leaderboard.
Rank | Model | Model Type | Score | Chat | Chat Hard | Safety | Reasoning |
---|---|---|---|---|---|---|---|
1 | infly/INF-ORM-Llama3.1-70B | Custom Classifier | 95.2 | 96.9 | 91.0 | 93.8 | 99.1 |
2 | Skywork/Skywork-Reward-Gemma-2-27B-v0.2 | Seq. Classifier | 94.3 | 96.1 | 89.9 | 93.0 | 98.1 |
3 | nvidia/Llama-3.1-Nemotron-70B-Reward | Custom Classifier | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
4 | Skywork/Skywork-Reward-Gemma-2-27B | Seq. Classifier | 93.8 | 95.8 | 91.4 | 91.9 | 96.1 |
5 | SF-Foundation/TextEval-Llama3.1-70B | Generative | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
6 | meta-metrics/MetaMetrics-RM-v1.0 | Custom Classifier | 93.4 | 98.3 | 86.4 | 90.8 | 98.2 |
7 | Skywork/Skywork-Critic-Llama-3.1-70B | Generative | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
8 | Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | Seq. Classifier | 93.1 | 94.7 | 88.4 | 92.7 | 96.7 |
9 | nicolinho/QRM-Llama3.1-8B | Seq. Classifier | 93.1 | 94.4 | 89.7 | 92.3 | 95.8 |
10 | LxzGordon/URM-LLaMa-3.1-8B | Seq. Classifier | 92.9 | 95.5 | 88.2 | 91.1 | 97.0 |
Demo Code
We provide an example usage of the INF-ORM-Llama3.1-70B below. Below is an example of obtaining the reward scores of two conversations.
from typing import List, Optional, Union
import torch
import torch.nn as nn
from transformers import LlamaPreTrainedModel, LlamaModel, PreTrainedTokenizerFast
from transformers.modeling_outputs import SequenceClassifierOutputWithPast
class INFORMForSequenceClassification(LlamaPreTrainedModel):
def __init__(self, config):
super().__init__(config)
self.num_labels = config.num_labels
self.model = LlamaModel(config)
self.score = nn.Sequential(
nn.Linear(config.hidden_size, config.hidden_size),
nn.ReLU(),
nn.Linear(config.hidden_size, self.num_labels)
)
# Initialize weights and apply final processing
self.post_init()
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
labels: Optional[torch.LongTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
):
transformer_outputs = self.model(
input_ids,
attention_mask=attention_mask,
position_ids=position_ids,
past_key_values=past_key_values,
inputs_embeds=inputs_embeds,
)
hidden_states = transformer_outputs[0]
logits = self.score(hidden_states)
if input_ids is not None:
batch_size = input_ids.shape[0]
else:
batch_size = inputs_embeds.shape[0]
if self.config.pad_token_id is None and batch_size != 1:
raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
if self.config.pad_token_id is None:
sequence_lengths = -1
else:
if input_ids is not None:
# if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
sequence_lengths = torch.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
sequence_lengths = sequence_lengths % input_ids.shape[-1]
sequence_lengths = sequence_lengths.to(logits.device)
else:
sequence_lengths = -1
pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]
loss = None
return SequenceClassifierOutputWithPast(
loss=loss,
logits=pooled_logits,
past_key_values=transformer_outputs.past_key_values,
hidden_states=transformer_outputs.hidden_states,
attentions=transformer_outputs.attentions,
)
# Load model
model_name = "infly/INF-ORM-Llama3.1-70B"
orm = INFORMForSequenceClassification.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
device_map="auto",
attn_implementation="flash_attention_2",
num_labels=1,
)
# Load tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)
conv1 = [ { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa, then he bought 3 more oranges. Finally, he divided all the oranges equally among himself and his 4 friends. How many oranges does each person get?", "role": "user" }, { "content": "Tom has 20 oranges. He gave 5 oranges to Lisa. 20 - 5 = 15. Tom now has 15 oranges. Tom bought 3 more oranges. 15 + 3 = 18. Tom now has 18 oranges. Tom divides the 18 oranges equally among himself and his 4 friends (a total of 5 people). 18 ÷ 5 = 3.6 oranges. Each person gets 3.6 oranges.", "role": "assistant" } ]
conv2= [ { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa, then he bought 3 more oranges. Finally, he divided all the oranges equally among himself and his 4 friends. How many oranges does each person get?", "role": "user" }, { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa. 20 - 5 = 15. Tom now has 15 oranges. Tom bought 3 more oranges. 15 + 3 = 18. Tom now has 18 oranges. Tom divides the 18 oranges equally among his 4 friends (a total of 4 people). 18 ÷ 4 = 4.5 oranges. Each person gets 4.5 oranges.", "role": "assistant" } ]
conv1_tokenized = tokenizer.apply_chat_template(conv1, tokenize=True, return_tensors="pt").to("cuda")
conv2_tokenized = tokenizer.apply_chat_template(conv2, tokenize=True, return_tensors="pt").to("cuda")
# Inference
with torch.no_grad():
score1 = orm(conv1_tokenized).logits[0][0].item()
score2 = orm(conv2_tokenized).logits[0][0].item()
print(f"Score for response 1: {score1}")
print(f"Score for response 2: {score2}")
# Output:
# Score for response 1: 4.96875
# Score for response 2: 2.890625
Declaration and License Agreement
Declaration
License Agreement
Contact
If you have any questions, please feel free to reach us at [email protected].