Commit 9f91572 (parent: 6ed4edd) by Mghao: Update README.md

README.md:
## Demo Code

We provide example usage of the INF-ORM-Llama3.1-70B reward model below. The example obtains the reward scores of two conversations and compares them. Note that loading with `attn_implementation="flash_attention_2"` requires the `flash-attn` package; if it is not installed, you can likely drop that argument and use the default attention implementation.

```python
from typing import List, Optional, Union

import torch
import torch.nn as nn
from transformers import LlamaPreTrainedModel, LlamaModel, PreTrainedTokenizerFast
from transformers.modeling_outputs import SequenceClassifierOutputWithPast


class INFORMForSequenceClassification(LlamaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = LlamaModel(config)
        self.score = nn.Sequential(
            nn.Linear(config.hidden_size, config.hidden_size),
            nn.ReLU(),
            nn.Linear(config.hidden_size, self.num_labels)
        )
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[List[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ):
        transformer_outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
        )
        hidden_states = transformer_outputs[0]
        logits = self.score(hidden_states)

        if input_ids is not None:
            batch_size = input_ids.shape[0]
        else:
            batch_size = inputs_embeds.shape[0]

        if self.config.pad_token_id is None and batch_size != 1:
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
        if self.config.pad_token_id is None:
            sequence_lengths = -1
        else:
            if input_ids is not None:
                # if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
                sequence_lengths = torch.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
                sequence_lengths = sequence_lengths % input_ids.shape[-1]
                sequence_lengths = sequence_lengths.to(logits.device)
            else:
                sequence_lengths = -1

        pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]

        loss = None
        return SequenceClassifierOutputWithPast(
            loss=loss,
            logits=pooled_logits,
            past_key_values=transformer_outputs.past_key_values,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )


# Load model
model_name = "infly/INF-ORM-Llama3.1-70B"
orm = INFORMForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
    num_labels=1,
)

# Load tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)

conv1 = [ { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa, then he bought 3 more oranges. Finally, he divided all the oranges equally among himself and his 4 friends. How many oranges does each person get?", "role": "user" }, { "content": "Tom has 20 oranges. He gave 5 oranges to Lisa. 20 - 5 = 15. Tom now has 15 oranges. Tom bought 3 more oranges. 15 + 3 = 18. Tom now has 18 oranges. Tom divides the 18 oranges equally among himself and his 4 friends (a total of 5 people). 18 ÷ 5 = 3.6 oranges. Each person gets 3.6 oranges.", "role": "assistant" } ]
conv2 = [ { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa, then he bought 3 more oranges. Finally, he divided all the oranges equally among himself and his 4 friends. How many oranges does each person get?", "role": "user" }, { "content": "Tom has 20 oranges. He gave 5 oranges to his friend Lisa. 20 - 5 = 15. Tom now has 15 oranges. Tom bought 3 more oranges. 15 + 3 = 18. Tom now has 18 oranges. Tom divides the 18 oranges equally among his 4 friends (a total of 4 people). 18 ÷ 4 = 4.5 oranges. Each person gets 4.5 oranges.", "role": "assistant" } ]

conv1_tokenized = tokenizer.apply_chat_template(conv1, tokenize=True, return_tensors="pt").to("cuda")
conv2_tokenized = tokenizer.apply_chat_template(conv2, tokenize=True, return_tensors="pt").to("cuda")

# Inference
with torch.no_grad():
    score1 = orm(conv1_tokenized).logits[0][0].item()
    score2 = orm(conv2_tokenized).logits[0][0].item()
print(f"Score for response 1: {score1}")
print(f"Score for response 2: {score2}")

# Output:
# Score for response 1: 4.96875
# Score for response 2: 2.890625
```
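
The same pattern extends beyond a pairwise comparison. Below is a minimal best-of-N sketch; the `score_conversation` helper is illustrative and not part of the demo above, and it reuses the `orm` model, `tokenizer`, `conv1`, and `conv2` defined there.

```python
# Minimal best-of-N sketch. `score_conversation` is an illustrative helper, not part of
# the demo above; it reuses `orm`, `tokenizer`, `conv1`, and `conv2` defined there.

def score_conversation(conversation):
    """Return the scalar reward score for a single chat-formatted conversation."""
    input_ids = tokenizer.apply_chat_template(conversation, tokenize=True, return_tensors="pt").to("cuda")
    with torch.no_grad():
        return orm(input_ids).logits[0][0].item()

# Score every candidate response to the same prompt and keep the best one.
candidates = [conv1, conv2]
scores = [score_conversation(conversation) for conversation in candidates]
best = max(range(len(candidates)), key=lambda i: scores[i])
print(f"Best response: {best + 1} (score: {scores[best]})")
```

Because the model emits a single scalar per conversation, ranking candidates reduces to an argmax over their scores.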

## Declaration and License Agreement

### Declaration

### License Agreement

## Contact

If you have any questions, please feel free to reach us at <[email protected]>.

## Citation