HuggingFaceH4/Qwen2.5-Math-1.5B-Instruct-PRM-0.2

Token Classification

Generated from Trainer

text-generation-inference

Inference Endpoints


plaguss (HF staff) committed 17 days ago

Commit b0c5086 · verified · 1 parent: 34c257e

Update README.md

Files changed (1)

README.md +30 -4


@@ -17,13 +17,39 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
+How to use the model:
 ```python
 from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="plaguss/Qwen2.5-Math-1.5B-Instruct-PRM-0.2", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
+pipe = pipeline("token-classification", model="plaguss/Qwen2.5-Math-1.5B-Instruct-PRM-0.2", device="cuda")
+example = {
+    "prompt": "Let $a,$ $b,$ and $c$ be positive real numbers.  Find the set of all possible values of\n\\[\\frac{c}{a} + \\frac{a}{b + c} + \\frac{b}{c}.\\]",
+    "completions": [
+        "This problem involves finding the range of an expression involving three variables.",
+        "One possible strategy is to try to eliminate some variables and write the expression in terms of one variable only.",
+        "To do this, I might look for some common factors or symmetries in the expression.",
+        "I notice that the first and last terms have $c$ in the denominator, so I can factor out $c$ from the whole expression and get\n\\[\\frac{1}{c}\\left(c + \\frac{a^2}{b + c} + b\\right).\\]"
+    ],
+    "labels": [True, True, True, False],
+}
+separator = "\n\n"  # It's important to use the same separator as the one used during training
+for idx in range(1, len(example["completions"]) + 1):
+    steps = example["completions"][0:idx]
+    text = separator.join((example["prompt"], *steps)) + separator  # Join the prompt and the steps so far with the separator, and end with a trailing separator
+    pred_entity = pipe(text)[-1]["entity"]
+    pred = {"LABEL_0": False, "LABEL_1": True}[pred_entity]
+    label = example["labels"][idx - 1]
+    print(f"Step {idx}\tPredicted: {pred} \tLabel: {label}")
+# Step 1  Predicted: True         Label: True
+# Step 2  Predicted: True         Label: True
+# Step 3  Predicted: True         Label: True
+# Step 4  Predicted: False        Label: False
 ```
 ## Training procedure
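
The loop in the new quick start scores each step by feeding the model the prompt plus every step up to and including that one, joined by the training separator and ending with a trailing separator. As a rough sketch, that input construction can be pulled out into pure functions and checked without loading the model. The names `build_step_inputs` and `entity_to_bool` are hypothetical helpers introduced here for illustration; they are not part of the commit or of `transformers`.

```python
from typing import List

# Same separator as in the quick start; the README says it must match training.
SEPARATOR = "\n\n"


def build_step_inputs(
    prompt: str, completions: List[str], separator: str = SEPARATOR
) -> List[str]:
    """Return one input text per step (hypothetical helper).

    Text i holds the prompt followed by steps 1..i, joined by the separator,
    with a trailing separator appended, mirroring the quick-start loop.
    """
    texts = []
    for idx in range(1, len(completions) + 1):
        steps = completions[:idx]
        texts.append(separator.join((prompt, *steps)) + separator)
    return texts


def entity_to_bool(entity: str) -> bool:
    """Map the pipeline's entity name to a boolean, as the quick start does."""
    return {"LABEL_0": False, "LABEL_1": True}[entity]
```

Under these assumptions, each returned text could be passed to the `pipe` object from the quick start, and the `"entity"` field of the last prediction mapped through `entity_to_bool`, just as the loop in the diff does inline.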