Update README.md
README.md
CHANGED
@@ -17,13 +17,39 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
 
+How to use the model:
+
 ```python
 from transformers import pipeline
 
-
-
-
-
+pipe = pipeline("token-classification", model="plaguss/Qwen2.5-Math-1.5B-Instruct-PRM-0.2", device="cuda")
+
+example = {
+    "prompt": "Let $a,$ $b,$ and $c$ be positive real numbers. Find the set of all possible values of\n\\[\\frac{c}{a} + \\frac{a}{b + c} + \\frac{b}{c}.\\]",
+    "completions": [
+        "This problem involves finding the range of an expression involving three variables.",
+        "One possible strategy is to try to eliminate some variables and write the expression in terms of one variable only.",
+        "To do this, I might look for some common factors or symmetries in the expression.",
+        "I notice that the first and last terms have $c$ in the denominator, so I can factor out $c$ from the whole expression and get\n\\[\\frac{1}{c}\\left(c + \\frac{a^2}{b + c} + b\\right).\\]"
+    ],
+    "labels": [True, True, True, False],
+}
+
+
+separator = "\n\n"  # It's important to use the same separator as the one used during training
+
+for idx in range(1, len(example["completions"]) + 1):
+    steps = example["completions"][0:idx]
+    text = separator.join((example["prompt"], *steps)) + separator  # Add a separator between the prompt and each step
+    pred_entity = pipe(text)[-1]["entity"]
+    pred = {"LABEL_0": False, "LABEL_1": True}[pred_entity]
+    label = example["labels"][idx - 1]
+    print(f"Step {idx}\tPredicted: {pred} \tLabel: {label}")
+
+# Step 1 Predicted: True Label: True
+# Step 2 Predicted: True Label: True
+# Step 3 Predicted: True Label: True
+# Step 4 Predicted: False Label: False
 ```
 
 ## Training procedure
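
The loop in the added Quick start snippet scores each step by feeding the model the prompt plus all steps up to that point, joined and terminated by the training separator. As a minimal sketch of that prefix construction on its own, without loading the model: `build_step_prefixes` is a hypothetical helper name used only for illustration (it is not part of `transformers` or TRL), and the toy prompt and steps are placeholders.

```python
def build_step_prefixes(prompt, completions, separator="\n\n"):
    """Return one input text per step: the prompt followed by the first k steps.

    Every text ends with the separator, mirroring the Quick start loop, so the
    last token of each text marks the end of the step being scored.
    """
    texts = []
    for idx in range(1, len(completions) + 1):
        steps = completions[:idx]  # steps 1..idx
        texts.append(separator.join((prompt, *steps)) + separator)
    return texts


# Toy inputs (placeholders, not taken from the model card)
texts = build_step_prefixes("Q", ["s1", "s2", "s3"])
for text in texts:
    print(repr(text))
```

Each resulting text could then be passed to the `token-classification` pipeline as in the snippet, reading the prediction from the last returned token.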