plaguss HF staff committed on
Commit b0c5086 · verified · 1 Parent(s): 34c257e

Update README.md

Files changed (1)
  1. README.md +30 -4
README.md CHANGED
@@ -17,13 +17,39 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
 
+How to use the model:
+
 ```python
 from transformers import pipeline
 
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="plaguss/Qwen2.5-Math-1.5B-Instruct-PRM-0.2", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
+pipe = pipeline("token-classification", model="plaguss/Qwen2.5-Math-1.5B-Instruct-PRM-0.2", device="cuda")
+
+example = {
+    "prompt": "Let $a,$ $b,$ and $c$ be positive real numbers. Find the set of all possible values of\n\\[\\frac{c}{a} + \\frac{a}{b + c} + \\frac{b}{c}.\\]",
+    "completions": [
+        "This problem involves finding the range of an expression involving three variables.",
+        "One possible strategy is to try to eliminate some variables and write the expression in terms of one variable only.",
+        "To do this, I might look for some common factors or symmetries in the expression.",
+        "I notice that the first and last terms have $c$ in the denominator, so I can factor out $c$ from the whole expression and get\n\\[\\frac{1}{c}\\left(c + \\frac{a^2}{b + c} + b\\right).\\]"
+    ],
+    "labels": [True, True, True, False],
+}
+
+
+separator = "\n\n"  # It's important to use the same separator as the one used during training
+
+for idx in range(1, len(example["completions"]) + 1):
+    steps = example["completions"][0:idx]
+    text = separator.join((example["prompt"], *steps)) + separator  # Add a separator between the prompt and each steps
+    pred_entity = pipe(text)[-1]["entity"]
+    pred = {"LABEL_0": False, "LABEL_1": True}[pred_entity]
+    label = example["labels"][idx - 1]
+    print(f"Step {idx}\tPredicted: {pred} \tLabel: {label}")
+
+# Step 1 Predicted: True Label: True
+# Step 2 Predicted: True Label: True
+# Step 3 Predicted: True Label: True
+# Step 4 Predicted: False Label: False
 ```
 
 ## Training procedure
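
The loop added in this commit scores one step at a time: for step `idx` it feeds the model the prompt plus the first `idx` steps, joined by the separator and ending with a trailing separator. The following is a minimal, model-free sketch of just that text construction, which can be handy for checking inputs before loading the model. `build_step_inputs` is a hypothetical helper name introduced here for illustration; it is not part of the README or of `transformers`.

```python
from typing import List

# Assumed to match the separator used during training, as the README comment states.
SEPARATOR = "\n\n"


def build_step_inputs(prompt: str, steps: List[str], separator: str = SEPARATOR) -> List[str]:
    """Return one input string per step.

    The idx-th string is the prompt followed by the first idx steps,
    joined by the separator and terminated by a trailing separator,
    mirroring the loop in the README example.
    """
    inputs = []
    for idx in range(1, len(steps) + 1):
        text = separator.join((prompt, *steps[:idx])) + separator
        inputs.append(text)
    return inputs


if __name__ == "__main__":
    # Toy prompt and steps, not taken from the README.
    for text in build_step_inputs("What is 2 + 2?", ["Add the numbers.", "The answer is 4."]):
        print(repr(text))
```

Each returned string could then be passed to the `token-classification` pipeline as in the README, reading the last predicted entity as the label for the most recent step.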