---
license: other
license_name: deepseek
license_link: https://github.com/deepseek-ai/DeepSeek-Math/blob/main/LICENSE-MODEL
---
|
v0.1

Process reward model (PRM) adapted from: https://huggingface.co/deepseek-ai/deepseek-math-7b-rl

This is a process reward model trained primarily on a flattened version of PRM800K, fine-tuned with LoRA adapters that were then merged back into the full model.
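
PRM800K stores per-step human ratings for model-generated solutions. The exact flattening used for this model is not published here, but a plausible sketch of the idea, with illustrative field names and an assumed binary label mapping, looks like this:

```python
# Illustrative sketch only: the real preprocessing for this model is not
# documented here. PRM800K-style records rate each solution step; one
# plausible "flattening" emits a (prefix-so-far, label) pair per step.
def flatten_example(question, steps):
    """steps: list of (step_text, rating) pairs with rating in {-1, 0, 1}."""
    examples = []
    prefix = question
    for step_text, rating in steps:
        prefix = prefix + "\n" + step_text
        # Assumed mapping: neutral/positive ratings count as a "good" step.
        examples.append({"text": prefix, "label": 1 if rating >= 0 else 0})
    return examples
```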
|
|
|
### 1. How to Use
|
|
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

prm_tokenizer = AutoTokenizer.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1")
prm_tokenizer.pad_token = prm_tokenizer.eos_token

prm_model = (
    AutoModelForSequenceClassification.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1")
    .eval()
    .to("cuda")
)
# The classification head locates the last non-pad token per sequence,
# so the model config needs to know the pad token id as well.
prm_model.config.pad_token_id = prm_tokenizer.pad_token_id

# batch_candidates: a list of candidate solution strings to score.
encoded_inputs = [
    prm_tokenizer.encode(candidate, return_tensors="pt")
    for candidate in batch_candidates
]

# Right-pad every sequence to the length of the longest one so they batch.
max_length = max(input_id.shape[1] for input_id in encoded_inputs)
padded_inputs = [
    torch.nn.functional.pad(input_id, (0, max_length - input_id.size(1)),
                            value=prm_tokenizer.pad_token_id)
    for input_id in encoded_inputs
]
input_ids = torch.cat(padded_inputs, dim=0).to("cuda")

with torch.no_grad():
    outputs = prm_model(input_ids)

# logits has shape (num_candidates, num_labels): one row per candidate.
logits = outputs.logits

scores = logits.softmax(dim=-1)

log_probs = scores.log()
```
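
`scores` holds one probability distribution over the classifier labels per candidate. Assuming the positive ("good step") class sits at index 1, which is an assumption since the label mapping is not documented above, candidates can be ranked like this:

```python
# Assumption: label index 1 is the positive class; confirm against
# prm_model.config.id2label before relying on this ordering.
positive_scores = scores[:, 1]
best_idx = positive_scores.argmax().item()
print(f"Best candidate #{best_idx}: {batch_candidates[best_idx]}")
```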
|
|
|
|
|
### 2. License

This code repository is licensed under the MIT License. The use of DeepSeekMath models is subject to the Model License. DeepSeekMath supports commercial use.

See the [LICENSE-MODEL](https://github.com/deepseek-ai/DeepSeek-Math/blob/main/LICENSE-MODEL) for more details.
|
|
|
### 3. Contact

If you have any questions, please raise an issue or contact the original team at [[email protected]](mailto:[email protected]).
|
|
|