---
license: other
license_name: deepseek
license_link: https://github.com/deepseek-ai/DeepSeek-Math/blob/main/LICENSE-MODEL
---
|
v0.1

Process reward model (PRM) adapted from: https://huggingface.co/deepseek-ai/deepseek-math-7b-rl

This is a process reward model trained primarily on a flattened version of PRM800K, fine-tuned with LoRA adapters that were then merged back into the full model.
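
PRM800K stores per-step human ratings for model-generated solutions. The exact flattening used for this model is not published here, but a plausible sketch of the idea, with illustrative field names and an assumed binary label mapping, looks like this:

```python
# Illustrative sketch only: the real preprocessing for this model is not
# documented here. PRM800K-style records rate each solution step; one
# plausible "flattening" emits a (prefix-so-far, label) pair per step.
def flatten_example(question, steps):
    """steps: list of (step_text, rating) pairs with rating in {-1, 0, 1}."""
    examples = []
    prefix = question
    for step_text, rating in steps:
        prefix = prefix + "\n" + step_text
        # Assumed mapping: neutral/positive ratings count as a "good" step.
        examples.append({"text": prefix, "label": 1 if rating >= 0 else 0})
    return examples
```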
|
|
|
### 1. How to Use
|
|
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

prm_tokenizer = AutoTokenizer.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1")
prm_tokenizer.pad_token = prm_tokenizer.eos_token

prm_model = (
    AutoModelForSequenceClassification.from_pretrained("mukaj/deepseek-math-7b-rl-prm-v0.1")
    .eval()
    .to("cuda")
)
# The classification head locates the last non-pad token per sequence,
# so the model config needs to know the pad token id as well.
prm_model.config.pad_token_id = prm_tokenizer.pad_token_id

# batch_candidates: a list of candidate solution strings to score.
encoded_inputs = [
    prm_tokenizer.encode(candidate, return_tensors="pt")
    for candidate in batch_candidates
]

# Right-pad every sequence to the length of the longest one so they batch.
max_length = max(input_id.shape[1] for input_id in encoded_inputs)
padded_inputs = [
    torch.nn.functional.pad(input_id, (0, max_length - input_id.size(1)),
                            value=prm_tokenizer.pad_token_id)
    for input_id in encoded_inputs
]
input_ids = torch.cat(padded_inputs, dim=0).to("cuda")

with torch.no_grad():
    outputs = prm_model(input_ids)

# logits has shape (num_candidates, num_labels): one row per candidate.
logits = outputs.logits

scores = logits.softmax(dim=-1)

log_probs = scores.log()
```
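
`scores` holds one probability distribution over the classifier labels per candidate. Assuming the positive ("good step") class sits at index 1, which is an assumption since the label mapping is not documented above, candidates can be ranked like this:

```python
# Assumption: label index 1 is the positive class; confirm against
# prm_model.config.id2label before relying on this ordering.
positive_scores = scores[:, 1]
best_idx = positive_scores.argmax().item()
print(f"Best candidate #{best_idx}: {batch_candidates[best_idx]}")
```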
|
|
|
|
|
### 2. License

This code repository is licensed under the MIT License. The use of DeepSeekMath models is subject to the Model License. DeepSeekMath supports commercial use.

See the [LICENSE-MODEL](https://github.com/deepseek-ai/DeepSeek-Math/blob/main/LICENSE-MODEL) for more details.
|
|
|
### 3. Contact

If you have any questions, please raise an issue or contact the original team at [[email protected]](mailto:[email protected]).
|
|
|