# RLHF Step-2 Reward Model
This repository hosts an RLHF reward model. It was trained on questions and answers from the [Stack Exchange Preferences dataset](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences), using [distilroberta-base](https://huggingface.co/distilroberta-base) as the base model.
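Reward models in step 2 of the RLHF pipeline are typically trained with a pairwise ranking objective on (chosen, rejected) answer pairs. This card does not document the exact training objective, but the following sketch shows the standard Bradley-Terry style loss commonly used for this step (an assumption, not taken from this repository):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: push the scalar reward of the preferred
    (chosen) answer above that of the rejected one.

    Assumed formulation; this repository does not document its objective.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```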
## Usage
You can use this model directly with a `pipeline` to score question-answer pairs: the model returns a single scalar reward per input, with higher scores indicating preferred answers:
```python
from accelerate import Accelerator
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

# Load the reward model as a single-label sequence classifier
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "cambioml/rlhf_reward_model",
    num_labels=1,
    # torch_dtype=torch.bfloat16,
    load_in_8bit=True,  # requires the bitsandbytes package
    device_map={"": Accelerator().process_index},
)

reward_tokenizer = AutoTokenizer.from_pretrained("cambioml/rlhf_reward_model")
reward_tokenizer.pad_token = reward_tokenizer.eos_token

# Keyword arguments to pass when *calling* the pipeline (not at construction)
reward_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",  # return raw logits as reward scores
    "batch_size": 32,
    "truncation": True,
    "max_length": 138,
}

reward_pipe = pipeline(
    "sentiment-analysis",
    model=reward_model,
    tokenizer=reward_tokenizer,
    return_token_type_ids=False,
)
```
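You can then score candidate answers by passing texts to the pipeline along with `reward_kwargs`. A minimal sketch follows; the `"Question: ... Answer: ..."` serialization is an assumption about how the training pairs were formatted, so adapt it to your data:

```python
# Hypothetical inputs; the question/answer formatting is an assumption.
texts = [
    "Question: How do I reverse a list in Python? Answer: Use my_list[::-1] or list.reverse().",
    "Question: How do I reverse a list in Python? Answer: You can't.",
]

outputs = reward_pipe(texts, **reward_kwargs)
# With return_all_scores=True and num_labels=1, each output is a list
# containing one {"label": ..., "score": ...} dict; the raw score is the reward.
for text, out in zip(texts, outputs):
    print(f"{out[0]['score']:+.4f}  {text[:60]}")
```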