Model Card for kaitchup/OPT-350M-RM-DSChat

This model is a reward model for RLHF, fine-tuned using DeepSpeed Chat. It is based on OPT-350M.
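For readers unfamiliar with reward models, the sketch below illustrates the generic pairwise ranking objective commonly used to train them for RLHF: the model assigns a scalar score to a preferred ("chosen") and a less preferred ("rejected") response, and the loss pushes the chosen score above the rejected one. This is an illustration under assumptions, not the exact loss used for this checkpoint; the card does not state it, and the function name `pairwise_reward_loss` is hypothetical.

```python
import math


def pairwise_reward_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over preference pairs.

    Hypothetical helper for illustration only; scores are plain floats
    standing in for the scalar outputs of a reward model.
    """
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need two non-empty score lists of equal length")

    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) == log(1 + exp(-m)); written in a numerically
        # stable form so large |m| does not overflow exp().
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)


# A correctly ranked pair yields a smaller loss than a mis-ranked one.
well_ranked = pairwise_reward_loss([2.0], [0.0])
mis_ranked = pairwise_reward_loss([0.0], [2.0])
print(well_ranked < mis_ranked)
```

The loss equals log(2) when both responses receive the same score and approaches zero as the chosen response is scored increasingly higher than the rejected one.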

Model Details

Model Description

  • Developed by: The Kaitchup
  • Model type: Reward model
  • Language(s) (NLP): English
  • License: cc-by-nc-sa-4.0
  • Finetuned from model: facebook/opt-350m

Model Sources

The model was trained following the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model

  • Model size: 331M params
  • Tensor type: FP16
  • Weights format: Safetensors

Datasets used to train kaitchup/OPT-350M-RM-DSChat