YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

The Llama3-8b-based Reward Model was trained using OpenRLHF and a combination of datasets available at https://huggingface.co/datasets/OpenLLMAI/preference_700K.

Cosine Scheduler
Learning Rate: 9e-6
Warmup Ratio: 0.03
Batch Size: 256
Epoch: 1

Safetensors

Model size

7.5B params

Tensor type

BF16

Inference Providers NEW

This model is not currently available via any of the supported Inference Providers.

The model cannot be deployed to the HF Inference API: The model has no pipeline_tag.