---
license: apache-2.0
language:
- en
---
Pre-trained model fine-tuned with Reinforcement Learning on the DIALOCONAN dataset, using facebook/roberta-hate-speech-dynabench-r4-target as the reward model.

Toxicity results on the allenai/real-toxicity-prompts dataset using custom prompts (see 🥞RewardLM for details):
| Toxicity Level | RedPajama-INCITE-Chat-3B |
|---|---|
| Pre-Trained | 0.217 |
| Fine-Tuned | 0.129 |
| RL | 0.160 |
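The toxicity level reported above is an aggregate over per-generation scores from the reward model. A minimal sketch of that aggregation, assuming scores shaped like the output of a Hugging Face text-classification pipeline run with `top_k=None` (the function name `toxicity_level` and the example scores are illustrative, not from the RewardLM codebase):

```python
from typing import Dict, List


def toxicity_level(scores: List[Dict[str, float]], hate_label: str = "hate") -> float:
    """Average probability assigned to the toxic label across generations.

    Each entry in `scores` maps a classifier label to its probability,
    as a {label: prob} dict for one model generation.
    """
    if not scores:
        return 0.0
    return sum(s.get(hate_label, 0.0) for s in scores) / len(scores)


# Hypothetical reward-model scores for three generations
example = [
    {"hate": 0.10, "nothate": 0.90},
    {"hate": 0.30, "nothate": 0.70},
    {"hate": 0.08, "nothate": 0.92},
]
print(round(toxicity_level(example), 3))  # 0.16
```

In practice the per-generation probabilities would come from running facebook/roberta-hate-speech-dynabench-r4-target over the model's completions of the real-toxicity-prompts inputs.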
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 33.13 |
| ARC (25-shot) | 38.65 |
| HellaSwag (10-shot) | 63.53 |
| MMLU (5-shot) | 25.16 |
| TruthfulQA (0-shot) | 36.07 |
| Winogrande (5-shot) | 60.14 |
| GSM8K (5-shot) | 0.08 |
| DROP (3-shot) | 8.24 |