Model Details

This is an official release of the ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset via PPO. "L230" indicates that the average output length on the LIMA test set is ~230. ODIN is the reward model used for the training.

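The model can be loaded with the Hugging Face transformers library like any LLaMA-based causal language model. The sketch below assumes the repository id Lichang-Chen/ODIN-ppo-L230-best (taken from the collection listing); the exact prompt/chat format expected by the checkpoint is not documented here, so the plain prompt is illustrative only.

```python
# Minimal usage sketch, assuming the repo id Lichang-Chen/ODIN-ppo-L230-best.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lichang-Chen/ODIN-ppo-L230-best"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick an appropriate dtype for the hardware
    device_map="auto",    # requires the accelerate package
)

# The checkpoint may expect an Open-Assistant-style prompt template;
# a plain prompt is used here purely for illustration.
prompt = "What is PPO in reinforcement learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
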
Model Description

Model Sources
