---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- princeton-nlp/llama3-ultrafeedback
license: mit
---
A SimPO-like DPO method, trained on the SimPO data.

AlpacaEval: 44.8 (+2)
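
This card does not spell out the exact training objective. For orientation only, here is a minimal sketch of the standard per-pair DPO and SimPO losses as published, which a "SimPO-like DPO" method presumably sits between. The function names, argument names, and default values of `beta` and `gamma` are illustrative assumptions, not the settings used for this model.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pol_w: float, pol_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    pol_* / ref_* are summed token log-probabilities of the chosen (w)
    and rejected (l) responses under the policy and the frozen reference.
    beta is an illustrative default, not this model's setting.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def simpo_loss(pol_w: float, pol_l: float, len_w: int, len_l: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """Standard SimPO loss for one preference pair.

    Uses length-normalized policy log-probabilities, no reference model,
    and a target reward margin gamma. Defaults are illustrative only.
    """
    margin = beta * (pol_w / len_w - pol_l / len_l) - gamma
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Hypothetical log-probabilities for a single (chosen, rejected) pair.
    print(dpo_loss(pol_w=-10.0, pol_l=-20.0, ref_w=-12.0, ref_l=-18.0))
    print(simpo_loss(pol_w=-10.0, pol_l=-20.0, len_w=10, len_l=10))
```

The main structural difference is that DPO measures the policy against a reference model, while SimPO drops the reference and instead normalizes by response length and subtracts a fixed margin.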