---
library_name: transformers
tags: []
---

# Model Card for OrpoLlama-3.2-1B

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained with the ORPO (Odds Ratio Preference Optimization) trainer on the mlabonne/orpo-dpo-mix-40k dataset. Only 1,000 samples were used, to keep the training run short.

## Model Details

### Model Description

The base model meta-llama/Llama-3.2-1B has been fine-tuned with ORPO on a small subset of the mlabonne/orpo-dpo-mix-40k dataset. The Llama 3.2 text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This fine-tuned version aims to improve the model's understanding of the context in prompts, making its responses easier to interpret.
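As a quick check, the model can be loaded with the standard transformers text-generation pipeline. The repo id below is assumed from this card's location on the Hub:

```python
from transformers import pipeline

# Repo id assumed from this model card's location on the Hub.
pipe = pipeline("text-generation", model="bhuvana-ak7/OrpoLlama-3.2-1B")
out = pipe("Summarize what preference fine-tuning does, in one sentence.", max_new_tokens=64)
print(out[0]["generated_text"])
```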

- **Finetuned from model:** meta-llama/Llama-3.2-1B
- **Model size:** 1 billion parameters
- **Fine-tuning method:** ORPO (see the training sketch below)
- **Dataset:** mlabonne/orpo-dpo-mix-40k
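The exact training script is not included in this card, but a minimal sketch using TRL's `ORPOTrainer` could look like the following. The hyperparameters and output directory are illustrative assumptions, not the settings actually used; depending on your TRL version, the `tokenizer` argument may instead be named `processing_class`.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

# Keep only 1,000 preference pairs, as described above.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train").select(range(1000))

# Assumed hyperparameters; beta weights the odds-ratio preference term of the ORPO loss.
config = ORPOConfig(
    output_dir="OrpoLlama-3.2-1B",
    beta=0.1,
    max_length=1024,
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=50,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Because ORPO folds the preference objective into the supervised loss with an odds-ratio penalty, it needs no separate reference model, which keeps memory usage close to plain supervised fine-tuning.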

## Evaluation

The model was evaluated on the following benchmark, with these results:

| Tasks     | Version | Filter | n-shot | Metric   | Value  | Stderr   |
|-----------|--------:|--------|-------:|----------|-------:|---------:|
| hellaswag | 1       | none   | 0      | acc      | 0.2504 | ± 0.0043 |
|           |         | none   | 0      | acc_norm | 0.2504 | ± 0.0043 |
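The table follows the output format of EleutherAI's lm-evaluation-harness, so a zero-shot run along these lines should reproduce it (the repo id is again an assumption):

```python
import lm_eval

# Zero-shot HellaSwag, matching the table above; repo id assumed.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=bhuvana-ak7/OrpoLlama-3.2-1B",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])
```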