Model Card for Model ID

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained with the ORPO (Odds Ratio Preference Optimization) trainer on the mlabonne/orpo-dpo-mix-40k dataset. Only 1,000 samples were used so that training would finish quickly.

Model Details

Model Description

The base model meta-llama/Llama-3.2-1B was fine-tuned with ORPO on a small subset of the mlabonne/orpo-dpo-mix-40k dataset. Meta describes the Llama 3.2 instruction-tuned text-only models as optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. This fine-tuned version aims to improve how well the model understands the context in prompts, and thereby to make its behavior more interpretable.

Finetuned from model: meta-llama/Llama-3.2-1B
Model Size: 1 billion parameters
Fine-tuning Method: ORPO
Dataset: mlabonne/orpo-dpo-mix-40k
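
For readers unfamiliar with ORPO, the sketch below illustrates the objective for a single preference pair: a standard negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty that pushes the chosen response to be more likely than the rejected one. It is a minimal illustration, not the training code used for this model. The function names, the use of length-averaged log-probabilities as inputs, and the weight `lam=0.1` are assumptions made for the example; the weight actually used in this fine-tune is not stated in this card.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) with p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one (chosen, rejected) pair.

    Inputs are length-averaged token log-probabilities of each response
    under the model. `lam` (hypothetical default) weights the odds-ratio term.
    """
    # Supervised term: negative log-likelihood of the chosen response.
    nll = -chosen_avg_logp
    # Preference term: -log sigmoid(log odds ratio of chosen vs. rejected).
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    odds_ratio_term = -log_sigmoid(log_odds_ratio)
    return nll + lam * odds_ratio_term


# Illustrative values only: the penalty shrinks as the rejected response
# becomes less likely relative to the chosen one.
loss_equal = orpo_loss(-1.0, -1.0)
loss_separated = orpo_loss(-1.0, -3.0)
```

When both responses are equally likely, the log odds ratio is zero and the penalty equals log 2; it falls toward zero as the model separates the two responses.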

Evaluation

The model was evaluated zero-shot on the HellaSwag benchmark, with the following results:

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
hellaswag	1	none	0	acc	↑	0.2504	±	0.0043
		none	0	acc_norm	↑	0.2504	±	0.0043
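
To put these numbers in context, it can help to compare the reported accuracy with the chance level. HellaSwag items offer four candidate endings, so random guessing yields an accuracy of about 0.25. The sketch below computes how many standard errors the reported accuracy lies from that baseline; the function name and the 1.96 threshold (a conventional two-sided 95% cutoff) are choices made for this example.

```python
def z_score_vs_chance(acc: float, stderr: float, chance: float) -> float:
    """Number of standard errors separating `acc` from the chance baseline."""
    return (acc - chance) / stderr


# Values taken from the results table above; chance = 1/4 for four choices.
z = z_score_vs_chance(acc=0.2504, stderr=0.0043, chance=0.25)

# Conventional two-sided 95% threshold (an assumption of this sketch).
distinguishable_from_chance = abs(z) > 1.96
```

With the values in the table, the accuracy falls well within one standard error of 0.25, so this evaluation does not distinguish the model from random guessing on HellaSwag.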