Llama-3.2-1B-Instruct-ORPO
Model Details
This model was obtained by fine-tuning the open-source Llama-3.2-1B-Instruct model on the mlabonne/orpo-dpo-mix-40k dataset, leveraging Odds Ratio Preference Optimization (ORPO) for preference alignment.
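The card does not publish the training script or hyperparameters. The sketch below shows how such an ORPO run is typically set up with the Hugging Face TRL library; the values for `beta`, learning rate, sequence length, batch size, and epochs are illustrative assumptions, not the settings used to train this model.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs: each row carries a prompt plus chosen/rejected responses.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

config = ORPOConfig(
    output_dir="llama-3.2-1b-instruct-orpo",
    beta=0.1,            # assumed weight of the odds-ratio term (lambda in the ORPO paper)
    learning_rate=8e-6,  # assumed
    max_length=1024,     # assumed
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```

Unlike DPO, ORPO needs no separate reference model: it adds an odds-ratio penalty on rejected responses directly to the supervised fine-tuning loss, which is why a single base model and a preference dataset suffice.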
Uses
This model is intended for general-purpose, instruction-following language tasks.
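A minimal usage sketch with the transformers pipeline API; the repository id is taken from this card, and the prompt and generation length are illustrative:

```python
from transformers import pipeline

# The tokenizer's chat template (inherited from Llama-3.2-1B-Instruct)
# handles the chat formatting.
generator = pipeline(
    "text-generation",
    model="ramonactruta/ramonactruta-llama-3.2.Instruct",
)
messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]
output = generator(messages, max_new_tokens=64)
print(output[0]["generated_text"][-1]["content"])  # assistant reply
```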
Evaluation
We used the EleutherAI Language Model Evaluation Harness (lm-evaluation-harness) to evaluate the fine-tuned model. The table below presents a summary of the evaluation performed. For a more granular breakdown of MMLU, see the MMLU section.
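One plausible way to rerun this suite through the harness's Python API; the model path is taken from this repository, while the harness version and batch settings are assumptions, since the card does not record the exact invocation:

```python
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ramonactruta/ramonactruta-llama-3.2.Instruct",
    tasks=["hellaswag", "arc_easy", "mmlu"],
    num_fewshot=0,  # the tables below report 0-shot results
)
print(results["results"])  # per-task acc / acc_norm with stderr
```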
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc ↑ | 0.4507 | ± 0.0050 |
| | | none | 0 | acc_norm ↑ | 0.6077 | ± 0.0049 |
| arc_easy | 1 | none | 0 | acc ↑ | 0.6856 | ± 0.0095 |
| | | none | 0 | acc_norm ↑ | 0.6368 | ± 0.0099 |
| mmlu | 2 | none | | acc ↑ | 0.4597 | ± 0.0041 |
| - humanities | 2 | none | | acc ↑ | 0.4434 | ± 0.0071 |
| - other | 2 | none | | acc ↑ | 0.5163 | ± 0.0088 |
| - social sciences | 2 | none | | acc ↑ | 0.5057 | ± 0.0088 |
| - stem | 2 | none | | acc ↑ | 0.3834 | ± 0.0085 |
MMLU
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc ↑ | 0.4597 | ± 0.0041 |
| - humanities | 2 | none | | acc ↑ | 0.4434 | ± 0.0071 |
| - formal_logic | 1 | none | 0 | acc ↑ | 0.3254 | ± 0.0419 |
| - high_school_european_history | 1 | none | 0 | acc ↑ | 0.6182 | ± 0.0379 |
| - high_school_us_history | 1 | none | 0 | acc ↑ | 0.5784 | ± 0.0347 |
| - high_school_world_history | 1 | none | 0 | acc ↑ | 0.6540 | ± 0.0310 |
| - international_law | 1 | none | 0 | acc ↑ | 0.6033 | ± 0.0447 |
| - jurisprudence | 1 | none | 0 | acc ↑ | 0.5370 | ± 0.0482 |
| - logical_fallacies | 1 | none | 0 | acc ↑ | 0.4479 | ± 0.0391 |
| - moral_disputes | 1 | none | 0 | acc ↑ | 0.4711 | ± 0.0269 |
| - moral_scenarios | 1 | none | 0 | acc ↑ | 0.3408 | ± 0.0159 |
| - philosophy | 1 | none | 0 | acc ↑ | 0.5177 | ± 0.0284 |
| - prehistory | 1 | none | 0 | acc ↑ | 0.5278 | ± 0.0278 |
| - professional_law | 1 | none | 0 | acc ↑ | 0.3683 | ± 0.0123 |
| - world_religions | 1 | none | 0 | acc ↑ | 0.5906 | ± 0.0377 |
| - other | 2 | none | | acc ↑ | 0.5163 | ± 0.0088 |
| - business_ethics | 1 | none | 0 | acc ↑ | 0.4300 | ± 0.0498 |
| - clinical_knowledge | 1 | none | 0 | acc ↑ | 0.4642 | ± 0.0307 |
| - college_medicine | 1 | none | 0 | acc ↑ | 0.3815 | ± 0.0370 |
| - global_facts | 1 | none | 0 | acc ↑ | 0.3200 | ± 0.0469 |
| - human_aging | 1 | none | 0 | acc ↑ | 0.5157 | ± 0.0335 |
| - management | 1 | none | 0 | acc ↑ | 0.5243 | ± 0.0494 |
| - marketing | 1 | none | 0 | acc ↑ | 0.6709 | ± 0.0308 |
| - medical_genetics | 1 | none | 0 | acc ↑ | 0.4800 | ± 0.0502 |
| - miscellaneous | 1 | none | 0 | acc ↑ | 0.6015 | ± 0.0175 |
| - nutrition | 1 | none | 0 | acc ↑ | 0.5686 | ± 0.0284 |
| - professional_accounting | 1 | none | 0 | acc ↑ | 0.3511 | ± 0.0285 |
| - professional_medicine | 1 | none | 0 | acc ↑ | 0.5625 | ± 0.0301 |
| - virology | 1 | none | 0 | acc ↑ | 0.4157 | ± 0.0384 |
| - social sciences | 2 | none | | acc ↑ | 0.5057 | ± 0.0088 |
| - econometrics | 1 | none | 0 | acc ↑ | 0.2456 | ± 0.0405 |
| - high_school_geography | 1 | none | 0 | acc ↑ | 0.5606 | ± 0.0354 |
| - high_school_government_and_politics | 1 | none | 0 | acc ↑ | 0.5389 | ± 0.0360 |
| - high_school_macroeconomics | 1 | none | 0 | acc ↑ | 0.4128 | ± 0.0250 |
| - high_school_microeconomics | 1 | none | 0 | acc ↑ | 0.4454 | ± 0.0323 |
| - high_school_psychology | 1 | none | 0 | acc ↑ | 0.6183 | ± 0.0208 |
| - human_sexuality | 1 | none | 0 | acc ↑ | 0.5420 | ± 0.0437 |
| - professional_psychology | 1 | none | 0 | acc ↑ | 0.4167 | ± 0.0199 |
| - public_relations | 1 | none | 0 | acc ↑ | 0.5000 | ± 0.0479 |
| - security_studies | 1 | none | 0 | acc ↑ | 0.5265 | ± 0.0320 |
| - sociology | 1 | none | 0 | acc ↑ | 0.6468 | ± 0.0338 |
| - us_foreign_policy | 1 | none | 0 | acc ↑ | 0.6900 | ± 0.0465 |
| - stem | 2 | none | | acc ↑ | 0.3834 | ± 0.0085 |
| - abstract_algebra | 1 | none | 0 | acc ↑ | 0.2500 | ± 0.0435 |
| - anatomy | 1 | none | 0 | acc ↑ | 0.4889 | ± 0.0432 |
| - astronomy | 1 | none | 0 | acc ↑ | 0.5329 | ± 0.0406 |
| - college_biology | 1 | none | 0 | acc ↑ | 0.4931 | ± 0.0418 |
| - college_chemistry | 1 | none | 0 | acc ↑ | 0.3800 | ± 0.0488 |
| - college_computer_science | 1 | none | 0 | acc ↑ | 0.3300 | ± 0.0473 |
| - college_mathematics | 1 | none | 0 | acc ↑ | 0.2800 | ± 0.0451 |
| - college_physics | 1 | none | 0 | acc ↑ | 0.2451 | ± 0.0428 |
| - computer_security | 1 | none | 0 | acc ↑ | 0.4800 | ± 0.0502 |
| - conceptual_physics | 1 | none | 0 | acc ↑ | 0.4383 | ± 0.0324 |
| - electrical_engineering | 1 | none | 0 | acc ↑ | 0.5310 | ± 0.0416 |
| - elementary_mathematics | 1 | none | 0 | acc ↑ | 0.2884 | ± 0.0233 |
| - high_school_biology | 1 | none | 0 | acc ↑ | 0.4935 | ± 0.0284 |
| - high_school_chemistry | 1 | none | 0 | acc ↑ | 0.3645 | ± 0.0339 |
| - high_school_computer_science | 1 | none | 0 | acc ↑ | 0.4500 | ± 0.0500 |
| - high_school_mathematics | 1 | none | 0 | acc ↑ | 0.2815 | ± 0.0274 |
| - high_school_physics | 1 | none | 0 | acc ↑ | 0.3113 | ± 0.0378 |
| - high_school_statistics | 1 | none | 0 | acc ↑ | 0.3657 | ± 0.0328 |
| - machine_learning | 1 | none | 0 | acc ↑ | 0.2768 | ± 0.0425 |
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: MacBook Air M1
- Hours used: 1
- Cloud Provider: GCP, A100
- Compute Region: us-east1
- Carbon Emitted: 0.09 kg CO2eq, of which 100% was directly offset by the cloud provider.
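As a sanity check, the reported figure is consistent with the Lacoste et al. (2019) approach of multiplying average power draw, runtime, and grid carbon intensity. The power and intensity values below are illustrative assumptions, not figures reported by the authors:

```python
# emissions = average power draw * runtime * grid carbon intensity
avg_power_kw = 0.25          # assumed average draw of one A100 (kW)
hours = 1.0                  # "Hours used" from this card
intensity_kg_per_kwh = 0.37  # assumed carbon intensity for the region
emissions = avg_power_kw * hours * intensity_kg_per_kwh
print(f"{emissions:.2f} kgCO2eq")  # ~0.09, consistent with the card
```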
Model tree for ramonactruta/ramonactruta-llama-3.2.Instruct
- Base model: meta-llama/Llama-3.2-1B-Instruct
- Dataset used to train: mlabonne/orpo-dpo-mix-40k
Evaluation results
- acc_norm (0-shot) on mlabonne/orpo-dpo-mix-40k, self-reported: 0.608