hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| arc_easy | 0 | acc | 0.4322 | ± | 0.0102 |
|          |   | acc_norm | 0.3868 | ± | 0.0100 |
| boolq | 1 | acc | 0.6092 | ± | 0.0085 |
| lambada_openai | 0 | ppl | 74.2399 | ± | 2.9038 |
|                |   | acc | 0.2604 | ± | 0.0061 |
| openbookqa | 0 | acc | 0.1440 | ± | 0.0157 |
|            |   | acc_norm | 0.2780 | ± | 0.0201 |
| piqa | 0 | acc | 0.5909 | ± | 0.0115 |
|      |   | acc_norm | 0.5871 | ± | 0.0115 |
| winogrande | 0 | acc | 0.5225 | ± | 0.0140 |
hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 25, batch_size: 64
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| arc_challenge | 0 | acc | 0.1817 | ± | 0.0113 |
|               |   | acc_norm | 0.2329 | ± | 0.0124 |
hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 10, batch_size: 64
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| hellaswag | 0 | acc | 0.2792 | ± | 0.0045 |
|           |   | acc_norm | 0.2865 | ± | 0.0045 |
hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| truthfulqa_mc | 1 | mc1 | 0.2485 | ± | 0.0151 |
|               |   | mc2 | 0.4594 | ± | 0.0151 |
hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 5, batch_size: 64
| Task | Version | Metric | Value |   | Stderr |
|------|--------:|--------|------:|---|-------:|
| hendrycksTest-abstract_algebra | 1 | acc | 0.2200 | ± | 0.0416 |
|                                |   | acc_norm | 0.2200 | ± | 0.0416 |
| hendrycksTest-anatomy | 1 | acc | 0.2741 | ± | 0.0385 |
|                       |   | acc_norm | 0.2741 | ± | 0.0385 |
| hendrycksTest-astronomy | 1 | acc | 0.1776 | ± | 0.0311 |
|                         |   | acc_norm | 0.1776 | ± | 0.0311 |
| hendrycksTest-business_ethics | 1 | acc | 0.2100 | ± | 0.0409 |
|                               |   | acc_norm | 0.2100 | ± | 0.0409 |
| hendrycksTest-clinical_knowledge | 1 | acc | 0.2264 | ± | 0.0258 |
|                                  |   | acc_norm | 0.2264 | ± | 0.0258 |
| hendrycksTest-college_biology | 1 | acc | 0.2500 | ± | 0.0362 |
|                               |   | acc_norm | 0.2500 | ± | 0.0362 |
| hendrycksTest-college_chemistry | 1 | acc | 0.1500 | ± | 0.0359 |
|                                 |   | acc_norm | 0.1500 | ± | 0.0359 |
| hendrycksTest-college_computer_science | 1 | acc | 0.1600 | ± | 0.0368 |
|                                        |   | acc_norm | 0.1600 | ± | 0.0368 |
| hendrycksTest-college_mathematics | 1 | acc | 0.3000 | ± | 0.0461 |
|                                   |   | acc_norm | 0.3000 | ± | 0.0461 |
| hendrycksTest-college_medicine | 1 | acc | 0.1908 | ± | 0.0300 |
|                                |   | acc_norm | 0.1908 | ± | 0.0300 |
| hendrycksTest-college_physics | 1 | acc | 0.2157 | ± | 0.0409 |
|                               |   | acc_norm | 0.2157 | ± | 0.0409 |
| hendrycksTest-computer_security | 1 | acc | 0.2200 | ± | 0.0416 |
|                                 |   | acc_norm | 0.2200 | ± | 0.0416 |
| hendrycksTest-conceptual_physics | 1 | acc | 0.2383 | ± | 0.0279 |
|                                  |   | acc_norm | 0.2383 | ± | 0.0279 |
| hendrycksTest-econometrics | 1 | acc | 0.2456 | ± | 0.0405 |
|                            |   | acc_norm | 0.2456 | ± | 0.0405 |
| hendrycksTest-electrical_engineering | 1 | acc | 0.2276 | ± | 0.0349 |
|                                      |   | acc_norm | 0.2276 | ± | 0.0349 |
| hendrycksTest-elementary_mathematics | 1 | acc | 0.1772 | ± | 0.0197 |
|                                      |   | acc_norm | 0.1772 | ± | 0.0197 |
| hendrycksTest-formal_logic | 1 | acc | 0.2460 | ± | 0.0385 |
|                            |   | acc_norm | 0.2460 | ± | 0.0385 |
| hendrycksTest-global_facts | 1 | acc | 0.2400 | ± | 0.0429 |
|                            |   | acc_norm | 0.2400 | ± | 0.0429 |
| hendrycksTest-high_school_biology | 1 | acc | 0.3065 | ± | 0.0262 |
|                                   |   | acc_norm | 0.3065 | ± | 0.0262 |
| hendrycksTest-high_school_chemistry | 1 | acc | 0.2759 | ± | 0.0314 |
|                                     |   | acc_norm | 0.2759 | ± | 0.0314 |
| hendrycksTest-high_school_computer_science | 1 | acc | 0.1600 | ± | 0.0368 |
|                                            |   | acc_norm | 0.1600 | ± | 0.0368 |
| hendrycksTest-high_school_european_history | 1 | acc | 0.2242 | ± | 0.0326 |
|                                            |   | acc_norm | 0.2242 | ± | 0.0326 |
| hendrycksTest-high_school_geography | 1 | acc | 0.2828 | ± | 0.0321 |
|                                     |   | acc_norm | 0.2828 | ± | 0.0321 |
| hendrycksTest-high_school_government_and_politics | 1 | acc | 0.3472 | ± | 0.0344 |
|                                                   |   | acc_norm | 0.3472 | ± | 0.0344 |
| hendrycksTest-high_school_macroeconomics | 1 | acc | 0.3026 | ± | 0.0233 |
|                                          |   | acc_norm | 0.3026 | ± | 0.0233 |
| hendrycksTest-high_school_mathematics | 1 | acc | 0.2667 | ± | 0.0270 |
|                                       |   | acc_norm | 0.2667 | ± | 0.0270 |
| hendrycksTest-high_school_microeconomics | 1 | acc | 0.2983 | ± | 0.0297 |
|                                          |   | acc_norm | 0.2983 | ± | 0.0297 |
| hendrycksTest-high_school_physics | 1 | acc | 0.1722 | ± | 0.0308 |
|                                   |   | acc_norm | 0.1722 | ± | 0.0308 |
| hendrycksTest-high_school_psychology | 1 | acc | 0.2312 | ± | 0.0181 |
|                                      |   | acc_norm | 0.2312 | ± | 0.0181 |
| hendrycksTest-high_school_statistics | 1 | acc | 0.4167 | ± | 0.0336 |
|                                      |   | acc_norm | 0.4167 | ± | 0.0336 |
| hendrycksTest-high_school_us_history | 1 | acc | 0.2451 | ± | 0.0302 |
|                                      |   | acc_norm | 0.2451 | ± | 0.0302 |
| hendrycksTest-high_school_world_history | 1 | acc | 0.2489 | ± | 0.0281 |
|                                         |   | acc_norm | 0.2489 | ± | 0.0281 |
| hendrycksTest-human_aging | 1 | acc | 0.2422 | ± | 0.0288 |
|                           |   | acc_norm | 0.2422 | ± | 0.0288 |
| hendrycksTest-human_sexuality | 1 | acc | 0.2214 | ± | 0.0364 |
|                               |   | acc_norm | 0.2214 | ± | 0.0364 |
| hendrycksTest-international_law | 1 | acc | 0.3223 | ± | 0.0427 |
|                                 |   | acc_norm | 0.3223 | ± | 0.0427 |
| hendrycksTest-jurisprudence | 1 | acc | 0.2500 | ± | 0.0419 |
|                             |   | acc_norm | 0.2500 | ± | 0.0419 |
| hendrycksTest-logical_fallacies | 1 | acc | 0.2454 | ± | 0.0338 |
|                                 |   | acc_norm | 0.2454 | ± | 0.0338 |
| hendrycksTest-machine_learning | 1 | acc | 0.1964 | ± | 0.0377 |
|                                |   | acc_norm | 0.1964 | ± | 0.0377 |
| hendrycksTest-management | 1 | acc | 0.2427 | ± | 0.0425 |
|                          |   | acc_norm | 0.2427 | ± | 0.0425 |
| hendrycksTest-marketing | 1 | acc | 0.2009 | ± | 0.0262 |
|                         |   | acc_norm | 0.2009 | ± | 0.0262 |
| hendrycksTest-medical_genetics | 1 | acc | 0.2400 | ± | 0.0429 |
|                                |   | acc_norm | 0.2400 | ± | 0.0429 |
| hendrycksTest-miscellaneous | 1 | acc | 0.2593 | ± | 0.0157 |
|                             |   | acc_norm | 0.2593 | ± | 0.0157 |
| hendrycksTest-moral_disputes | 1 | acc | 0.2486 | ± | 0.0233 |
|                              |   | acc_norm | 0.2486 | ± | 0.0233 |
| hendrycksTest-moral_scenarios | 1 | acc | 0.2469 | ± | 0.0144 |
|                               |   | acc_norm | 0.2469 | ± | 0.0144 |
| hendrycksTest-nutrition | 1 | acc | 0.2157 | ± | 0.0236 |
|                         |   | acc_norm | 0.2157 | ± | 0.0236 |
| hendrycksTest-philosophy | 1 | acc | 0.2830 | ± | 0.0256 |
|                          |   | acc_norm | 0.2830 | ± | 0.0256 |
| hendrycksTest-prehistory | 1 | acc | 0.2377 | ± | 0.0237 |
|                          |   | acc_norm | 0.2377 | ± | 0.0237 |
| hendrycksTest-professional_accounting | 1 | acc | 0.2801 | ± | 0.0268 |
|                                       |   | acc_norm | 0.2801 | ± | 0.0268 |
| hendrycksTest-professional_law | 1 | acc | 0.2458 | ± | 0.0110 |
|                                |   | acc_norm | 0.2458 | ± | 0.0110 |
| hendrycksTest-professional_medicine | 1 | acc | 0.2794 | ± | 0.0273 |
|                                     |   | acc_norm | 0.2794 | ± | 0.0273 |
| hendrycksTest-professional_psychology | 1 | acc | 0.2598 | ± | 0.0177 |
|                                       |   | acc_norm | 0.2598 | ± | 0.0177 |
| hendrycksTest-public_relations | 1 | acc | 0.2273 | ± | 0.0401 |
|                                |   | acc_norm | 0.2273 | ± | 0.0401 |
| hendrycksTest-security_studies | 1 | acc | 0.3388 | ± | 0.0303 |
|                                |   | acc_norm | 0.3388 | ± | 0.0303 |
| hendrycksTest-sociology | 1 | acc | 0.2189 | ± | 0.0292 |
|                         |   | acc_norm | 0.2189 | ± | 0.0292 |
| hendrycksTest-us_foreign_policy | 1 | acc | 0.2100 | ± | 0.0409 |
|                                 |   | acc_norm | 0.2100 | ± | 0.0409 |
| hendrycksTest-virology | 1 | acc | 0.2169 | ± | 0.0321 |
|                        |   | acc_norm | 0.2169 | ± | 0.0321 |
| hendrycksTest-world_religions | 1 | acc | 0.2047 | ± | 0.0309 |
|                               |   | acc_norm | 0.2047 | ± | 0.0309 |
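
The header line above each table records the run configuration used by EleutherAI's lm-evaluation-harness. A minimal sketch of the kind of command that would reproduce the zero-shot block, assuming the v0.3-era CLI (`main.py` with the `hf-causal-experimental` model type); the task list, few-shot count, and batch size mirror the first header above, and other blocks would swap `--tasks` and `--num_fewshot` accordingly:

```shell
# Sketch: reproduce the 0-shot results with the lm-evaluation-harness
# (v0.3-era CLI assumed; flags taken from the run header above).
python main.py \
  --model hf-causal-experimental \
  --model_args pretrained=BEE-spoke-data/smol_llama-101M-GQA,trust_remote_code=True,dtype=float \
  --tasks arc_easy,boolq,lambada_openai,openbookqa,piqa,winogrande \
  --num_fewshot 0 \
  --batch_size 64
```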