Performance on MMLU Astronomy
#1 opened by meni12345
Based on testing via the LM Evaluation Harness, it seems this model is outperformed by the base version of Llama-2-7B on MMLU Astronomy ("hendrycksTest-astronomy"). Is there a bug in the uploaded model?
hf-causal-experimental (pretrained=universeTBD/astrollama), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| hendrycksTest-astronomy | 1 | acc | 0.3816 | ± 0.0395 |
| | | acc_norm | 0.3816 | ± 0.0395 |
hf-causal-experimental (pretrained=meta-llama/Llama-2-7b-hf), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| hendrycksTest-astronomy | 1 | acc | 0.4211 | ± 0.0402 |
| | | acc_norm | 0.4211 | ± 0.0402 |
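Worth noting: the reported standard errors are large relative to the gap between the two models, so the difference may not be statistically significant. A minimal sketch of a two-sample z-check using the numbers from the tables above (assuming the two runs are independent):

```python
import math

# Zero-shot accuracies and standard errors reported in the tables above
acc_astrollama, se_astrollama = 0.3816, 0.0395
acc_llama2, se_llama2 = 0.4211, 0.0402

# z-score for the difference in accuracies (independence assumed)
diff = acc_llama2 - acc_astrollama
se_diff = math.sqrt(se_astrollama**2 + se_llama2**2)
z = diff / se_diff
print(f"difference = {diff:.4f}, z = {z:.2f}")  # z well below 1.96
```

At z ≈ 0.70, the gap is well within one combined standard error, so the two accuracies are statistically indistinguishable on this 152-question subset; that said, a base model outperforming its domain-adapted variant is still worth investigating.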
Hi @meni12345, we haven't fine-tuned a chat version of the model, so it was not trained to follow QA-style instructions. We are currently working on this and will release a chat version very soon. Thank you for testing our model!