akumaburn/Open_Orca_Llama-3-8B-1K


akumaburn committed on Apr 24, 2024

Commit bd2a640 · verified · 1 Parent(s): 9afe5a7

Update README.md

Files changed (1)

README.md +8 -1


@@ -34,11 +34,18 @@ datasets:
 Some GGUF quantizations are included as well.
+llama-3-8b-bnb-4bit.Q8_0.gguf:
+- **MMLU-Test:** Pending..
+- **Arc-Easy:** Pending..
 Open_Orca_Llama-3-8B-unsloth.Q8_0.gguf:
 - **MMLU-Test:** Final result: 39.3818 +/- 0.4138
+- **Arc-Easy:** Final result: 67.3684 +/- 1.9656
 Meta-Llama-3-8B.Q8_0.gguf:
-- **MMLU-Test:** Pending..
+- **MMLU-Test:** Final result: 40.8664 +/- 0.4163
+- **Arc-Easy:** Final result: 74.3860 +/- 1.8299
 Llama.cpp Options For Testing:
 --samplers "tfs;typical;temp" --draft 32 --ctx-size 8192 --temp 0.82 --tfs 0.8 --typical 1.1 --repeat-last-n 512 --batch-size 8192 --repeat-penalty 1.0 --n-gpu-layers 100 --threads 12
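
A rough way to read the completed scores is to compare each gap between the two finished models with the reported uncertainties. The sketch below is not part of the commit. It assumes each `+/-` value is one standard error and treats the two runs as independent, which is a simplification because both models answer the same questions. The names `RESULTS` and `compare` are hypothetical.

```python
import math

# Scores copied from the README diff, as (accuracy in %, reported +/- value).
RESULTS = {
    "MMLU-Test": {
        "Open_Orca_Llama-3-8B-unsloth.Q8_0.gguf": (39.3818, 0.4138),
        "Meta-Llama-3-8B.Q8_0.gguf": (40.8664, 0.4163),
    },
    "Arc-Easy": {
        "Open_Orca_Llama-3-8B-unsloth.Q8_0.gguf": (67.3684, 1.9656),
        "Meta-Llama-3-8B.Q8_0.gguf": (74.3860, 1.8299),
    },
}


def compare(base, tuned):
    """Return (difference, combined error, ratio) for base minus tuned.

    Assumption: the +/- values are standard errors of independent runs,
    so they are combined in quadrature.
    """
    diff = base[0] - tuned[0]
    err = math.hypot(base[1], tuned[1])
    return diff, err, diff / err


for bench, scores in RESULTS.items():
    base = scores["Meta-Llama-3-8B.Q8_0.gguf"]
    tuned = scores["Open_Orca_Llama-3-8B-unsloth.Q8_0.gguf"]
    diff, err, ratio = compare(base, tuned)
    print(f"{bench}: base - tuned = {diff:.4f} +/- {err:.4f} ({ratio:.2f} combined errors)")
```

Under these assumptions, the base model leads the fine-tune by roughly 2.5 to 2.6 combined errors on both benchmarks. A paired comparison on per-question results would be a more careful test, but the README does not include that data.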