---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: apache-2.0
library_name: vllm
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
tags:
- transformers
---
# Model Card for Mistral-Small-24B-Base-2501

Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Model category (below 70B), boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models!
Check out our fine-tuned Instruct version, [Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).

For enterprises that need specialized capabilities (increased context, particular modalities, domain-specific knowledge, etc.), we will be releasing commercial models beyond what Mistral AI contributes to the community.

This release demonstrates our commitment to open source, serving as a strong base model.

Learn more about Mistral Small in our [blog post](https://mistral.ai/news/mistral-small-3/).

Model developer: Mistral AI Team
## Key Features
- **Multilingual:** Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities.
- **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
- **Context Window:** A 32k context window.
- **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size (see the usage sketch below).
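The snippet below is a minimal usage sketch, not part of the official card: it loads the base model with the Hugging Face `transformers` library (listed in this card's tags), checks the tokenizer's vocabulary size against the Tekken figure above, and runs a plain text completion. The prompt and generation settings are illustrative assumptions.

```python
# Minimal sketch (assumed setup): plain text completion with the base model.
# This is a base model, not instruction-tuned, so phrase prompts as text to continue.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-24B-Base-2501"  # model ID from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
print(len(tokenizer))  # Tekken tokenizer; expect roughly 131k per the card

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs; 24B params need substantial VRAM
    torch_dtype="auto",  # load in the checkpoint's native precision
)

prompt = "The main ingredients of a classic ratatouille are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```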
## Benchmark Results

| Benchmark            | Metric        | Mistral-Small-24B-Base |
| -------------------- | ------------- | ---------------------- |
| [MMLU][mmlu]         | 5-shot        | 80.73                  |
| [MMLU Pro][mmlu_pro] | 5-shot, CoT   | 54.37                  |
| [GPQA Main][gpqa]    | 5-shot, CoT   | 34.37                  |
| [TriviaQA][triviaqa] | 5-shot        | 80.32                  |
| [ARC-c][arc]         | 0-shot        | 91.29                  |
| [TriviaQA][triviaqa] | 5-shot        | 76.6                   |
| [MBPP][mbpp]         | pass@1        | 69.64                  |
| [GSM8K][gsm8k]       | 5-shot, maj@1 | 80.73                  |
| [MATH][math]         | 4-shot, maj   | 45.98                  |
| [AGIEval][agieval]   | -             | 65.80                  |

| Benchmark     | Metric | Mistral-Small-24B-Base |
| ------------- | ------ | ---------------------- |
| French MMLU   | -      | 78.03                  |
| German MMLU   | -      | 77.69                  |
| Spanish MMLU  | -      | 78.86                  |
| Russian MMLU  | -      | 75.64                  |
| Chinese MMLU  | -      | 70.35                  |
| Korean MMLU   | -      | 56.42                  |
| Japanese MMLU | -      | 74.46                  |
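Since this card sets `library_name: vllm`, here is a hedged serving sketch using vLLM's offline Python API. It is not taken from the card itself; the prompt and sampling parameters are illustrative assumptions.

```python
# Assumed sketch: offline batch completion with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-Small-24B-Base-2501")
params = SamplingParams(temperature=0.7, max_tokens=64)

# Base-model prompting: phrase requests as text to be continued.
outputs = llm.generate(["The three primary colors are"], params)
print(outputs[0].outputs[0].text)
```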
[mmlu]: https://arxiv.org/abs/2009.03300
[hellaswag]: https://arxiv.org/abs/1905.07830
[piqa]: https://arxiv.org/abs/1911.11641
[socialiqa]: https://arxiv.org/abs/1904.09728
[boolq]: https://arxiv.org/abs/1905.10044
[winogrande]: https://arxiv.org/abs/1907.10641
[commonsenseqa]: https://arxiv.org/abs/1811.00937
[openbookqa]: https://arxiv.org/abs/1809.02789
[arc]: https://arxiv.org/abs/1911.01547
[triviaqa]: https://arxiv.org/abs/1705.03551
[naturalq]: https://github.com/google-research-datasets/natural-questions
[humaneval]: https://arxiv.org/abs/2107.03374
[mbpp]: https://arxiv.org/abs/2108.07732
[gsm8k]: https://arxiv.org/abs/2110.14168
[realtox]: https://arxiv.org/abs/2009.11462
[bold]: https://arxiv.org/abs/2101.11718
[crows]: https://aclanthology.org/2020.emnlp-main.154/
[bbq]: https://arxiv.org/abs/2110.08193v2
[winogender]: https://arxiv.org/abs/1804.09301
[truthfulqa]: https://arxiv.org/abs/2109.07958
[winobias]: https://arxiv.org/abs/1804.06876
[math]: https://arxiv.org/abs/2103.03874
[agieval]: https://arxiv.org/abs/2304.06364
[big-bench]: https://arxiv.org/abs/2206.04615
[toxigen]: https://arxiv.org/abs/2203.09509
[mmlu_pro]: https://arxiv.org/abs/2406.01574
[gpqa]: https://arxiv.org/abs/2311.12022