
Model Name: Marsouuu/MistralBase-4x7B-MoE-ECE-PRYMMAL-Martial - Mixture of Experts (MoE)

Description:

This is a Mixture of Experts (MoE) model with 24-bit precision, tailored to four key domains: mathematics, coding, storytelling, and general chat. A gating mechanism routes each input to the most relevant expert network, so the model adapts to different tasks while delivering high-quality outputs efficiently.
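The efficiency of an MoE comes from sparse activation: each token passes only through the experts the gate selects, so per-token compute tracks the active parameters rather than the total. The sketch below makes that arithmetic concrete and also estimates raw weight storage at 24 bits per parameter. It is a back-of-envelope illustration under stated assumptions: `num_experts=4` mirrors the "4x7B" in the model name, while the shared and per-expert sizes and `top_k=2` are hypothetical, since this card does not state them.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for a sparse MoE.

    shared_params: weights every token uses (embeddings, attention, norms).
    params_per_expert: weights in one expert's feed-forward blocks.
    top_k: experts selected per token (hypothetical; not stated in this card).
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    total = shared_params + num_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


def weight_memory_gb(num_params, bits_per_param):
    """Storage for the weights alone, in GB (1e9 bytes).

    Excludes activations, KV cache and framework overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical sizes for illustration only; 4 experts as in "4x7B".
total, active = moe_param_counts(shared_params=2e9, params_per_expert=5e9,
                                 num_experts=4, top_k=2)
print(f"total: {total / 1e9:.0f}B params, active per token: {active / 1e9:.0f}B")
print(f"weights at 24 bits: {weight_memory_gb(total, 24):.1f} GB")
```

With these made-up figures, each token touches 12B of the 22B parameters; substitute the real counts from the model's config to get meaningful numbers.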

Key Features

• Mathematics Expert: Fine-tuned for mathematical reasoning, this expert solves complex mathematical problems, performs numerical computations, and gives detailed explanations of mathematical concepts.
• Coding Expert: Trained extensively on a range of programming languages and software development paradigms, this expert helps generate, debug, and explain code snippets.
• Storytelling Expert: Designed to assist with creative writing, this expert generates narratives, constructs dialogues, and supports story-building across genres.
• General Chat Expert: Handles everyday conversations with accurate, contextually appropriate responses, and adapts to different conversational tones, from casual chit-chat to formal assistance.

Technical Specifications

• Model Architecture: Mixture of Experts (MoE) with a gating mechanism that routes inputs to the most relevant expert networks.
• Domains:
  • Mathematics: Advanced reasoning and problem-solving.
  • Coding: Programming support across multiple languages.
  • Storytelling: Creative writing and narrative generation.
  • General Chat: Versatile dialogue handling for various conversational contexts.
• Training Data: The model was trained on diverse datasets that cover each expert domain, ensuring robustness and versatility.
• Framework: Developed using [Framework name, e.g., PyTorch, TensorFlow], optimized for the MoE architecture with gated routing.
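The gating mechanism can be sketched as top-k routing, the scheme used by Mixtral-style MoE models: a router scores every expert for each token, keeps the k best, and mixes their outputs with softmax weights over those k scores. This is a minimal NumPy illustration, not this model's implementation; the router matrix, the toy experts, and `k=2` are assumptions, because the card does not say how many experts are active per token.

```python
import numpy as np


def top_k_gating(hidden, gate_weight, k=2):
    """Pick the k highest-scoring experts for each token.

    hidden: (num_tokens, d_model) token representations.
    gate_weight: (d_model, num_experts) router matrix (hypothetical values).
    Returns (idx, weights), both (num_tokens, k): chosen expert ids, best
    first, and mixing weights that sum to 1 for each token.
    """
    logits = hidden @ gate_weight                      # (tokens, experts)
    idx = np.argsort(-logits, axis=-1)[:, :k]          # top-k experts, best first
    top = np.take_along_axis(logits, idx, axis=-1)
    top = top - top.max(axis=-1, keepdims=True)        # numerically stable softmax
    weights = np.exp(top)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the k chosen only
    return idx, weights


def moe_layer(hidden, gate_weight, experts, k=2):
    """Weighted sum of the selected experts' outputs for each token."""
    idx, weights = top_k_gating(hidden, gate_weight, k)
    out = np.zeros_like(hidden)
    for t in range(hidden.shape[0]):
        for j in range(k):
            out[t] += weights[t, j] * experts[idx[t, j]](hidden[t])
    return out


# Toy example: 3 tokens, d_model = 8, 4 experts (as in "4x7B").
rng = np.random.default_rng(0)
hidden = rng.standard_normal((3, 8))
gate_weight = rng.standard_normal((8, 4))
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]  # toy experts
idx, weights = top_k_gating(hidden, gate_weight, k=2)
print("chosen experts per token:", idx.tolist())
print("mixing weights:", weights.round(3).tolist())
out = moe_layer(hidden, gate_weight, experts, k=2)
```

Routing happens per token (and, in a real transformer, per layer), so a single prompt can draw on several experts rather than being assigned to one domain as a whole.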

Usage

This model can be used for a wide range of applications:

• Educational Tools: Assisting with mathematical problems, coding exercises, and creative writing tasks.
• Software Development: Providing coding suggestions, code completion, and debugging support.
• Creative Writing: Generating stories, dialogues, and narrative content.
• Conversational Agents: Implementing chatbots with versatile conversational abilities.

Limitations

• The model may occasionally generate responses that are not fully appropriate to the context, especially when a query requires highly specialized domain knowledge.
• Because it runs at 24-bit precision, it may not perform well on extremely large datasets or on tasks that require higher numerical precision.

Evaluation and References

This model has been evaluated using a variety of benchmarks and tools to ensure a comprehensive assessment of its capabilities in mathematics, coding, storytelling, and general chat. The evaluation methods used are based on the following references:

@misc{open-llm-leaderboard-v2,
author = {Clémentine Fourrier and Nathan Habib and Alina Lozovskaya and Konrad Szafer and Thomas Wolf},
title = {Open LLM Leaderboard v2},
year = {2024},
publisher = {Hugging Face},
howpublished = "\url{https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard}",
}

@software{eval-harness,
author = {Gao, Leo and
Tow, Jonathan and
Biderman, Stella and
Black, Sid and
DiPofi, Anthony and
Foster, Charles and
Golding, Laurence and
Hsu, Jeffrey and
McDonell, Kyle and
Muennighoff, Niklas and
Phang, Jason and
Reynolds, Laria and
Tang, Eric and
Thite, Anish and
Wang, Ben and
Wang, Kevin and
Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = sep,
year = 2021,
publisher = {Zenodo},
version = {v0.0.1},
doi = {10.5281/zenodo.5371628},
url = {https://doi.org/10.5281/zenodo.5371628},
}

@misc{zhou2023instructionfollowingevaluationlargelanguage,
title={Instruction-Following Evaluation for Large Language Models},
author={Jeffrey Zhou and Tianjian Lu and Swaroop Mishra and Siddhartha Brahma and Sujoy Basu and Yi Luan and Denny Zhou and Le Hou},
year={2023},
eprint={2311.07911},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2311.07911},
}

@misc{suzgun2022challengingbigbenchtaskschainofthought,
title={Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them},
author={Mirac Suzgun and Nathan Scales and Nathanael Schärli and Sebastian Gehrmann and Yi Tay and Hyung Won Chung and Aakanksha Chowdhery and Quoc V. Le and Ed H. Chi and Denny Zhou and Jason Wei},
year={2022},
eprint={2210.09261},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2210.09261},
}

@misc{hendrycks2021measuringmathematicalproblemsolving,
title={Measuring Mathematical Problem Solving With the MATH Dataset},
author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},
year={2021},
eprint={2103.03874},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2103.03874},
}

@misc{rein2023gpqagraduatelevelgoogleproofqa,
title={GPQA: A Graduate-Level Google-Proof Q&A Benchmark},
author={David Rein and Betty Li Hou and Asa Cooper Stickland and Jackson Petty and Richard Yuanzhe Pang and Julien Dirani and Julian Michael and Samuel R. Bowman},
year={2023},
eprint={2311.12022},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2311.12022},
}

@misc{sprague2024musrtestinglimitschainofthought,
title={MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning},
author={Zayne Sprague and Xi Ye and Kaj Bostrom and Swarat Chaudhuri and Greg Durrett},
year={2024},
eprint={2310.16049},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2310.16049},
}

@misc{wang2024mmluprorobustchallengingmultitask,
title={MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark},
author={Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen},
year={2024},
eprint={2406.01574},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.01574},
}

@misc{open-llm-leaderboard-v1,
author = {Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf},
title = {Open LLM Leaderboard (2023-2024)},
year = {2023},
publisher = {Hugging Face},
howpublished = "\url{https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard}"
}
