4x1.8B MoE Qwen Ckpt 50000

This is a Mixture-of-Experts (MoE) model built on the Qwen 1.8B model. In this project, we combined four copies of the original model into a single MoE model and trained it with specialized training methods.

This model is a checkpoint (step 50000) from the continued-pretraining stage.
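The card does not describe the routing mechanism, but a standard MoE forward pass over four experts can be sketched as follows. This is an illustrative sketch only, not the project's actual implementation; the top-2 gating, the single-matrix "experts", and all dimensions are assumptions for the example.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    x: (tokens, d_model)
    expert_weights: list of (d_model, d_model) matrices standing in
        for full expert MLPs (simplified for illustration)
    gate_weights: (d_model, n_experts) router projection
    """
    logits = x @ gate_weights                        # (tokens, n_experts)
    # Softmax over experts to get routing probabilities.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    top = np.argsort(-probs, axis=-1)[:, :top_k]     # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        weights = probs[t, top[t]]
        weights /= weights.sum()                     # renormalize over chosen experts
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ expert_weights[e])
    return out

rng = np.random.default_rng(0)
d_model, n_experts, tokens = 8, 4, 5
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate = rng.standard_normal((d_model, n_experts))
y = moe_forward(rng.standard_normal((tokens, d_model)), experts, gate)
print(y.shape)  # (5, 8)
```

Each token is processed by only its top-2 of the 4 experts, which is why an MoE model can hold more parameters than it activates per token.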

Evaluations

| Groups            | n-shot | Metric   | Value  | Stderr   |
|-------------------|--------|----------|--------|----------|
| boolq             | 0      | acc      | 0.6508 | ± 0.0083 |
| ceval-valid       | 0      | acc      | 0.5290 | ± 0.1912 |
|                   | 0      | acc_norm | 0.5290 | ± 0.1912 |
| cmmlu             | 0      | acc      | 0.5087 | ± 0.1237 |
|                   | 0      | acc_norm | 0.5087 | ± 0.1237 |
| mathqa            | 0      | acc      | 0.2647 | ± 0.0081 |
|                   | 0      | acc_norm | 0.2693 | ± 0.0081 |
| mmlu              | 0      | acc      | 0.4353 | ± 0.0830 |
| - stem            | 0      | acc      | 0.3809 | ± 0.0659 |
| - social_sciences | 0      | acc      | 0.4959 | ± 0.0708 |
| - other           | 0      | acc      | 0.4844 | ± 0.0744 |
| - humanities      | 0      | acc      | 0.3998 | ± 0.0849 |

Acknowledgements

License Agreement

This project is open-sourced under the Tongyi Qianwen Research License Agreement. You can view the complete license agreement at: https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT

When using this project, please ensure that your use complies with the terms and conditions of the license agreement.

Model size: 4.27B parameters (Safetensors)
Tensor types: F32, BF16
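The reported total of 4.27B parameters is consistent with an architecture in which the four experts share attention and embedding weights and only the feed-forward (SwiGLU MLP) blocks are duplicated. The back-of-the-envelope check below assumes Qwen-1.8B-like dimensions (24 layers, hidden size 2048, FFN size 5504) and a base size of roughly 1.84B parameters; none of these figures are confirmed by this card.

```python
# Back-of-the-envelope parameter count for a 4-expert MoE that shares
# everything except the MLP blocks. All dimensions below are assumed
# Qwen-1.8B-like values, not figures confirmed by the model card.
base_params = 1.84e9                 # approximate size of Qwen 1.8B
layers, d_model, d_ffn = 24, 2048, 5504
mlp_per_layer = 3 * d_model * d_ffn  # gate, up, down projections (SwiGLU)
extra_experts = 3                    # 3 additional copies of every MLP block

total = base_params + extra_experts * layers * mlp_per_layer
print(round(total / 1e9, 2))  # ~4.27
```

Under these assumptions the extra experts add about 2.43B parameters, landing close to the listed 4.27B.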