4x1.8B MoE Qwen Ckpt 50000
This project is a Mixture-of-Experts (MoE) model built on the Qwen 1.8B model. We combined 4 original models into a single MoE model and trained it using special training methods.
This release is a checkpoint from the continued pretraining stage.
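To illustrate the general idea of routing a token through 4 experts, here is a minimal sketch in plain Python. It is not the training or routing method used in this project, which the card describes only as "special training methods". The top-2 softmax gating, the random linear "experts", and every name in it (`make_expert`, `moe_forward`, `router_weights`) are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Hypothetical expert: one random linear map standing in for a feed-forward block."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, experts, router_weights, top_k=2):
    """Route one token vector through the top_k highest-scoring experts (assumed scheme)."""
    # One router logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Indices of the top_k experts by logit.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, dict(zip(top, gates))


rng = random.Random(0)
DIM, NUM_EXPERTS = 8, 4  # 4 experts, mirroring the "4x" in the model name
experts = [make_expert(DIM, rng) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [rng.uniform(-1, 1) for _ in range(DIM)]

output, gates = moe_forward(token, experts, router_weights, top_k=2)
print("selected experts and gate weights:", gates)
```

In this sketch only the selected experts are evaluated for each token, which is why an MoE model can hold more parameters than it uses per forward step.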
Evaluations
| Groups | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| boolq | 0 | acc | 0.6508 | ± | 0.0083 |
| ceval-valid | 0 | acc | 0.5290 | ± | 0.1912 |
| | 0 | acc_norm | 0.5290 | ± | 0.1912 |
| cmmlu | 0 | acc | 0.5087 | ± | 0.1237 |
| | 0 | acc_norm | 0.5087 | ± | 0.1237 |
| mathqa | 0 | acc | 0.2647 | ± | 0.0081 |
| | 0 | acc_norm | 0.2693 | ± | 0.0081 |
| mmlu | 0 | acc | 0.4353 | ± | 0.0830 |
| - stem | 0 | acc | 0.3809 | ± | 0.0659 |
| - social_sciences | 0 | acc | 0.4959 | ± | 0.0708 |
| - other | 0 | acc | 0.4844 | ± | 0.0744 |
| - humanities | 0 | acc | 0.3998 | ± | 0.0849 |
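As a reading aid for the Value and Stderr columns, the sketch below turns the boolq row into an approximate 95% interval. It assumes a normal approximation (z = 1.96), which the card itself does not state. The helper name `confidence_interval` is hypothetical. The much larger Stderr values on grouped benchmarks such as ceval-valid may reflect aggregation across subtasks, so the same reading may not carry over to those rows.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate interval value ± z * stderr (normal approximation, an assumption)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# boolq row from the table: acc = 0.6508, stderr = 0.0083
low, high = confidence_interval(0.6508, 0.0083)
print(f"boolq acc, approx. 95% interval: {low:.4f} to {high:.4f}")  # 0.6345 to 0.6671
```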
Acknowledgements
License Agreement
This project is open source under the Tongyi Qianwen Research License Agreement. The complete license agreement is available at https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT.
When using this project, please ensure that your use complies with the terms and conditions of the license agreement.