<hr>
# OpenMoE-Base
OpenMoE is a project aimed at igniting the open-source MoE community! We are releasing a family of open-source Mixture-of-Experts (MoE) Large Language Models.

Our project began in the summer of 2023. On August 22, 2023, we released the first batch of intermediate checkpoints (OpenMoE-base & 8B), along with the data and code [[Twitter]](https://twitter.com/xuefz/status/1693696988611739947?s=61&t=Xc2k2W7vU_hlpNizGDCmOw). The OpenMoE-8B training was completed in November 2023, after which we embarked on exploring a 34B-scale model, an effort that is still ongoing.

As a small student team, instead of pursuing the best model with better data, computation, and manpower, we are devoted to fully sharing our training data, strategies, model architecture, weights, and everything we have with the community. We hope this project will promote research in this promising field and invite more contributors to work on open-source MoE projects together!

[2024.01.12] The paper for this project and more evaluations are underway. For more information about the model, training, and evaluation, please visit our GitHub [repository](https://github.com/XueFuzhao/OpenMoE/tree/main).

Note: The base model was trained on 128 billion tokens and served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, its performance may not be very good, and this checkpoint is not suitable for practical applications. Better performance can be observed with our 8B or 34B versions.

## Model Weights
Currently, three models are released in total: OpenMoE-base, OpenMoE-8B (and its chat version), and OpenMoE-34B (an intermediate checkpoint at 200B tokens). We provide all these checkpoints on Huggingface (in PyTorch) and Google Cloud Storage.

| Model Name      | Description                                           | #Param | Huggingface |
|-----------------|-------------------------------------------------------|--------|-------------|
| OpenMoE-base    | A small MoE model for debugging only                  | 637M   | [Link](https://huggingface.co/OrionZheng/openmoe-base) |
| OpenLLaMA-base  | A dense counterpart of OpenMoE-base                   | 310M   | [Link](https://huggingface.co/fuzhao/OpenLLaMA_Base) |
| OpenMoE-8B-200B | 8B MoE with FLOPs comparable to a 1.6B LLaMA (no SFT) | 8B     | [Link](https://huggingface.co/OrionZheng/openmoe-8b-200B/tree/main) |
| OpenMoE-8B-890B | 8B MoE with FLOPs comparable to a 1.6B LLaMA (no SFT) | 8B     | [Link](https://huggingface.co/OrionZheng/openmoe-8b-890B) |
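
Below is a minimal sketch of loading one of these checkpoints with the Hugging Face `transformers` library, not an official recipe from this project. Since OpenMoE uses a custom architecture, we assume the checkpoint must be loaded with `trust_remote_code=True`; the prompt and generation settings are purely illustrative.

```python
# Minimal sketch (assumption, not the project's official usage): load the
# OpenMoE-base debugging checkpoint via transformers. trust_remote_code=True
# is assumed to be required because OpenMoE is a custom architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OrionZheng/openmoe-base"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate a short continuation; sampling parameters are illustrative only.
inputs = tokenizer("The mixture-of-experts architecture", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Keep in mind that, as noted above, the base checkpoint exists for debugging; for any real use, prefer the 8B checkpoints listed in the table.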