OrionZheng
/

openmoe-base

@@ -17,7 +17,7 @@ As a small student team, instead of pursuing the best model with better data, co
 [2024.01.12] The paper for the project and more evaluations are underway. For more information about the model, training, and evaluations, please visit our GitHub [repository](https://github.com/XueFuzhao/OpenMoE/tree/main).
-Note: The base model, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications. Better performence can be oberved from our 8B or 34B versions.
 ## Model Weights
 Currently, three models are released in total: OpenMoE-base, OpenMoE-8B(and its chat version), and OpenMoE-34B(intermediate checkpoint at 200B tokens).
@@ -36,7 +36,7 @@ We provide all these checkpoints on Huggingface(in pytorch) and Google Cloud Sto
 | **OpenMoE-34B/32E (200B)**   |  34B MoE with comparable FLOPs of a 7B LLaMA(No SFT)  |34B        |[Link](https://huggingface.co/OrionZheng/openmoe-34b-200B) |
-The base models, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architexture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications.
 The OpenMoE-8B with 4 MoE layers and 32 experts has been trained by 1.1T tokens. The SFT version has also been released after we finetuned the OpenMoE-8B-1.1T on the [wildchat]((https://huggingface.co/datasets/allenai/WildChat-nontoxic)) dataset's GPT-4 subset. Besides, we also provide some intermediate checkpoints at 200B and 890B tokens for research purposes.
@@ -99,13 +99,7 @@ Since the models are trained on The Redpajama and The Stack dataset, please chec
 This project is currently contributed by the following authors:
-- [Fuzhao Xue](https://xuefuzhao.github.io/)
-- [Zian Zheng](https://zheng-zian-andy.com)
-- [Yao Fu](https://franxyao.github.io/)
-- [Jinjie Ni](http://jinjie.one/)
-- [Zangwei Zheng](https://zhengzangw.github.io/)
-- [Wangchunshu Zhou](https://michaelzhouwang.github.io/)
-- [Yang You](https://www.comp.nus.edu.sg/~youy/)
 ## Acknowledgement
 The computational resources for this project were generously provided by the [Google TPU Research Cloud(TRC)](https://sites.research.google/trc/about/). We extend our heartfelt thanks to TRC for their invaluable support, which has been fundamental to the success of our work. Besides, we are extremely grateful to the [ColossalAI Team](https://github.com/hpcaitech/ColossalAI) for their tremendous support with the PyTorch implementation, especially [Xuanlei Zhao](https://oahzxl.github.io/) and [Wenhao Chen](https://github.com/CWHer), making training and inference of OpenMoE on GPUs a reality.

 [2024.01.12] The paper for the project and more evaluations are underway. For more information about the model, training, and evaluations, please visit our GitHub [repository](https://github.com/XueFuzhao/OpenMoE/tree/main).
+**Note:** The base model, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications. Better performence can be oberved from our 8B or 34B versions.
 ## Model Weights
 Currently, three models are released in total: OpenMoE-base, OpenMoE-8B(and its chat version), and OpenMoE-34B(intermediate checkpoint at 200B tokens).
 | **OpenMoE-34B/32E (200B)**   |  34B MoE with comparable FLOPs of a 7B LLaMA(No SFT)  |34B        |[Link](https://huggingface.co/OrionZheng/openmoe-34b-200B) |
+The base models, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications.
 The OpenMoE-8B with 4 MoE layers and 32 experts has been trained by 1.1T tokens. The SFT version has also been released after we finetuned the OpenMoE-8B-1.1T on the [wildchat]((https://huggingface.co/datasets/allenai/WildChat-nontoxic)) dataset's GPT-4 subset. Besides, we also provide some intermediate checkpoints at 200B and 890B tokens for research purposes.
 This project is currently contributed by the following authors:
+[Fuzhao Xue](https://xuefuzhao.github.io/), [Zian Zheng](https://zheng-zian-andy.com), [Yao Fu](https://franxyao.github.io/), [Jinjie Ni](http://jinjie.one/), [Zangwei Zheng](https://zhengzangw.github.io/), [Wangchunshu Zhou](https://michaelzhouwang.github.io/), [Yang You](https://www.comp.nus.edu.sg/~youy/)
 ## Acknowledgement
 The computational resources for this project were generously provided by the [Google TPU Research Cloud(TRC)](https://sites.research.google/trc/about/). We extend our heartfelt thanks to TRC for their invaluable support, which has been fundamental to the success of our work. Besides, we are extremely grateful to the [ColossalAI Team](https://github.com/hpcaitech/ColossalAI) for their tremendous support with the PyTorch implementation, especially [Xuanlei Zhao](https://oahzxl.github.io/) and [Wenhao Chen](https://github.com/CWHer), making training and inference of OpenMoE on GPUs a reality.