OrionZheng committed on
Commit f69eb22
1 Parent(s): 3da30d1

Update README.md

Files changed (1)
  1. README.md +3 -9
README.md CHANGED
@@ -17,7 +17,7 @@ As a small student team, instead of pursuing the best model with better data, co
 
  [2024.01.12] The paper for the project and more evaluations are underway. For more information about the model, training, and evaluations, please visit our GitHub [repository](https://github.com/XueFuzhao/OpenMoE/tree/main).
 
- Note: The base model, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications. Better performence can be oberved from our 8B or 34B versions.
+ **Note:** The base model, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications. Better performence can be oberved from our 8B or 34B versions.
 
  ## Model Weights
  Currently, three models are released in total: OpenMoE-base, OpenMoE-8B(and its chat version), and OpenMoE-34B(intermediate checkpoint at 200B tokens).
@@ -36,7 +36,7 @@ We provide all these checkpoints on Huggingface(in pytorch) and Google Cloud Sto
  | **OpenMoE-34B/32E (200B)** | 34B MoE with comparable FLOPs of a 7B LLaMA(No SFT) |34B |[Link](https://huggingface.co/OrionZheng/openmoe-34b-200B) |
 
 
- The base models, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architexture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications.
+ The base models, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications.
 
  The OpenMoE-8B with 4 MoE layers and 32 experts has been trained by 1.1T tokens. The SFT version has also been released after we finetuned the OpenMoE-8B-1.1T on the [wildchat]((https://huggingface.co/datasets/allenai/WildChat-nontoxic)) dataset's GPT-4 subset. Besides, we also provide some intermediate checkpoints at 200B and 890B tokens for research purposes.
 
@@ -99,13 +99,7 @@ Since the models are trained on The Redpajama and The Stack dataset, please chec
 
  This project is currently contributed by the following authors:
 
- - [Fuzhao Xue](https://xuefuzhao.github.io/)
- - [Zian Zheng](https://zheng-zian-andy.com)
- - [Yao Fu](https://franxyao.github.io/)
- - [Jinjie Ni](http://jinjie.one/)
- - [Zangwei Zheng](https://zhengzangw.github.io/)
- - [Wangchunshu Zhou](https://michaelzhouwang.github.io/)
- - [Yang You](https://www.comp.nus.edu.sg/~youy/)
+ [Fuzhao Xue](https://xuefuzhao.github.io/), [Zian Zheng](https://zheng-zian-andy.com), [Yao Fu](https://franxyao.github.io/), [Jinjie Ni](http://jinjie.one/), [Zangwei Zheng](https://zhengzangw.github.io/), [Wangchunshu Zhou](https://michaelzhouwang.github.io/), [Yang You](https://www.comp.nus.edu.sg/~youy/)
 
  ## Acknowledgement
  The computational resources for this project were generously provided by the [Google TPU Research Cloud(TRC)](https://sites.research.google/trc/about/). We extend our heartfelt thanks to TRC for their invaluable support, which has been fundamental to the success of our work. Besides, we are extremely grateful to the [ColossalAI Team](https://github.com/hpcaitech/ColossalAI) for their tremendous support with the PyTorch implementation, especially [Xuanlei Zhao](https://oahzxl.github.io/) and [Wenhao Chen](https://github.com/CWHer), making training and inference of OpenMoE on GPUs a reality.