Improve model card: Update pipeline tag, add library name, and update news section

#1 opened by nielsr (HF staff)
Files changed (1)
  1. README.md +82 -29
README.md CHANGED
@@ -1,37 +1,88 @@
  ---
  license: apache-2.0
- pipeline_tag: image-text-to-text
  ---

  <h2 align="center"> <a href="https://arxiv.org/abs/2405.14297">Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models</a></h2>
  <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/LINs-lab/DynMoE">GitHub</a> and cite our paper!</h2>
  <h5 align="center">

- ## 📰 News

- - **[2024.05.31]** 🔥 Our [code](https://github.com/LINs-lab/DynMoE/) is released!
- - **[2024.05.25]** 🔥 Our **checkpoints** are available now!
- - **[2024.05.23]** 🔥 Our [paper](https://arxiv.org/abs/2405.14297) is released!

- ## 😎 What's Interesting?

- **Dynamic Mixture of Experts (DynMoE)** incorporates (1) a novel gating method that enables each token to automatically determine the number of experts to activate. (2) An adaptive process automatically adjusts the number of experts during training.

- ### Top-Any Gating

- <video controls src="https://i.imgur.com/bLgNaoH.mp4" title="Top-Any Gating"></video>

- ### Adaptive Training Process

- ![](https://cdn.jsdelivr.net/gh/QAQdev/Pics@master/uPic/adaptive.png)

- ## 💡 Model Details

- - 🤔 DynMoE-StableLM is a MoE model with **dynamic top-k gating**, finetuned on [LanguageBind/MoE-LLaVA-StableLM-Stage2](https://huggingface.co/LanguageBind/MoE-LLaVA-StableLM-Stage2).
- - 🚀 Our DynMoE-StableLM-1.6B has totally 2.9B parameters, but **only 1.8B are activated!** (average top-k = 1.25)
- - ⌛ With the DynMoE tuning stage, we can complete training on 8 A100 GPUs **within 40 hours.**

- ## 👍 Acknowledgement

  We are grateful for the following awesome projects:

@@ -42,19 +93,21 @@ We are grateful for the following awesome projects:
  - [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA)
  - [GLUE-X](https://github.com/YangLinyi/GLUE-X)

- ## 🔒 License
-
- This project is released under the Apache-2.0 license as found in the [LICENSE](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) file.

- ## ✏️ Citation

- ```tex
- @misc{guo2024dynamic,
- title={Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models},
- author={Yongxin Guo and Zhenglin Cheng and Xiaoying Tang and Tao Lin},
- year={2024},
- eprint={2405.14297},
- archivePrefix={arXiv},
- primaryClass={cs.LG}
  }
- ```
  ---
  license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

  <h2 align="center"> <a href="https://arxiv.org/abs/2405.14297">Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models</a></h2>
  <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/LINs-lab/DynMoE">GitHub</a> and cite our paper!</h2>
  <h5 align="center">

+ [![hf_space](https://img.shields.io/badge/🤗-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2405.14297)
+ [![arxiv](https://img.shields.io/badge/Arxiv-2405.14297-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2405.14297)
+ [![visitor](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FLINs-lab%2FDynMoE&count_bg=%2379C83D&title_bg=%23454343&icon=&icon_color=%23E7E7E7&title=visitor&edge_flat=false)](https://hits.seeyoufarm.com)

+ ## News
+ - **[2025.01.23]** 🎉 Our paper is accepted to ICLR 2025!
+ - **[2024.05.25]** Our [checkpoints](https://huggingface.co/collections/LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a) are available now!
+ - **[2024.05.23]** Our [paper](https://arxiv.org/abs/2405.14297) is released!

+ ## Why Do We Need DynMoE?

+ Sparse MoE (SMoE) has an unavoidable drawback: *the performance of SMoE heavily relies on the choice of hyper-parameters, such as the number of activated experts per token (top-k) and the total number of experts.*

+ Moreover, *identifying the optimal hyper-parameters without extensive ablation studies is challenging.* As models continue to grow in size, this limitation can waste significant computational resources and, in turn, hinder the efficient training of MoE-based models in practice.

+ Our **DynMoE** addresses these challenges through the two components introduced in [Dynamic Mixture of Experts (DynMoE)](#dynamic-mixture-of-experts-dynmoe) below.
+ ## Dynamic Mixture of Experts (DynMoE)

+ ### Top-Any Gating

+ ![moe-overview](./assets/moe-overview.gif)

+ We first introduce a novel gating method that enables each token to **automatically determine the number of experts to activate**.

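To make the gating idea concrete, here is a minimal, self-contained PyTorch sketch of threshold-based "top-any" routing, where each token activates every expert whose gate score clears a trainable per-expert threshold. It is illustrative only: the score function, threshold initialization, and fallback rule are simplifying assumptions and do not reproduce the exact DynMoE formulation.

```python
import torch
import torch.nn as nn


class TopAnyGate(nn.Module):
    """Toy threshold-based ("top-any") gate: the number of experts a token
    activates is data-dependent rather than a fixed top-k. Simplified; not
    the exact DynMoE gating function."""

    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        self.w_gate = nn.Linear(hidden_size, num_experts, bias=False)
        # One trainable activation threshold per expert.
        self.threshold = nn.Parameter(torch.full((num_experts,), 0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size) -> scores: (num_tokens, num_experts)
        scores = torch.sigmoid(self.w_gate(x))
        mask = (scores > self.threshold).float()
        # Fallback: a token that activates no expert is routed to its best one.
        none_active = mask.sum(dim=-1, keepdim=True) == 0
        top1 = torch.zeros_like(mask).scatter(-1, scores.argmax(-1, keepdim=True), 1.0)
        mask = torch.where(none_active, top1, mask)
        # Renormalize scores over the activated experts to get routing weights.
        weights = scores * mask
        return weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)


gate = TopAnyGate(hidden_size=16, num_experts=4)
print(gate(torch.randn(3, 16)))  # each row sums to 1 over a variable expert set
```
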
+ ### Adaptive Training Process
+
+ ![adaptive-training](https://cdn.jsdelivr.net/gh/QAQdev/Pics@master/uPic/adaptive.png)
+
+ Our method also includes an adaptive process that **automatically adjusts the number of experts** during training.
+
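The prose above does not spell out when experts are added or removed, so the short sketch below illustrates one plausible set of triggers: grow the pool when some tokens already activate every expert, and prune experts that received no tokens over an interval. Both triggers and the schedule are assumptions made for illustration, not the official DynMoE procedure.

```python
from typing import List, Tuple, Union

Action = Union[str, Tuple[str, int]]


def adapt_expert_count(activation_counts: List[int],
                       tokens_activating_all_experts: int) -> List[Action]:
    """Illustrative adaptation step run periodically during training.
    The trigger conditions here are assumptions, not the official DynMoE rule."""
    actions: List[Action] = []
    if tokens_activating_all_experts > 0:
        # Some tokens already use every expert: capacity looks saturated,
        # so grow the expert pool.
        actions.append("add_expert")
    for expert_id, count in enumerate(activation_counts):
        if count == 0:
            # This expert received no tokens over the interval: prune it.
            actions.append(("remove_expert", expert_id))
    return actions


# Expert 2 was idle and no token needed all experts -> prune expert 2 only.
print(adapt_expert_count([120, 87, 0, 45], tokens_activating_all_experts=0))
```
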
+ ## Can We Trust DynMoE? Yes!
+
+ - On language tasks, **DynMoE surpasses the average performance of various MoE settings.**
+ - **The effectiveness of DynMoE remains consistent** on both vision and vision-language tasks.
+ - Although sparsity is not enforced in DynMoE, it **maintains efficiency by activating even fewer parameters!**
+
+ ## Model Zoo
+
+ | Model | Activated Params / Total Params | Transformers (HF) |
+ | ----- | ------------------------------- | ----------------- |
+ | DynMoE-StableLM-1.6B | 1.8B / 2.9B | [LINs-lab/DynMoE-StableLM-1.6B](https://huggingface.co/LINs-lab/DynMoE-StableLM-1.6B) |
+ | DynMoE-Qwen-1.8B | 2.2B / 3.1B | [LINs-lab/DynMoE-Qwen-1.8B](https://huggingface.co/LINs-lab/DynMoE-Qwen-1.8B) |
+ | DynMoE-Phi-2-2.7B | 3.4B / 5.3B | [LINs-lab/DynMoE-Phi-2-2.7B](https://huggingface.co/LINs-lab/DynMoE-Phi-2-2.7B) |
+
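Since the front matter now declares `library_name: transformers`, a hypothetical loading snippet is sketched below. These checkpoints build on MoE-LLaVA-style custom code, so `trust_remote_code=True` is assumed and, in practice, loading through the DynMoE / MoE-LLaVA codebase may be required rather than plain `transformers`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch: the repo id is real, but the loading path is an assumption.
model_id = "LINs-lab/DynMoE-StableLM-1.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hello, DynMoE!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
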
+ ## Directory Specification
+
+ ### Experiment Code
+
+ - `EMoE/` contains experiments on language and vision tasks, using the Tutel-based DynMoE implementation.
+ - `MoE-LLaVA/` contains experiments on vision-language tasks, using the DeepSpeed-0.9.5-based DynMoE implementation.
+
+ ### DynMoE Implementations
+
+ - `Deepspeed/` provides the DynMoE-DeepSpeed implementation. **(Recommended)**
+ - `EMoE/tutel/` provides the DynMoE-Tutel implementation.
+
+ ## Environment Setup
+
+ Please refer to the instructions under `EMoE/` and `MoE-LLaVA/`.
+
+ ## Usage
+
+ ### Tutel Examples
+
+ Please refer to `EMoE/Language/README.md` and `EMoE/Language/Vision.md`.
+
+ ### DeepSpeed Examples (Recommended)
+
+ We provide a minimal example of training DynMoE-ViT on ImageNet-1K from scratch in `Examples/DeepSpeed-MoE`:
+
+ - Check `Examples/DeepSpeed-MoE/dynmoe_vit.py` for how to use DynMoE in a model implementation.
+ - Check `Examples/DeepSpeed-MoE/train.py` for how to train a model with DynMoE.
+
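For orientation, the sketch below shows what such a model file typically involves: wrapping a feed-forward expert in a DeepSpeed MoE layer (here the stock `deepspeed.moe.layer.MoE` from DeepSpeed 0.9.x). The DynMoE fork under `Deepspeed/` presumably substitutes its dynamic top-any gate, so the actual layer name and constructor arguments used in `dynmoe_vit.py` may differ.

```python
import torch
import torch.nn as nn
from deepspeed.moe.layer import MoE  # stock DeepSpeed MoE layer

# Note: DeepSpeed MoE layers normally require deepspeed.init_distributed()
# (or launching via the deepspeed runner) before they are constructed.


class MoEFeedForward(nn.Module):
    """A transformer feed-forward block backed by a pool of experts.
    Illustrative only; DynMoE is assumed to expose a similar layer that swaps
    the fixed top-k gate for its dynamic top-any gate."""

    def __init__(self, hidden_size: int = 384, num_experts: int = 8):
        super().__init__()
        expert = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        self.moe = MoE(hidden_size=hidden_size, expert=expert,
                       num_experts=num_experts, k=1)

    def forward(self, x: torch.Tensor):
        # DeepSpeed's MoE returns (output, auxiliary load-balancing loss,
        # per-expert token counts); keep the aux loss for the training objective.
        out, aux_loss, _ = self.moe(x)
        return out, aux_loss
```
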
+ ## Acknowledgement

  We are grateful for the following awesome projects:

  - [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA)
  - [GLUE-X](https://github.com/YangLinyi/GLUE-X)

+ ## Citation

+ If you find this project helpful, please consider citing our work:

+ ```bibtex
+ @article{guo2024dynamic,
+   title={Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models},
+   author={Guo, Yongxin and Cheng, Zhenglin and Tang, Xiaoying and Lin, Tao},
+   journal={arXiv preprint arXiv:2405.14297},
+   year={2024}
  }
+ ```
+
+ ## Star History
+
+ [![Star History Chart](https://api.star-history.com/svg?repos=LINs-lab/DynMoE&type=Date)](https://star-history.com/#LINs-lab/DynMoE&Date)
+
+ Code: https://github.com/LINs-lab/DynMoE