# Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

## News
- 1/10/2024 - Camelidae models are now available on [🤗HuggingFace](https://huggingface.co/hywu).
- 1/4/2024 - We released the paper, [Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731).
- 12/22/2023 - We released the training [repo](https://github.com/wuhy68/Parameter-Efficient-MoE), which crafts dense models with the LLaMA architecture into MoE models.

## Introduction
Camelidae models are trained using Parameter-Efficient Sparsity Crafting techniques.

Parameter-Efficient Sparsity Crafting can help dense models learn knowledge from different fields (including code and math). This approach performs instruction tuning and uses the MoE structure efficiently.

Specifically, Parameter-Efficient Sparsity Crafting uses parameter-efficient techniques, including [QLoRA](https://arxiv.org/abs/2305.14314) and [Adapter](https://arxiv.org/abs/1902.00751), to perform efficient [Sparse Upcycling](https://arxiv.org/abs/2212.05055).

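To make the idea concrete, here is a minimal sketch of how adapters could serve as experts on top of a shared dense feed-forward layer. It is an illustration under assumptions, not the authors' implementation: the top-2 softmax router, the bottleneck adapter shape, the layer sizes, and every name (`AdapterExpert`, `AdapterMoELayer`, `route`) are hypothetical, and plain Python lists stand in for tensors. QLoRA quantization is not modeled.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AdapterExpert:
    """Bottleneck adapter: down-project, ReLU, up-project (hypothetical shape)."""

    def __init__(self, hidden, bottleneck, rng):
        self.down = make_matrix(bottleneck, hidden, rng)
        self.up = make_matrix(hidden, bottleneck, rng)

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]
        return matvec(self.up, h)


class AdapterMoELayer:
    """Shared dense layer plus a few small adapters chosen per input by a router."""

    def __init__(self, hidden=8, bottleneck=2, num_experts=8, top_k=2, seed=0):
        rng = random.Random(seed)
        # Stand-in for the dense FFN weights that all experts share (assumed frozen).
        self.shared_ffn = make_matrix(hidden, hidden, rng)
        self.router = make_matrix(num_experts, hidden, rng)
        self.experts = [AdapterExpert(hidden, bottleneck, rng) for _ in range(num_experts)]
        self.top_k = top_k

    def route(self, x):
        """Return (expert index, gate weight) pairs for the top-k experts."""
        logits = matvec(self.router, x)
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        gates = softmax([logits[i] for i in chosen])
        return list(zip(chosen, gates))

    def __call__(self, x):
        base = matvec(self.shared_ffn, x)
        out = list(base)
        # Add the gate-weighted adapter outputs to the shared dense output.
        for idx, gate in self.route(x):
            delta = self.experts[idx](base)
            out = [o + gate * d for o, d in zip(out, delta)]
        return out
```

In this sketch only the router and the adapters would need training, which is the intuition behind calling the approach parameter-efficient; the real training code lives in the linked repo.
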
## Model Lists
| Model | Download |
|---|---|
| Camelidae-8x7B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x7B) |
| Camelidae-8x13B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x13B) |
| Camelidae-8x34B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x34B) |

## Performance
| Model                 | MMLU (5shot) | GSM8k (5shot) | MATH (4shot) | HumanEval (0shot) | MBPP (4shot) | HellaSwag (10shot) | TriviaQA (0shot) |
|----------------------:|:------------:|:-------------:|:------------:|:-----------------:|:------------:|:------------------:|:----------------:|
| GPT3.5                | 70.0%        | 57.1%         | **34.1%**    | **48.1%**         | -            | 85.5%              | -                |
| Camelidae-8x34B       | 75.6%        | **78.3%**     | **22.6%**    | **43.9%**         | **41.4%**    | 85.3%              | **63.4%**        |
| SUSChat-34B           | **76.4%**    | 72.3%         | 22.0%        | 11.6%             | 40.2%        | 83.9%              | 56.1%            |
| Mixtral-8x7B-instruct | 68.7%        | 71.7%         | 22.1%        | 25.6%             | 40.6%        | **86.5%**          | 57.7%            |
| LLaMA2-70B-chat       | 63.8%        | 59.3%         | 10.4%        | 32.3%             | 35.6%        | 84.8%              | 63.0%            |
| Camelidae-8x13B       | 54.4%        | 52.6%         | 9.8%         | 30.6%             | 30.4%        | 82.5%              | 59.4%            |
| LLaMA2-13B-chat       | 54.6%        | 37.1%         | 5.2%         | 18.9%             | 27.2%        | 81.9%              | 55.0%            |
| Camelidae-8x7B        | 48.3%        | 44.0%         | 5.8%         | 18.3%             | 23.4%        | 79.2%              | 51.0%            |
| LLaMA2-7B-chat        | 48.3%        | 26.3%         | 3.9%         | 12.2%             | 17.6%        | 78.6%              | 46.4%            |

Bold marks the highest score among open-source models and, separately, the highest score among all models.

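For readers who want to compare rows directly, the snippet below is a small sketch that computes per-benchmark gaps between two rows of the table. The scores are copied from the table; the names `BENCHMARKS`, `SCORES`, and `score_gaps` are made up for this example, and pairing each Camelidae model with the similarly sized LLaMA2 chat row is only one possible comparison.

```python
# Scores in percent, copied from the Performance table above.
BENCHMARKS = ["MMLU", "GSM8k", "MATH", "HumanEval", "MBPP", "HellaSwag", "TriviaQA"]
SCORES = {
    "Camelidae-8x13B": [54.4, 52.6, 9.8, 30.6, 30.4, 82.5, 59.4],
    "LLaMA2-13B-chat": [54.6, 37.1, 5.2, 18.9, 27.2, 81.9, 55.0],
    "Camelidae-8x7B": [48.3, 44.0, 5.8, 18.3, 23.4, 79.2, 51.0],
    "LLaMA2-7B-chat": [48.3, 26.3, 3.9, 12.2, 17.6, 78.6, 46.4],
}


def score_gaps(model, baseline):
    """Return the gap (model minus baseline) in percentage points per benchmark."""
    return {
        name: round(a - b, 1)
        for name, a, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


if __name__ == "__main__":
    for name, gap in score_gaps("Camelidae-8x7B", "LLaMA2-7B-chat").items():
        print(f"{name}: {gap:+.1f}")
```
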
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Uncomment the tokenizer/model pair for the checkpoint you want to load.
# tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x7B", trust_remote_code=True)
# tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x13B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)

# device_map="auto" places the weights on the available devices; .eval() disables dropout.
# model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x7B", device_map="auto", trust_remote_code=True).eval()
# model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x13B", device_map="auto", trust_remote_code=True).eval()
model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()

# The prompt uses the "### Human:" / "### Assistant:" format.
inputs = tokenizer('### Human:\nHow are you?\n ### Assistant:\n', return_tensors='pt')
inputs = inputs.to(model.device)
# max_new_tokens=128 is an illustrative value; without it the default length limit may cut the reply short.
pred = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

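Because the decoded output contains the prompt followed by the completion, a pair of small helpers can be handy for building the prompt and pulling out only the reply. This is a sketch, not part of the released code: `PROMPT_TEMPLATE`, `build_prompt`, and `extract_reply` are hypothetical names, and only the single-turn format shown in the usage example is assumed.

```python
# Single-turn template copied from the usage example, including the space
# before "### Assistant:".
PROMPT_TEMPLATE = "### Human:\n{message}\n ### Assistant:\n"
ASSISTANT_MARKER = "### Assistant:"


def build_prompt(message: str) -> str:
    """Wrap a user message in the Human/Assistant prompt format."""
    return PROMPT_TEMPLATE.format(message=message.strip())


def extract_reply(decoded: str) -> str:
    """Return the text after the last assistant marker, or the whole text if absent."""
    _, found, reply = decoded.rpartition(ASSISTANT_MARKER)
    return reply.strip() if found else decoded.strip()


if __name__ == "__main__":
    prompt = build_prompt("How are you?")
    # Pretend the model appended this text to the prompt.
    print(extract_reply(prompt + "I am doing well, thank you."))
```

With these helpers, `build_prompt(...)` would replace the hard-coded string passed to the tokenizer, and `extract_reply(...)` would be applied to the result of `tokenizer.decode(...)`.
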
## Citation
```bibtex
@article{wu2024parameter,
  title={Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks},
  author={Wu, Haoyuan and Zheng, Haisheng and Yu, Bei},
  journal={arXiv preprint arXiv:2401.02731},
  year={2024}
}
```

## License
The source code in this repo is licensed under the [Apache 2.0 License](https://github.com/wuhy68/Parameter-Efficient-MoE/blob/master/LICENSE). Camelidae models are developed for academic research and free commercial use; all usage must adhere to the licenses from [facebookresearch](https://github.com/facebookresearch/llama/blob/main/LICENSE) and [01-ai](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt).