---
license: apache-2.0
---
<!--
* @Author: qiang gao gaoqiang[email protected]
* @Date: 2024-05-04 21:09:13
* @LastEditors: qiang gao gaoqiang_[email protected]
* @LastEditTime: 2024-05-05 08:43:45
* @FilePath: \llama3\hf\Llama3-8x8b-MoE-Instruct\README.md
* @Description:
-->
<p align="center">
<br>
<img src="./figures/llama3-MoE.jpg" width="800"/>
<br>
</p>
<!-- <p align="center">
<img alt="GitHub" src="https://img.shields.io/github/license/cooper12121/Llama3-8×8b-MoE .svg?color=blue&style=flat-square">
<img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/cooper12121/llama3-Chinese">
<img alt="GitHub top language" src="https://img.shields.io/github/languages/top/cooper12121/llama3-Chinese">
<a href="https://app.codacy.com/gh/cooper12121/llama3-Chinese/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/142d688425494644b5b156068f55370d"/></a>
</p> -->
---
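This project is built on Meta's [llama3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Chat). Each MLP is copied 8 times to form 8 experts, a randomly initialized router is added, and all other weights are kept unchanged, which yields a warm-started MoE model. This greatly reduces the cost compared with training an MoE model from scratch and makes it easy to fine-tune quickly on downstream tasks.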
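The upcycling step described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation. The class names `SwiGLUMLP` and `UpcycledMoE`, top-2 routing, a Mixtral-style softmax renormalized over the selected experts, and the 0.02 initialization scale are all assumptions. It uses toy sizes and a single token vector so it runs with the standard library only. Because every expert starts as an exact copy of the dense MLP and the selected gate weights sum to 1, the MoE block initially reproduces the dense MLP's output no matter which experts the router picks. That property lets the warm-started model begin from the pretrained model's behaviour, and a final toy check prints the largest difference between the two outputs, which should be at floating-point rounding level.

```python
import copy
import math
import random

# Illustrative sketch only: class names, top-2 routing and the 0.02 init
# scale are assumptions, not the repository's implementation.


def silu(v):
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    # w is a list of rows (out_dim x in_dim); x is a vector of length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng, std=0.02):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class SwiGLUMLP:
    """Llama-style MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden, inter, rng):
        self.gate = rand_matrix(inter, hidden, rng)
        self.up = rand_matrix(inter, hidden, rng)
        self.down = rand_matrix(hidden, inter, rng)

    def __call__(self, x):
        h = [silu(g) * u for g, u in zip(matvec(self.gate, x), matvec(self.up, x))]
        return matvec(self.down, h)


class UpcycledMoE:
    """Sparse MoE block warm-started from one pretrained dense MLP."""

    def __init__(self, dense_mlp, hidden, num_experts=8, top_k=2, rng=None):
        if rng is None:
            rng = random.Random(0)
        # Each expert begins as an exact copy of the dense MLP.
        self.experts = [copy.deepcopy(dense_mlp) for _ in range(num_experts)]
        # The router is the only new parameter and is randomly initialized.
        self.router = rand_matrix(num_experts, hidden, rng)
        self.top_k = top_k

    def __call__(self, x):
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        # Softmax over the selected experts only, so their weights sum to 1.
        m = max(logits[i] for i in top)
        weights = {i: math.exp(logits[i] - m) for i in top}
        total = sum(weights.values())
        out = [0.0] * len(x)
        for i in top:
            y = self.experts[i](x)
            out = [o + (weights[i] / total) * yi for o, yi in zip(out, y)]
        return out


# Toy check: at initialization the MoE block matches the dense MLP.
rng = random.Random(42)
dense = SwiGLUMLP(hidden=8, inter=16, rng=rng)
moe = UpcycledMoE(dense, hidden=8, num_experts=8, top_k=2, rng=rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
print(max(abs(a - b) for a, b in zip(dense(x), moe(x))))
```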
---
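> Here, `router_warmboot` denotes the variant whose llama3-MoE-Instruct router is initialized with the router parameters from chinese-mixtral-Instruct, while `router_random` denotes the variant with a randomly initialized router.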
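
The two router variants can be illustrated with a small initializer. This is a hypothetical sketch: `init_routers`, its arguments, and the 0.02 standard deviation are assumptions. Real checkpoints store router weights as tensors under model-specific parameter names, which are not shown here. Copying a donor router only works when the donor has the same number of layers, experts, and hidden size. Assuming the chinese-mixtral donor is derived from Mixtral-8x7B, the shapes line up: Mixtral-8x7B and Llama-3-8B both have 32 layers with a hidden size of 4096, and Mixtral uses 8 experts per layer, giving a per-layer router of 8 x 4096.

```python
import random

# Hypothetical helper illustrating router_random vs router_warmboot;
# names, shapes and the 0.02 init scale are assumptions.


def init_routers(num_layers, num_experts, hidden_size, donor=None, std=0.02, seed=0):
    """Return one router matrix (num_experts x hidden_size) per layer.

    donor=None  -> random initialization (the router_random idea).
    donor=[...] -> copy per-layer routers from another MoE model
                   (the router_warmboot idea).
    """
    if donor is None:
        rng = random.Random(seed)
        return [
            [[rng.gauss(0.0, std) for _ in range(hidden_size)] for _ in range(num_experts)]
            for _ in range(num_layers)
        ]
    if len(donor) != num_layers:
        raise ValueError(f"donor has {len(donor)} layers, expected {num_layers}")
    routers = []
    for layer, w in enumerate(donor):
        if len(w) != num_experts or any(len(row) != hidden_size for row in w):
            raise ValueError(f"router shape mismatch in layer {layer}")
        # Copy each row so later fine-tuning does not modify the donor.
        routers.append([list(row) for row in w])
    return routers


# Toy sizes; under the assumption above, the real models would use
# 32 layers, 8 experts and hidden size 4096.
donor = init_routers(num_layers=2, num_experts=8, hidden_size=4, seed=123)
warm_routers = init_routers(2, 8, 4, donor=donor)
random_routers = init_routers(2, 8, 4)
```

The shape checks fail loudly rather than silently truncating, since a mismatched donor router would otherwise be copied into the wrong layout.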
**For details, see the GitHub repository [https://github.com/cooper12121/llama3-8x8b-MoE](https://github.com/cooper12121/llama3-8x8b-MoE)**
**generate**
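```python
import sys

# Make the custom modeling code importable: this should be the directory that
# contains `modeling_file` (available from the GitHub repository).
sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")

from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

# Local checkpoint directory; replace with wherever you downloaded the weights.
model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"

tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
# Llama 3 has no dedicated pad token, so reuse EOS for batched inputs.
tokenizer.pad_token = tokenizer.eos_token
# Decoder-only models should be left-padded for batched generation.
tokenizer.padding_side = "left"

model = LlamaMoEForCausalLM.from_pretrained(model_ckpt, device_map="auto", use_cache=False)

text_list = ["hello,what is your name?", "你好,你叫什么名字"]
# Send inputs to the model's (first) device; this also works with device_map="auto".
inputs = tokenizer(text_list, return_tensors="pt", padding=True).to(model.device)
output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True))
```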
**The `modeling_file` code can be obtained from the GitHub repository.**