---
license: apache-2.0
---
<p align="center">
    <br>
    <img src="./figures/llama3-MoE.jpg" width="800"/>
    <br>
</p>
---

This project builds on Meta's [Llama3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct): the MLP weights are replicated 8 times to form the experts, a randomly initialized router is added, and all remaining parameter weights are kept unchanged, producing a warm-started MoE model. This approach greatly reduces the cost of training an MoE model from scratch and makes it easy to fine-tune quickly on downstream tasks.
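
Below is a minimal sketch of this warm-start construction for a single layer, assuming a generic `nn.Module` MLP; it is illustrative only, not the repo's actual conversion script:

```python
import copy
import torch.nn as nn

def warm_start_moe_layer(dense_mlp: nn.Module, hidden_size: int, num_experts: int = 8):
    """Warm-start one MoE layer from a dense MLP: every expert starts as an
    exact copy of the dense MLP, and the router is randomly initialized."""
    experts = nn.ModuleList(copy.deepcopy(dense_mlp) for _ in range(num_experts))
    router = nn.Linear(hidden_size, num_experts, bias=False)  # random init
    return experts, router
```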
---
> Here, router_warmboot denotes initializing the router of llama3-MoE-Instruct with the router parameters from the Chinese-Mixtral-Instruct checkpoint, while router_random is the version whose router is randomly initialized.
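
A minimal sketch of what the warmboot variant might do, assuming router weights are stored under state-dict keys containing `gate` (the key name is an illustrative assumption, not necessarily the repo's actual layout):

```python
import torch

def warmboot_router(target_state: dict[str, torch.Tensor],
                    donor_state: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Copy router ("gate") weights from a donor MoE checkpoint into the
    target state dict; all other target weights are left untouched."""
    for name, weight in donor_state.items():
        if "gate" in name and name in target_state and target_state[name].shape == weight.shape:
            target_state[name] = weight.clone()
    return target_state
```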
**For details, see the GitHub repository: [https://github.com/cooper12121/llama3-8x8b-MoE](https://github.com/cooper12121/llama3-8x8b-MoE)**
**Generation example**
```python
import sys

# Make the repo's custom modeling code importable (path from the original setup).
sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")

from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"

tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
model = LlamaMoEForCausalLM.from_pretrained(model_ckpt, device_map="auto", use_cache=False)

text_list = ["hello, what is your name?", "你好,你叫什么名字"]

# Llama 3 has no dedicated pad token, so reuse the EOS token for batched padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

inputs = tokenizer(text_list, return_tensors="pt", padding=True).to("cuda")

output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
**The modeling_file package can be obtained from the GitHub repository.**
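
Since this is an Instruct model, chat-style prompts are best wrapped in the Llama-3 chat template. A minimal sketch, assuming the tokenizer shipped with this checkpoint includes the standard Llama-3 chat template (continuing from the `model` and `tokenizer` above):

```python
messages = [{"role": "user", "content": "你好,你叫什么名字"}]

# apply_chat_template wraps the message in the model's chat format and
# appends the assistant header so generation starts a fresh reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

output = model.generate(input_ids, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```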