---

license: apache-2.0
---

<!--
 * @Author: qiang gao gaoqiang[email protected]

 * @Date: 2024-05-04 21:09:13

 * @LastEditors: qiang gao gaoqiang_[email protected]
 * @LastEditTime: 2024-05-05 08:43:45
 * @FilePath: \llama3\hf\Llama3-8x8b-MoE-Instruct\README.md
 * @Description:
-->
<p align="center">
    <br>

    <img src="./figures/llama3-MoE.jpg" width="800"/>

    <br>

</p>

<!-- <p align="center">

    <img alt="GitHub" src="https://img.shields.io/github/license/cooper12121/Llama3-8×8b-MoE .svg?color=blue&style=flat-square">

    <img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/cooper12121/llama3-Chinese">

    <img alt="GitHub top language" src="https://img.shields.io/github/languages/top/cooper12121/llama3-Chinese">

    <a href="https://app.codacy.com/gh/cooper12121/llama3-Chinese/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/142d688425494644b5b156068f55370d"/></a>

</p> -->


---
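This project is built on the [llama3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Chat) released by Meta. The MLP is copied 8 times, a randomly initialized router is created, and all other parameter weights are left unchanged, which yields a warm-started MoE model. Compared with training an MoE model from scratch, this approach greatly reduces cost and makes it easy to fine-tune quickly on downstream tasks.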
---
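
The warm-start construction can be sketched on a toy state dict. This is a minimal illustration, not the repository's conversion script: the key names (`.mlp.`, `.experts.`, `.router.weight`), the helper `upcycle_state_dict`, and the `0.02` standard deviation are all assumptions made for the example, and plain Python lists stand in for tensors.

```python
import copy
import random

NUM_EXPERTS = 8  # the MLP is copied 8 times


def upcycle_state_dict(dense, hidden_size, num_experts=NUM_EXPERTS, seed=0):
    """Build an MoE-style state dict from a dense one.

    Key names here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    moe = {}
    router_prefixes = set()
    for name, tensor in dense.items():
        if ".mlp." in name:
            prefix, suffix = name.split(".mlp.", 1)
            # Copy the same dense MLP weights into every expert slot.
            for e in range(num_experts):
                moe[f"{prefix}.mlp.experts.{e}.{suffix}"] = copy.deepcopy(tensor)
            router_prefixes.add(prefix)
        else:
            # Attention, norms, embeddings: weights stay unchanged.
            moe[name] = tensor
    for prefix in sorted(router_prefixes):
        # Router: one weight row per expert, randomly initialized
        # (std 0.02 is an assumed value).
        moe[f"{prefix}.mlp.router.weight"] = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(num_experts)
        ]
    return moe


# Toy dense "model" with one attention weight and one MLP weight.
dense = {
    "layers.0.self_attn.q_proj.weight": [[1.0, 0.0], [0.0, 1.0]],
    "layers.0.mlp.up_proj.weight": [[0.5, -0.5], [0.25, 0.75]],
}
moe = upcycle_state_dict(dense, hidden_size=2)
```

Because every expert starts as an exact copy of the dense MLP, the experts only diverge once fine-tuning begins; the router is the only component that starts without pretrained weights.
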

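> Here, router_warmboot means that the router parameters from the Chinese-Mixtral-Instruct version are used to initialize the router of llama3-MoE-Instruct, while router_random is the version whose router is randomly initialized.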

**For details, see the GitHub repository: [https://github.com/cooper12121/llama3-8x8b-MoE](https://github.com/cooper12121/llama3-8x8b-MoE)**

**generate**
```python
import sys

# Make the repository's `modeling_file` package importable (adjust to your local path).
sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")

from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

# Local checkpoint directory (adjust to your local path).
model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"

tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
model = LlamaMoEForCausalLM.from_pretrained(model_ckpt, device_map="auto", use_cache=False)

text_list = ["hello,what is your name?", "你好,你叫什么名字"]

# Reuse EOS as the pad token so the two prompts can be batched.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id
# Decoder-only models should be left-padded for batched generation.
tokenizer.padding_side = "left"

inputs = tokenizer(text_list, return_tensors="pt", padding=True).to("cuda")

output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
**The `modeling_file` package can be obtained from the GitHub repository.**