---
license: apache-2.0
---
<p align="center">
    <br>
    <img src="./figures/llama3-MoE.jpg" width="800"/>
    <br>
</p>
---

This project builds on Meta's [Llama3-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct): the MLP weights are replicated 8 times to form the experts, a randomly initialized router is added, and all remaining parameter weights are kept unchanged, producing a warm-started MoE model. This approach greatly reduces the cost of training an MoE model from scratch and makes it easy to fine-tune quickly on downstream tasks.
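
Below is a minimal sketch of this warm-start construction for a single layer, assuming a generic `nn.Module` MLP; it is illustrative only, not the repo's actual conversion script:

```python
import copy
import torch.nn as nn

def warm_start_moe_layer(dense_mlp: nn.Module, hidden_size: int, num_experts: int = 8):
    """Warm-start one MoE layer from a dense MLP: every expert starts as an
    exact copy of the dense MLP, and the router is randomly initialized."""
    experts = nn.ModuleList(copy.deepcopy(dense_mlp) for _ in range(num_experts))
    router = nn.Linear(hidden_size, num_experts, bias=False)  # random init
    return experts, router
```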
---
> Here, router_warmboot denotes initializing the router of llama3-MoE-Instruct with the router parameters from the Chinese-Mixtral-Instruct checkpoint, while router_random is the version whose router is randomly initialized.
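
A minimal sketch of what the warmboot variant might do, assuming router weights are stored under state-dict keys containing `gate` (the key name is an illustrative assumption, not necessarily the repo's actual layout):

```python
import torch

def warmboot_router(target_state: dict[str, torch.Tensor],
                    donor_state: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Copy router ("gate") weights from a donor MoE checkpoint into the
    target state dict; all other target weights are left untouched."""
    for name, weight in donor_state.items():
        if "gate" in name and name in target_state and target_state[name].shape == weight.shape:
            target_state[name] = weight.clone()
    return target_state
```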
**For details, see the GitHub repository: [https://github.com/cooper12121/llama3-8x8b-MoE](https://github.com/cooper12121/llama3-8x8b-MoE)**
**Generation example**
```python
import sys

# Make the repo's custom modeling code importable (path from the original setup).
sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")

from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"

tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
model = LlamaMoEForCausalLM.from_pretrained(model_ckpt, device_map="auto", use_cache=False)

text_list = ["hello, what is your name?", "你好,你叫什么名字"]

# Llama 3 has no dedicated pad token, so reuse the EOS token for batched padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

inputs = tokenizer(text_list, return_tensors="pt", padding=True).to("cuda")

output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
**The modeling_file package can be obtained from the GitHub repository.**
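
Since this is an Instruct model, chat-style prompts are best wrapped in the Llama-3 chat template. A minimal sketch, assuming the tokenizer shipped with this checkpoint includes the standard Llama-3 chat template (continuing from the `model` and `tokenizer` above):

```python
messages = [{"role": "user", "content": "你好,你叫什么名字"}]

# apply_chat_template wraps the message in the model's chat format and
# appends the assistant header so generation starts a fresh reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

output = model.generate(input_ids, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```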