This project builds on Meta's llama3-8B model: each MLP is replicated 8 times, a randomly initialized router is added, and all other parameter weights are kept unchanged, yielding a warm-started MoE model. This approach dramatically reduces the cost of training an MoE model from scratch and makes it easy to fine-tune quickly on downstream tasks.
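
Below is a minimal sketch of this warm-start ("upcycling") construction. The state-dict key names and checkpoint path are illustrative assumptions; the exact names used by this repo may differ.

```python
# Sketch: replicate the dense MLP into 8 experts and add a random router.
# Key names and the checkpoint path are illustrative, not the repo's exact ones.
import torch

NUM_EXPERTS = 8
NUM_LAYERS = 32      # llama3-8B
HIDDEN_SIZE = 4096   # llama3-8B

dense_sd = torch.load("llama3-8b/pytorch_model.bin")  # placeholder path
moe_sd = {}

for name, tensor in dense_sd.items():
    if ".mlp." in name:
        # replicate the dense MLP weights into all 8 experts
        for e in range(NUM_EXPERTS):
            moe_sd[name.replace(".mlp.", f".mlp.experts.{e}.")] = tensor.clone()
    else:
        # attention, embedding, and norm weights are kept unchanged
        moe_sd[name] = tensor

# add one randomly initialized router (gate) per layer
for layer in range(NUM_LAYERS):
    moe_sd[f"model.layers.{layer}.mlp.gate.weight"] = (
        0.02 * torch.randn(NUM_EXPERTS, HIDDEN_SIZE)
    )
```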

Here, router_warmboot means the router of llama3-MoE-base is initialized with the router parameters from [Chinese-Mixtral](https://github.com/ymcui/Chinese-Mixtral), while router_random is the version whose router is randomly initialized.
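
For illustration, the two router initializations differ only in where the gate weights come from. In Hugging Face transformers, Mixtral's gate lives under `block_sparse_moe.gate.weight`; the other names below are assumptions.

```python
# Sketch of the two router initialization variants (names partly illustrative).
import torch

NUM_EXPERTS, HIDDEN_SIZE = 8, 4096

# router_random: a fresh random gate for each layer
gate_random = 0.02 * torch.randn(NUM_EXPERTS, HIDDEN_SIZE)

# router_warmboot: reuse the trained gate from a Chinese-Mixtral checkpoint
mixtral_sd = torch.load("chinese-mixtral/pytorch_model.bin")  # placeholder path
layer = 0
gate_warm = mixtral_sd[f"model.layers.{layer}.block_sparse_moe.gate.weight"]
```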

For details, see the GitHub repository: https://github.com/cooper12121/llama3-8x8b-MoE

Generation example:

```python
import sys
sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")

from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"
tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
model = LlamaMoEForCausalLM.from_pretrained(model_ckpt, device_map="auto", use_cache=False)

text_list = ["hello, what is your name?", "你好,你叫什么名字"]

# llama3 has no pad token; reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id
# decoder-only models should be left-padded for batched generation
tokenizer.padding_side = "left"

inputs = tokenizer(text_list, return_tensors="pt", padding=True).to("cuda")

output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
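
Note that left padding matters here: with right padding, a decoder-only model would generate continuations from the pad tokens appended to the shorter sequences in the batch.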

The modeling_file package can be obtained from the GitHub repository above.
