pipeline_tag: text-generation
---

## How to use
Please download the code from [LiqunMa/FBI-LLM](https://github.com/LiqunMa/FBI-LLM) first; the snippet below imports `qat.replace_module` from that repository, so run it from the repository root.
```python
import json
from pathlib import Path

import torch
from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM
from qat.replace_module import replace_with_learnable_binarylinear


def load_model(model_size, model_dir):
    assert model_size in ["130M", "1.3B", "7B"]

    # Build an empty LLaMA model from the FBI-LLM config for the requested size
    model_dir = Path(model_dir)
    with Path(f'FBI-LLM_configs/FBI-LLM_llama2_{model_size}.json').open('r') as r_f:
        config = json.load(r_f)
    llama_config = LlamaConfig(**config)
    model = LlamaForCausalLM(llama_config).to('cuda')
    tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', padding_side="right", use_fast=False)

    # Replace the linear layers with learnable binary linear layers;
    # only the lm_head is kept in full precision. The FBI-LLM checkpoints
    # are fully binarized, so this replacement is always applied.
    model = replace_with_learnable_binarylinear(model, scaling_pattern="column", keep_parts=["lm_head"])

    # Merge all checkpoint shards (*.bin) into a single state dict,
    # skipping the rotary-embedding buffers
    weight_dict = {}
    ckpt_plist = [p for p in model_dir.iterdir() if p.suffix == '.bin']
    for p in ckpt_plist:
        _weight_dict = torch.load(p)
        for k, v in _weight_dict.items():
            if 'self_attn.rotary_emb.inv_freq' not in k:
                weight_dict[k] = v

    model.load_state_dict(weight_dict)

    # Cast the remaining full-precision parameters to float16
    for param in model.parameters():
        param.data = param.data.to(torch.float16)

    return model, tokenizer
```
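
Once the function is defined, loading a checkpoint and generating text follows the usual `transformers` pattern. A minimal sketch, assuming the checkpoint shards have been downloaded into a local `FBI-LLM-7B` directory (the directory name is illustrative):

```python
# "FBI-LLM-7B" is a hypothetical local directory holding the
# downloaded *.bin checkpoint shards and sits next to FBI-LLM_configs/.
model, tokenizer = load_model("7B", "FBI-LLM-7B")

inputs = tokenizer("FBI-LLM is", return_tensors="pt").to('cuda')
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `load_model` builds the architecture from the local config and loads the merged shards directly, so only the Llama-2 tokenizer is fetched from the Hub; the model weights themselves never go through `from_pretrained`.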