---
license: other
license_name: internlm-license
license_link: https://huggingface.co/internlm/internlm-chat-7b-v1_1
---

This is a GPTQ-quantized version of internlm-chat-7b-v1_1.<br>
When using it, please follow the license at https://huggingface.co/internlm/internlm-chat-7b-v1_1 <br>
<br>
Inference code<br>
```python
import time

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GPTQConfig

model_path = r".\internlm-chat-7b-v1_1-gptq"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Load the 4-bit GPTQ weights; disable_exllama makes transformers fall back
# to the standard CUDA kernels when the ExLlama backend is unavailable.
gptq_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    quantization_config=gptq_config,
    trust_remote_code=True,
)
model = model.eval()

history = []

while True:
    txt = input("msg:")
    start_time = time.perf_counter()
    response, history = model.chat(tokenizer, txt, history=history)
    elapsed_time = time.perf_counter() - start_time  # time generation only, not printing
    print(response)
    print(f"worktime:{elapsed_time}")
```
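The per-turn timing in the loop above can also be factored into a small reusable helper. A minimal sketch (the `measure` function is a hypothetical helper, not part of the model's API):

```python
import time


def measure(fn):
    """Run a zero-argument callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Usage: wrap the chat call so the latency of each turn is captured
# alongside its result, e.g.
#   response_and_history, elapsed = measure(
#       lambda: model.chat(tokenizer, txt, history=history))
result, elapsed = measure(lambda: sum(range(1000)))
print(f"worktime:{elapsed}")
```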