Commit 9122269 by lrl-modelcloud (parent: dee986e): Create README.md

---
tags:
- gptq
- 4bit
- gptqmodel
- modelcloud
- llama-3.1
- 8b
---
This model was quantized with [GPTQModel](https://github.com/ModelCloud/GPTQModel) using the following configuration:

- **bits**: 4
- **group_size**: 128
- **desc_act**: true
- **static_groups**: false
- **sym**: true
- **lm_head**: false
- **damp_percent**: 0.01
- **true_sequential**: true
- **model_name_or_path**: ""
- **model_file_base_name**: "model"
- **quant_method**: "gptq"
- **checkpoint_format**: "gptq"
- **meta**:
  - **quantizer**: "gptqmodel:0.9.9-dev0"

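As a rough illustration of what these settings imply for storage, here is a back-of-the-envelope sketch of the packed bits per weight. It assumes the usual GPTQ packing of one fp16 scale and one packed 4-bit zero-point per group of `group_size` weights; the exact on-disk size also depends on unquantized layers (e.g. `lm_head` is kept in full precision here), so treat this as an estimate, not a measurement.

```python
# Estimate packed bits/weight implied by the config above.
# Assumptions (not taken from this repo): one fp16 scale and one
# packed 4-bit zero-point per quantization group.
bits = 4
group_size = 128
scale_bits = 16  # fp16 scale per group
zero_bits = 4    # packed 4-bit zero-point per group

bits_per_weight = bits + (scale_bits + zero_bits) / group_size
compression_vs_fp16 = 16 / bits_per_weight

print(f"~{bits_per_weight:.3f} bits/weight, "
      f"~{compression_vs_fp16:.2f}x smaller than fp16")
```

With these numbers the quantized weights come out to roughly 4.16 bits each, i.e. close to a 3.8x reduction versus fp16 for the quantized layers.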
**Here is an example:**
```python
import torch
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

device = torch.device("cuda:0")

model_name = "ModelCloud/Meta-Llama-3.1-8B-gptq-4bit"

prompt = "I am in Shanghai, preparing to visit the natural history museum. Can you tell me the best way to"

# Load the tokenizer and the quantized model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTQModel.from_quantized(model_name)

# Tokenize the prompt, move it to the GPU, and generate
inputs = tokenizer(prompt, return_tensors="pt").to(device)
res = model.generate(**inputs, num_beams=1, min_new_tokens=1, max_new_tokens=512)
print(tokenizer.decode(res[0]))
```