---
license: llama3.2
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
tags:
- gptqmodel
- modelcloud
- llama3.2
- instruct
- int4
---
This model was quantized to 4-bit (int4) using [GPTQModel](https://github.com/ModelCloud/GPTQModel).

Quantization settings:

- **bits**: 4
- **dynamic**: null
- **group_size**: 32
- **desc_act**: true
- **static_groups**: false
- **sym**: true
- **lm_head**: false
- **true_sequential**: true
- **quant_method**: "gptq"
- **checkpoint_format**: "gptq"
- **meta**:
  - **quantizer**: gptqmodel:1.1.0
  - **uri**: https://github.com/modelcloud/gptqmodel
  - **damp_percent**: 0.1
  - **damp_auto_increment**: 0.0015
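For reference, the settings above correspond to a quantization run along the following lines. This is a minimal sketch, not the exact script used to produce this checkpoint: it assumes GPTQModel's `QuantizeConfig` accepts these field names and that `calibration_dataset` is a list of sample texts you supply yourself.

```python
# Sketch of how a model like this could be produced with GPTQModel.
# Assumptions: QuantizeConfig field names mirror the settings listed above;
# calibration_dataset is a placeholder that must be replaced with real data.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,            # int4 weights
    group_size=32,     # quantization group size
    desc_act=True,     # quantize columns in activation order
    sym=True,          # symmetric quantization
    damp_percent=0.1,  # Hessian dampening factor
)

calibration_dataset = ["Example calibration text ..."]  # replace with real samples

model = GPTQModel.from_pretrained("meta-llama/Llama-3.2-1B-Instruct", quant_config)
model.quantize(calibration_dataset)
model.save_quantized("Llama-3.2-1B-Instruct-gptq-4bit")
```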
## Example

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_name = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortext-v1"

# Load the tokenizer and the quantized model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTQModel.from_quantized(model_name)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
# Render the chat messages into Llama 3.2's prompt format.
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)

print(result)
```
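The slice `outputs[0][input_tensor.shape[1]:]` drops the prompt tokens from the generated sequence, so only the model's reply is decoded and printed, and `input_tensor.to(model.device)` moves the prompt onto whatever device GPTQModel loaded the weights to.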