bengsoon committed
Commit 5494497 · verified · 1 Parent(s): bb0719d

Add quantization option

Files changed (1): README.md +14 -0
README.md CHANGED
@@ -38,6 +38,8 @@ TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, pa
 ```
 
 ### Inferencing using Transformers Pipeline
+The code below was tested on Google Colab (with the free T4 GPU).
+
 ``` python
 import transformers
 import torch
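
The diff elides the body of the inference code at this point. For orientation, here is a minimal sketch assembled only from the fragments visible on this page; `model_id` and the prompt string `input` stand in for definitions that live elsewhere in the full README:

``` python
import transformers
import torch

# Minimal sketch from the visible fragments; `model_id` and the prompt
# string `input` are assumed to be defined earlier in the full README.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

output = pipeline(input)
print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
```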
 
@@ -82,4 +84,16 @@ output = pipeline(input)
 
 print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
 # > Response: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever. Performed are inspection with barge engineer. Cleaned and tidyied offices and workspaces.
+```
+
+### Quantized model
+If you are facing GPU constraints, you can try loading the model with 8-bit quantization:
+
+``` python
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_8bit": True}, # Use 8-bit quantization
+    device_map="auto"
+)
 ```
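
One caveat on the added snippet: recent transformers releases deprecate passing `load_in_8bit` through `model_kwargs` in favor of an explicit `BitsAndBytesConfig`. A minimal equivalent sketch, assuming the `bitsandbytes` package is installed and `model_id` is defined as above:

``` python
import transformers
import torch
from transformers import BitsAndBytesConfig

# Equivalent 8-bit load via an explicit quantization config; assumes the
# `bitsandbytes` package is installed and `model_id` is defined as above.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "quantization_config": quant_config,
    },
    device_map="auto",
)
```

Eight-bit weights take roughly half the memory of bfloat16 weights, which is what makes the difference on a 16 GB card like the T4.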