Add quantization option
README.md
CHANGED
@@ -38,6 +38,8 @@ TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, pa
 ```
 
 ### Inferencing using Transformers Pipeline
+The code below was tested on Google Colab (with the free T4 GPU).
+
 ``` python
 import transformers
 import torch
@@ -82,4 +84,16 @@ output = pipeline(input)
 
 print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
 # > Response: Packed equipment and prepared for backload. Cleaned drillfloor and cantilever. Performed are inspection with barge engineer. Cleaned and tidyied offices and workspaces.
+```
+
+### Quantized model
+If you are facing GPU memory constraints, you can try loading the model with 8-bit quantization:
+
+``` python
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_8bit": True},  # Use 8-bit quantization
+    device_map="auto"
+)
 ```
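
For context, the quantized pipeline above only changes the `model_kwargs` of the pipeline construction used in the README's inference example. The exact baseline setup is elided in this diff, so the sketch below is an assumed, non-authoritative reconstruction; `model_id` and the prompt string are placeholders for illustration.

``` python
import transformers
import torch

# Placeholder repo id for illustration; substitute the actual fine-tuned model id.
model_id = "your-org/your-finetuned-model"

# Assumed baseline: plain bfloat16 pipeline without quantization.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# "input" stands in for the TEMPLATE-formatted instruction used earlier in the README.
input = "Below is an instruction that describes a task ..."  # hypothetical prompt text
output = pipeline(input)
print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
```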
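One note on the 8-bit path: `load_in_8bit=True` relies on the `bitsandbytes` package being installed (for example `pip install bitsandbytes accelerate`), and recent transformers releases prefer expressing the same option through `BitsAndBytesConfig`. A minimal sketch under that assumption, again with a placeholder model id:

``` python
import transformers

# Placeholder repo id for illustration; substitute the actual fine-tuned model id.
model_id = "your-org/your-finetuned-model"

# Same 8-bit setup expressed via BitsAndBytesConfig (requires bitsandbytes).
quant_config = transformers.BitsAndBytesConfig(load_in_8bit=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"quantization_config": quant_config},
    device_map="auto",
)
```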