bengsoon committed
Commit ba9184c · verified · 1 Parent(s): 5494497

Quantization with BitsAndBytesConfig

Files changed (1): README.md (+7 -4)
README.md CHANGED
@@ -20,7 +20,7 @@ This is a fine-tuned model from [Meta/Meta-Llama-3-8B](https://huggingface.co/Me
 The motivation behind this model was to fine-tune an LLM that is capable of understanding the nuances of Drilling Operations and providing 24-hour summarizations based on the hourly activities in Daily Drilling Reports.
 
 ## How to use
-### Recommended template for DriLLM summarization:
+### Recommended template for DriLLM-Summarizer:
 ``` python
 TEMPLATE = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
 
@@ -90,10 +90,13 @@ print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip(
 If you are facing GPU constraints, you can try to load the model with 8-bit quantization:
 
 ``` python
+from transformers import BitsAndBytesConfig
+
 pipeline = transformers.pipeline(
-    "text-generation",
-    model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_8bit": True}, # Use 8-bit quantization
+    "text-generation",
+    model=model_id,
+    torch_dtype=torch.bfloat16,
+    model_kwargs={"quantization_config": BitsAndBytesConfig(load_in_8bit=True)},  # 8-bit quantization
     device_map="auto"
 )
 ```
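
For reference, a minimal end-to-end sketch of the quantized setup after this commit. It assumes `model_id` and `TEMPLATE` are defined as in the earlier parts of the README, that `bitsandbytes` and `accelerate` are installed, and that the template's fields are filled as shown in the README's earlier usage example; the `max_new_tokens` value is an arbitrary placeholder. The quantization config is passed through `model_kwargs`, since `transformers.pipeline` forwards `model_kwargs` to the underlying `from_pretrained` call.

``` python
import torch
import transformers
from transformers import BitsAndBytesConfig

# Load the fine-tuned model in 8-bit to reduce GPU memory usage.
# `model_id` refers to the model repository defined earlier in the README.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,  # dtype for the non-quantized modules
    model_kwargs={"quantization_config": BitsAndBytesConfig(load_in_8bit=True)},
    device_map="auto",  # requires accelerate; places layers automatically
)

# Build the prompt from the README's TEMPLATE (filled with the hourly
# activities from a Daily Drilling Report) and generate the 24-hour summary.
prompt = TEMPLATE  # in practice, fill the template's instruction/input fields
output = pipeline(prompt, max_new_tokens=512)  # 512 is a placeholder budget

# Parse out the response section, mirroring the README's own parsing line.
print("Response: ", output[0]["generated_text"].split("### Response:")[1].strip())
```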