Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit

Text Generation


Chryslerx10 committed on Oct 9, 2024

Commit 7804c76 (verified) · 1 parent: ed8c139

Update README.md

Files changed (1)

README.md +36 -0

@@ -78,4 +78,40 @@ Libraries Used:
   tokenizer.pad_token = tokenizer.eos_token
   peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
 ```

+```
+## Running inference
+```python
+from transformers import GenerationConfig
+
+def create_chat_template(question, context=""):
+    # Prompt layout: [Instruction], [Question], [Related Reviews], [Answer].
+    text = (
+        "[Instruction] You are a question-answering agent which answers the question based on the related reviews. "
+        "If related reviews are not provided, you can generate the answer based on the question.\n"
+        f"[Question] {question}\n"
+        f"[Related Reviews] {context}\n"
+        "[Answer] "
+    )
+    return text
+
+def generate_response(question, context=""):
+    text = create_chat_template(question, context)
+    # Move the tokenized prompt to the model's device.
+    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(model.device)
+    config = GenerationConfig(
+        max_new_tokens=256,  # tokens generated after the prompt
+        temperature=0.5,
+        top_k=5,
+        top_p=0.95,
+        repetition_penalty=1.2,
+        do_sample=True,
+    )
+    # Generate with the PEFT model loaded above so the fine-tuned adapter is used.
+    response = peft_loaded_model.generate(**inputs, generation_config=config)
+    output = tokenizer.decode(response[0], skip_special_tokens=True)
+    return output
+
+# Example usage: no reviews are passed, so the answer is generated from the question alone.
+question = "Explain the process of photosynthesis."
+response = generate_response(question)
+print(response)
 ```
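The `GenerationConfig` in this snippet combines `repetition_penalty` with temperature, top-k and top-p sampling. Here is a simplified pure-Python sketch that illustrates roughly what those settings do to a single next-token distribution, using an invented five-token vocabulary. It is not the transformers implementation, which operates on logit tensors and has its own tie and boundary handling. The token names, logit values and helper functions are all hypothetical. The repetition penalty follows the CTRL-style rule of dividing positive logits and multiplying negative ones.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a {token: logit} dict.
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def apply_repetition_penalty(logits, seen, penalty):
    # CTRL-style rule (hypothetical helper): for tokens already generated,
    # divide positive logits by the penalty and multiply negative ones,
    # so both become less likely.
    out = dict(logits)
    for t in seen:
        if t in out:
            out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def filter_distribution(logits, temperature, top_k, top_p):
    # 1) Temperature: scale logits before softmax (< 1 sharpens the distribution).
    probs = softmax({t: v / temperature for t, v in logits.items()})
    # 2) Top-k: keep the k most likely tokens and renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(t, p / total) for t, p in ranked]
    # 3) Top-p (nucleus): keep the shortest prefix whose mass reaches top_p.
    kept, cum = [], 0.0
    for t, p in ranked:
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}


# Made-up logits for a five-token vocabulary; "the" was already generated.
logits = {"the": 3.0, "a": 2.5, "cat": 1.0, "dog": 0.5, "xyz": -1.0}
penalized = apply_repetition_penalty(logits, seen={"the"}, penalty=1.2)
dist = filter_distribution(penalized, temperature=0.5, top_k=5, top_p=0.95)
print(dist)  # {'the': 0.5, 'a': 0.5}

# Sample one token from the filtered distribution.
rng = random.Random(0)
token = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
print(token)
```

With these made-up numbers, the penalty lowers "the" from 3.0 to 2.5, which ties it with "a". Top-p = 0.95 then discards "cat", "dog" and "xyz", whose combined share is under 5%.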
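`tokenizer.decode(response[0], skip_special_tokens=True)` decodes the whole generated sequence. For a decoder-only model like Llama, that sequence begins with the prompt, so the printed text repeats the instruction, question and reviews before the answer. Below is a minimal sketch that keeps only the answer, assuming the prompt ends with the `[Answer]` tag from the template. `extract_answer` and its `stop_tags` cut-off are hypothetical helpers, not part of this repository.

```python
def extract_answer(decoded, marker="[Answer]", stop_tags=("[Instruction]", "[Question]")):
    """Return only the model's answer from a decoded prompt + continuation.

    Hypothetical helper: assumes the prompt ends with `marker`, as in the
    [Instruction]/[Question]/[Related Reviews]/[Answer] template.
    """
    # Everything up to the first marker is the echoed prompt.
    _, found, tail = decoded.partition(marker)
    if not found:
        # No marker: return the decoded text unchanged apart from whitespace.
        return decoded.strip()
    # Assumption: if the model starts a new prompt section, the answer ends there.
    cut = len(tail)
    for tag in stop_tags:
        idx = tail.find(tag)
        if idx != -1:
            cut = min(cut, idx)
    return tail[:cut].strip()


decoded = (
    "[Instruction] You are a question-answering agent which answers the question based on the related reviews.\n"
    "[Question] What do plants need for photosynthesis?\n"
    "[Related Reviews] \n"
    "[Answer] Light, water and carbon dioxide.\n"
    "[Question] another question"
)
print(extract_answer(decoded))  # Light, water and carbon dioxide.
```

An alternative that does not depend on the tag text is to slice off the prompt tokens before decoding, for example `response[0][inputs["input_ids"].shape[1]:]`.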