Sandiago21/llama-13b-hf-prompt-answering

@@ -20,6 +20,7 @@ This repository contains a LLaMA-13B further fine-tuned model on conversations a
 ## Model Details
 ### Model Description
@@ -95,23 +96,91 @@ def generate_prompt(instruction: str, input_ctxt: str = None) -> str:
 Use the code below to get started with the model.
 ```python
 import torch
 from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
-tokenizer = LlamaTokenizer.from_pretrained("Sandiago21/llama-13b-hf-prompt-answering")
 model = LlamaForCausalLM.from_pretrained(
-    "Sandiago21/llama-13b-hf-prompt-answering",
     load_in_8bit=True,
     torch_dtype=torch.float16,
     device_map="auto",
 )
 generation_config = GenerationConfig(
     temperature=0.2,
     top_p=0.75,
     top_k=40,
     num_beams=4,
-    max_new_tokens=128,
 )
 model.eval()
@@ -139,7 +208,7 @@ with torch.no_grad():
 response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
 print(response)
->>> The capital city of Greece is Athens and it borders Albania, Macedonia, Bulgaria and Turkey.
 ```
 ## Training Details

 ## Model Details
+Anyone can prompt and experiment with the model using the Jupyter Notebook provided in the **notebooks** folder.
 ### Model Description
 Use the code below to get started with the model.
+1. You can `git clone` the repo, which also contains the base-model artifacts for simplicity and completeness, and run the following code snippet to load the model:
 ```python
 import torch
 from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
+from peft import PeftConfig, PeftModel
+MODEL_NAME = "Sandiago21/llama-13b-hf-prompt-answering"
+config = PeftConfig.from_pretrained(MODEL_NAME)
 model = LlamaForCausalLM.from_pretrained(
+    config.base_model_name_or_path,
     load_in_8bit=True,
     torch_dtype=torch.float16,
     device_map="auto",
 )
+tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)
+model = PeftModel.from_pretrained(model, MODEL_NAME)
+generation_config = GenerationConfig(
+    temperature=0.2,
+    top_p=0.75,
+    top_k=40,
+    num_beams=4,
+    max_new_tokens=32,
+)
+model.eval()
+if torch.__version__ >= "2":
+    model = torch.compile(model)
+```
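The `GenerationConfig` above sets `temperature`, `top_p` and `top_k` but leaves `do_sample` at its default of `False`, so `generate` runs plain beam search with `num_beams=4`. As far as we know, transformers only applies those three parameters in sampling-based generation modes, and recent releases warn when they are set without `do_sample=True`. The sketch below is a hypothetical plain-Python check that flags such parameters in a config dict; the `inert_sampling_params` helper and the `SAMPLING_DEFAULTS` table (assumed to mirror the `GenerationConfig` defaults) are illustrative, not part of transformers.

```python
# Hypothetical helper, NOT a transformers API. The defaults below are assumed
# to match GenerationConfig's defaults for these sampling-only parameters.
SAMPLING_DEFAULTS = {"temperature": 1.0, "top_p": 1.0, "top_k": 50}


def inert_sampling_params(config: dict) -> list[str]:
    """Return sampling-only keys set to non-default values while do_sample is off."""
    if config.get("do_sample", False):
        return []  # sampling is enabled, so these parameters are actually used
    return [
        name
        for name, default in SAMPLING_DEFAULTS.items()
        if name in config and config[name] != default
    ]


# The settings used in the snippet above, expressed as a plain dict.
card_config = {
    "temperature": 0.2,
    "top_p": 0.75,
    "top_k": 40,
    "num_beams": 4,
    "max_new_tokens": 32,
}
print(inert_sampling_params(card_config))  # ['temperature', 'top_p', 'top_k']
```

If sampling is the intended behaviour, adding `do_sample=True` to the `GenerationConfig` should make these parameters take effect; otherwise they can simply be dropped.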
+### Example of Usage
+```python
+instruction = "What is the capital city of Greece and with which countries does Greece border?"
+input_ctxt = None  # For some tasks, you can provide an input context to help the model generate a better response.
+prompt = generate_prompt(instruction, input_ctxt)
+input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+input_ids = input_ids.to(model.device)
+with torch.no_grad():
+    outputs = model.generate(
+        input_ids=input_ids,
+        generation_config=generation_config,
+        return_dict_in_generate=True,
+        output_scores=True,
+    )
+response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
+print(response)
+# Example output: The capital city of Greece is Athens and it borders Turkey, Bulgaria, Macedonia, Albania, and the Aegean Sea.
+```
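For a decoder-only model such as LLaMA, `outputs.sequences[0]` holds the prompt token ids followed by the generated ones, so decoding the whole sequence typically echoes the prompt built by `generate_prompt` before the answer. A common way to keep only the answer is to slice off the first `input_ids.shape[-1]` tokens before decoding, e.g. `tokenizer.decode(outputs.sequences[0][input_ids.shape[-1]:], skip_special_tokens=True)`. The sketch below illustrates that slicing with plain lists; `strip_prompt` is a hypothetical helper and the token ids are made up.

```python
def strip_prompt(sequence: list[int], prompt_length: int) -> list[int]:
    """Drop the echoed prompt tokens and keep only the newly generated ones."""
    # Decoder-only generate() returns prompt + continuation as one sequence,
    # so everything after the first prompt_length ids is new text.
    return sequence[prompt_length:]


# Toy ids standing in for input_ids[0] and outputs.sequences[0] (illustrative only).
prompt_ids = [1, 319, 1735]
full_sequence = prompt_ids + [450, 7483, 310, 2]
print(strip_prompt(full_sequence, len(prompt_ids)))  # [450, 7483, 310, 2]
```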
+2. You can also load the model directly from the Hugging Face Hub using the following code snippet:
+```python
+import torch
+from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
+from peft import PeftConfig, PeftModel
+MODEL_NAME = "Sandiago21/llama-13b-hf-prompt-answering"
+BASE_MODEL = "decapoda-research/llama-13b-hf"
+config = PeftConfig.from_pretrained(MODEL_NAME)
+model = LlamaForCausalLM.from_pretrained(
+    BASE_MODEL,
+    load_in_8bit=True,
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)
+model = PeftModel.from_pretrained(model, MODEL_NAME)
 generation_config = GenerationConfig(
     temperature=0.2,
     top_p=0.75,
     top_k=40,
     num_beams=4,
+    max_new_tokens=32,
 )
 model.eval()
 response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
 print(response)
+# Example output: The capital city of Greece is Athens and it borders Turkey, Bulgaria, Macedonia, Albania, and the Aegean Sea.
 ```
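The two options differ only in where the base-model id comes from: option 1 reads it from `PeftConfig.from_pretrained(MODEL_NAME).base_model_name_or_path`, while option 2 hard-codes `BASE_MODEL`. PEFT adapters record that id in the `base_model_name_or_path` field of the repository's `adapter_config.json`, so inspecting that file is a quick way to check that both options point at the same base weights. A minimal stdlib sketch, assuming a locally cloned adapter directory; the `base_model_from_adapter` helper is hypothetical, and the id written in the demo is illustrative rather than read from this repo.

```python
import json
import tempfile
from pathlib import Path


def base_model_from_adapter(adapter_dir: str) -> str:
    """Read the base-model id recorded in a local PEFT adapter_config.json."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)["base_model_name_or_path"]


# Demo with a throwaway directory; the base-model id below is illustrative.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "adapter_config.json").write_text(
        json.dumps({"base_model_name_or_path": "decapoda-research/llama-13b-hf"}),
        encoding="utf-8",
    )
    print(base_model_from_adapter(tmp))  # decapoda-research/llama-13b-hf
```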
 ## Training Details