---
library_name: transformers
datasets:
- bergr7f/databricks-dolly-15k-subset-general_qa
language:
- en
base_model:
- meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
---
## Model Description
Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model has been fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient for deployment on resource-constrained hardware.
### Model Architecture
- Base Model: Llama-3.2-1B
- Parameters: Approximately 1 billion
- Quantization: 4-bit, using the bitsandbytes library
- Fine-tuning Method: PEFT with LoRA
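To show why 4-bit quantization matters at this scale, the sketch below estimates weight storage from the parameter count alone. It uses the card's "approximately 1 billion" figure (the base model's exact count is somewhat higher) and ignores activations, the KV cache, per-block quantization constants, and any layers kept in higher precision. Treat it as a lower bound, not a measured footprint.

```python
# Rough, weight-only memory estimate: parameters x bits per parameter / 8 bits per byte.
# Ignores activations, the KV cache, quantization constants, and any layers
# (e.g. embeddings) that may be kept in higher precision.

def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return approximate weight storage in GiB (2**30 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    n = 1.0e9  # "approximately 1 billion" parameters, as stated in this card
    for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_memory_gib(n, bits):.2f} GiB")
```

With 1e9 parameters this prints roughly 3.73 GiB (fp32), 1.86 GiB (fp16) and 0.47 GiB (4-bit). The 4x saving over FP16 is what makes the model practical on small GPUs.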
## Training Data
The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset. This subset is drawn from the general question-answering (general_qa) category of the larger Databricks Dolly 15k dataset.
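The card does not document how dataset records were turned into training text. The sketch below is a hypothetical preprocessing step that reuses the prompt layout from the inference section and appends the reference answer. It assumes each record exposes Dolly 15k's `instruction`, `context` and `response` fields. Both the field names and the template should be checked against the dataset page and the author's training code.

```python
# Hypothetical preprocessing sketch: format a Dolly-style record with the
# prompt layout used at inference time, followed by the reference answer.
# ASSUMPTION: records have "instruction", "context" and "response" fields;
# the exact template used during fine-tuning is not documented in this card.

def format_dolly_record(record: dict) -> str:
    question = record["instruction"].strip()
    context = (record.get("context") or "").strip()  # context may be empty
    answer = record["response"].strip()
    return (
        "[Instruction] You are a question-answering agent which answers the question "
        "based on the related reviews.\n"
        "If related reviews are not provided, you can generate the answer based on the question.\n"
        f"[Question] {question}\n"
        f"[Related Reviews] {context}\n"
        f"[Answer] {answer}"
    )


if __name__ == "__main__":
    sample = {"instruction": "What is the capital of France?", "context": "", "response": "Paris."}
    print(format_dolly_record(sample))
```

With the `datasets` library, a function like this would typically be applied through `Dataset.map` after `load_dataset("bergr7f/databricks-dolly-15k-subset-general_qa")`. The split names should be confirmed on the dataset page.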
### Training Procedure
Fine-tuning configuration:
- LoRA Rank (r): 8
- LoRA Alpha: 16
- LoRA Dropout: 0.5
- Number of Epochs: 30
- Batch Size: 2 (per device)
- Learning Rate: 2e-5
- Evaluation Strategy: evaluated at each epoch
- Optimizer: AdamW
- Mixed Precision: FP16
- Hardware Used: not specified

Libraries:
- transformers
- datasets
- peft
- bitsandbytes
- trl
- evaluate
## Intended Use
The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.
## Limitations and Biases
- Knowledge Cutoff: The model's knowledge is limited to the data it was trained on. It may lack information about events or developments after that data was collected.
- Accuracy: The model can produce incorrect or nonsensical answers. Verify critical information against reliable sources.
- Biases: The model may inherit biases present in its training data. Evaluate its outputs critically, especially in sensitive contexts.
## Acknowledgements
- Base Model: [Meta AI's Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- Dataset: [Databricks Dolly 15k Subset for General QA](https://huggingface.co/datasets/bergr7f/databricks-dolly-15k-subset-general_qa)
- Libraries Used:
  - Transformers
  - PEFT
  - TRL
  - BitsAndBytes
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"

# Read the adapter config to find the base model it was trained on
config = PeftConfig.from_pretrained(peft_model_id)

# meta-llama/Llama-3.2-1B is gated: accept the license on the Hub and log in first
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',  # place weights on the available GPU(s)/CPU
    return_dict=True
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Llama 3.2 defines no pad token; reuse EOS for padding
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter weights to the base model
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id)
```
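## Running Inference
```python
import torch
from transformers import GenerationConfig

# Reuses `tokenizer` and `peft_loaded_model` from the loading snippet above
device = peft_loaded_model.device

def create_chat_template(question, context=""):
    # Prompt format for this model; the string's lines are intentionally unindented
    text = f"""
[Instruction] You are a question-answering agent which answers the question based on the related reviews.
If related reviews are not provided, you can generate the answer based on the question.\n
[Question] {question}\n
[Related Reviews] {context}\n
[Answer]
"""
    return text

def generate_response(question, context=""):
    text = create_chat_template(question, context)
    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(device)
    config = GenerationConfig(
        max_new_tokens=256,  # cap on generated tokens, excluding the prompt
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )
    # Generate with the adapter-wrapped model so the LoRA weights are applied
    with torch.no_grad():
        response = peft_loaded_model.generate(**inputs, generation_config=config)
    # Decoded text contains the prompt followed by the generated answer
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output

# Example usage; `context` is optional and defaults to an empty string
question = "Explain the process of photosynthesis."
response = generate_response(question)
print(response)
```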
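`generate_response` decodes the whole sequence, so its return value repeats the prompt before the answer. A small hypothetical helper such as `extract_answer` can keep only the answer. It assumes the answer follows the first `[Answer]` tag and that the model may start a new `[Question]` turn by itself. That is a guess about output behavior, not documented behavior of this model.

```python
# Hypothetical post-processing helper: keep only the generated answer.
# ASSUMPTION: the answer follows the first "[Answer]" tag, and any extra
# "[Question]" turn the model starts should be discarded.

def extract_answer(generated_text: str) -> str:
    # Text after the first "[Answer]" tag (the whole string if the tag is absent)
    answer = generated_text.split("[Answer]", 1)[-1]
    # Cut off any new turn the model began on its own
    answer = answer.split("[Question]", 1)[0]
    return answer.strip()
```

For example, `print(extract_answer(generate_response("Explain the process of photosynthesis.")))` prints just the answer text.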