---
library_name: transformers
datasets:
- bergr7f/databricks-dolly-15k-subset-general_qa
language:
- en
base_model:
- meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
---

## Model Description

Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model has been fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient for deployment on resource-constrained hardware.

### Model Architecture

- Base Model: Llama-3.2-1B
- Parameters: approximately 1 billion
- Quantization: 4-bit using the bitsandbytes library (see the sketch below)
- Fine-tuning Method: PEFT with LoRA
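
The exact quantization settings are not listed in this card; a typical bitsandbytes 4-bit configuration for this kind of setup looks like the sketch below. The NF4 quant type, nested quantization, and fp16 compute dtype are assumptions rather than values taken from this card:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumed 4-bit settings; the card only states "4-bit using the bitsandbytes library"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store the base weights in 4-bit
    bnb_4bit_quant_type="nf4",             # assumption: NF4 quantization
    bnb_4bit_use_double_quant=True,        # assumption: nested quantization for extra memory savings
    bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute, matching the fp16 training setup
)
```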

## Training Data

The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset, a subset of the larger Databricks Dolly 15k dataset that focuses on general question-answering tasks.

### Training Procedure

Fine-tuning configuration (a code sketch of this setup follows the list):

- LoRA Rank (r): 8
- LoRA Alpha: 16
- LoRA Dropout: 0.5
- Number of Epochs: 30
- Batch Size: 2 (per device)
- Learning Rate: 2e-5
- Evaluation Strategy: evaluated at the end of each epoch
- Optimizer: AdamW
- Mixed Precision: FP16
- Hardware Used: single RTX 4070 (8 GB)
- Libraries: transformers, datasets, peft, bitsandbytes, trl, evaluate
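
The original training script is not included in this card. The sketch below shows how the hyperparameters above could be wired together with peft and trl; the dataset column names, train/validation split, LoRA target modules, and maximum sequence length are assumptions, and some keyword names differ between library releases:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

base_model_id = "meta-llama/Llama-3.2-1B"

# Load the base model in 4-bit, as described under Model Architecture
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_compute_dtype=torch.float16),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the list above; target_modules is an assumption (not stated in the card)
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.5,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)

# Trainer settings from the list above; output_dir is a placeholder
training_args = TrainingArguments(
    output_dir="llama-3.2-1b-generalqa-lora",
    num_train_epochs=30,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    eval_strategy="epoch",   # "evaluation_strategy" in older transformers releases
    fp16=True,
    optim="adamw_torch",
)

# Assumption: the subset keeps the original Dolly columns (instruction / context / response)
def to_text(example):
    return {"text": f"[Question] {example['instruction']}\n"
                    f"[Related Reviews] {example['context']}\n"
                    f"[Answer] {example['response']}"}

dataset = load_dataset("bergr7f/databricks-dolly-15k-subset-general_qa")["train"]
splits = dataset.train_test_split(test_size=0.1, seed=42).map(to_text)

# Keyword names follow the classic SFTTrainer signature; newer trl moves most of these into SFTConfig
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,  # assumption
)
trainer.train()
```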

## Intended Use

The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.

## Limitations and Biases

- Knowledge Cutoff: The model's knowledge is limited to the data it was trained on; it may not have information on events or developments that occurred after the dataset was created.
- Accuracy: The model may occasionally produce incorrect or nonsensical responses. Always verify critical information from reliable sources.
- Biases: The model may inherit biases present in the training data. Users should critically evaluate its outputs, especially in sensitive contexts.

## Acknowledgements

- Base Model: [Meta AI's Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- Dataset: [Databricks Dolly 15k Subset for General QA](https://huggingface.co/datasets/bergr7f/databricks-dolly-15k-subset-general_qa)
- Libraries Used: Transformers, PEFT, TRL, BitsAndBytes

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model referenced in the adapter config
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',
    return_dict=True
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter to the base model
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
```
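
Because the adapter was trained on top of a 4-bit quantized base model, you can also quantize the base model at load time to reduce memory use during inference. The sketch below reuses `config` and `peft_model_id` from the snippet above; the specific 4-bit settings are assumptions rather than values stated in this card:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumed 4-bit settings (the card only states that bitsandbytes 4-bit quantization was used)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map='auto',
)
peft_loaded_model = PeftModel.from_pretrained(model_4bit, peft_model_id)
```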

## Running Inference

```python
from transformers import GenerationConfig

def create_chat_template(question, context):
    text = f"""
    [Instruction] You are a question-answering agent which answers the question based on the related reviews.
    If related reviews are not provided, you can generate the answer based on the question.\n
    [Question] {question}\n
    [Related Reviews] {context}\n
    [Answer]
    """
    return text

def generate_response(question, context=""):
    text = create_chat_template(question, context)
    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(peft_loaded_model.device)

    config = GenerationConfig(
        max_length=256,
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True,
        penalty_alpha=0.6  # only used by contrastive search; has no effect when do_sample=True
    )

    # Generate with the adapter-augmented model loaded above
    response = peft_loaded_model.generate(**inputs, generation_config=config)
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output

# Example usage
question = "Explain the process of photosynthesis."
response = generate_response(question)
print(response)
```
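
If a single standalone checkpoint is more convenient than shipping the base model plus adapter, the LoRA weights can be merged into the base model with PEFT's `merge_and_unload()`. A minimal sketch, assuming the non-quantized loading from the How to Use section and a hypothetical output directory:

```python
# Fold the LoRA weights into the base model and save a standalone copy
# (merge into a full- or half-precision base model, not the 4-bit variant)
merged_model = peft_loaded_model.merge_and_unload()
merged_model.save_pretrained("Llama-3.2-1B-generalQA-merged")
tokenizer.save_pretrained("Llama-3.2-1B-generalQA-merged")
```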