---
base_model:
- meta-llama/Llama-3.2-1B
- Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit
language:
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- Amazon
- Question-Answering
- PEFT
- Supervised-Finetuning
---

## Model details

Model Name: Llama-3.2-1B-finetuned-amazon-reviews-QA-peft-4bit
Author: Chryslerx10
Base Model: Llama-3.2-1B
Task: Product Question Answering (QA) based on customer reviews
Framework: Hugging Face Transformers
PEFT Framework: PEFT with LoRA fine-tuning
Quantization: 4-bit with BitsAndBytes for efficient deployment
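The card does not list the exact quantization or LoRA hyperparameters. As an illustration only, a typical BitsAndBytes 4-bit setup combined with a LoRA adapter config might look like the sketch below; every value here (rank, alpha, dropout, target modules, quant type) is an assumption, not the actual training configuration.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Hypothetical 4-bit quantization config (values are illustrative)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 is a common choice for QLoRA-style setups
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Hypothetical LoRA adapter config (values are illustrative)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections are a common target
    bias="none",
    task_type="CAUSAL_LM",
)
```

These objects would be passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and `peft.get_peft_model(model, lora_config)` respectively during training.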
## Model Description

This model is fine-tuned on product-related QA pairs generated from customer reviews. It is designed to answer users' questions about products based on relevant review data. It leverages LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning and 4-bit quantization for deployment on resource-constrained devices.

### Key features

- Built on meta-llama/Llama-3.2-1B
- Parameter-efficient fine-tuning with LoRA (PEFT)
- 4-bit quantization with BitsAndBytes for a small deployment footprint
- Answers product questions grounded in customer-review context

## Limitations

## Training Details

### Dataset

The model was trained on an Amazon product-reviews dataset from the beauty category. The dataset was preprocessed into a structured format with the following fields: `question`, `review`, and `summary`.

### Fine tuning configuration

### PEFT configuration

### Quantization configuration

## Inference

### Loading the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-amazon-reviews-QA-peft-4bit"

# Load the adapter config to find the base model it was trained on
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',
    return_dict=True
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter weights to the base model
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
```

### Generating outputs

```python
import torch
from transformers import GenerationConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

def create_chat_template(question, context):
    # Build the prompt in the format used during fine-tuning;
    # the [Answer] section is left empty for the model to complete.
    text = f"""[Instruction] You are a question-answering agent specialized in helping users with their queries about products based on relevant customer reviews. Your job is to analyze the reviews provided in the context and generate an accurate, helpful, and informative response to the question asked.
1. Read the user's question carefully.
2. Use the reviews given in the context to formulate your answer.
3. If the product reviews don't contain enough information or are missing, inform the user that there aren't sufficient reviews to answer the question.
4. If the question is unrelated to products, politely inform the user that you can only assist with product-related queries.
5. Structure your response in a conversational and user-friendly manner.

Your goal is to provide helpful and contextually relevant answers to product-related questions.

[Question]\n {question}
[Related Reviews]\n {context if context else ''}
[Answer]\n """
    return text

def generate_response(question, context):
    text = create_chat_template(question, context)
    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(device)

    config = GenerationConfig(
        max_length=256,
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True
    )

    response = peft_loaded_model.generate(**inputs, generation_config=config)
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output

# Example usage
question = "How is the battery life of this product?"
context = "The battery lasts two full days on a single charge. Charging is fast too."
response = generate_response(question, context)
print(response)
```
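The prompt above embeds raw review text directly, so long review sets can overflow the model's context window. A minimal sketch of one way to budget the context before building the prompt is shown below; `pack_reviews` is a hypothetical helper (not part of this model's code), and the whitespace word count is a crude stand-in for real token counting with the tokenizer.

```python
def pack_reviews(reviews, max_words=200):
    """Greedily concatenate reviews until a rough word budget is reached.

    For production use, replace len(r.split()) with the tokenizer's
    token count, e.g. len(tokenizer.encode(r)).
    """
    packed, used = [], 0
    for r in reviews:
        n = len(r.split())
        if used + n > max_words:
            break  # stop before exceeding the budget
        packed.append(r)
        used += n
    return "\n".join(packed)

# Example: only reviews that fit within the budget are kept
context = pack_reviews(
    ["Battery lasts two days.", "Screen is a bit dim.", "word " * 500],
    max_words=50,
)
```

The packed string can then be passed as the `context` argument of `generate_response`.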