library_name: transformers
datasets:
  - bergr7f/databricks-dolly-15k-subset-general_qa
language:
  - en
base_model:
  - meta-llama/Llama-3.2-1B
pipeline_tag: text-generation

Model Description

Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model has been fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient for deployment on resource-constrained hardware.

Model Architecture

Base Model: Llama-3.2-1B
Parameters: Approximately 1 Billion
Quantization: 4-bit using the bitsandbytes library
Fine-tuning Method: PEFT with LoRA
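
To give a rough sense of why 4-bit quantization matters on small GPUs, the sketch below estimates weight storage at two precisions. It assumes exactly 1e9 parameters (the card only says "approximately 1 billion") and counts weights only, ignoring activations, the KV cache, and quantization metadata. `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1e9  # assumption: the card states "approximately 1 billion"

for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from about 2.0 GB at 16-bit to about 0.5 GB at 4-bit.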

Training Data

The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset. This dataset is a subset focusing on general question-answering tasks, derived from the larger Databricks Dolly 15k dataset.

Training Procedure

Fine-tuning Configuration:
    LoRA Rank (r): 8
    LoRA Alpha: 16
    LoRA Dropout: 0.5
    Number of Epochs: 30
    Batch Size: 2 (per device)
    Learning Rate: 2e-5
    Evaluation Strategy: Evaluated at each epoch
    Optimizer: AdamW
    Mixed Precision: FP16
Hardware Used: Single RTX 4070 8GB
Libraries:
    transformers
    datasets
    peft
    bitsandbytes
    trl
    evaluate
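
As a rough illustration of what rank 8 and alpha 16 imply, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters LoRA adds to one weight matrix. The 2048 x 2048 shape is an assumed example size, not a value taken from this card, and the card does not say which modules were adapted. Both function names are hypothetical.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A has shape (r, d_in) and B has shape (d_out, r)."""
    return r * (d_in + d_out)


R, ALPHA = 8, 16          # values from the fine-tuning configuration above
D_IN = D_OUT = 2048       # assumption: example matrix size for illustration only

added = lora_trainable_params(D_IN, D_OUT, R)
full = D_IN * D_OUT
print(f"scaling = {lora_scaling(ALPHA, R)}")
print(f"LoRA params = {added} vs full matrix = {full} ({added / full:.2%})")
```

For a matrix of this assumed size, LoRA trains well under 1% of the parameters that full fine-tuning would update.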

Intended Use

The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.

Limitations and Biases

Knowledge Cutoff: The model's knowledge is limited to the data it was trained on. It may not have information on events or developments that occurred after the dataset was created.
Accuracy: While the model strives to provide accurate answers, it may occasionally produce incorrect or nonsensical responses. Always verify critical information from reliable sources.
Biases: The model may inherit biases present in the training data. Users should be cautious and critically evaluate the model's outputs, especially in sensitive contexts.

Acknowledgements

Base Model: Meta AI's Llama-3.2-1B
Dataset: Databricks Dolly 15k Subset for General QA
Libraries Used:
    Transformers
    PEFT
    TRL
    BitsAndBytes

How to Use

  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel, PeftConfig
  
  peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"
  # Read the adapter config to find the base model the adapter was trained on
  config = PeftConfig.from_pretrained(peft_model_id)
  
  # Load the base model
  model = AutoModelForCausalLM.from_pretrained(
      config.base_model_name_or_path,
      device_map='auto',
      return_dict=True
  )
  
  tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
  # Reuse the EOS token for padding
  tokenizer.pad_token = tokenizer.eos_token
  
  # Attach the LoRA adapter weights to the base model
  peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
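
The snippet above loads the base model without quantization. If memory is tight, the base model could instead be loaded in 4-bit through bitsandbytes, roughly as sketched below. This is a sketch under assumptions: the card does not state the quantization type used during training, so `nf4` is a guess, and `load_base_model_4bit` is a hypothetical helper. It needs a CUDA GPU plus the `bitsandbytes` package to actually run.

```python
def load_base_model_4bit(base_model_name: str):
    """Load a causal LM with 4-bit bitsandbytes quantization (requires a CUDA GPU)."""
    # Imported lazily so that defining this helper does not require the GPU libraries
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumption: not stated in this card
        bnb_4bit_compute_dtype=torch.float16,  # FP16, as in the training setup
    )
    return AutoModelForCausalLM.from_pretrained(
        base_model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )
```

The returned model can then be passed to `PeftModel.from_pretrained` in place of the unquantized one, for example `load_base_model_4bit(config.base_model_name_or_path)`.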

Running Inference

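  from transformers import GenerationConfig
  
  def create_chat_template(question, context=""):
      # Build the prompt; leave context empty when no related reviews are available
      text = f"""
          [Instruction] You are a question-answering agent which answers the question based on the related reviews. 
          If related reviews are not provided, you can generate the answer based on the question.\n
          [Question] {question}\n
          [Related Reviews] {context}\n
          [Answer]
      """
      return text
  
  def generate_response(question, context=""):
      text = create_chat_template(question, context)
      # Move the tokenized inputs to the device the model was loaded on
      inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(peft_loaded_model.device)
      
      # penalty_alpha (contrastive search) is dropped: it does not apply when do_sample=True
      config = GenerationConfig(
          max_length=256,
          temperature=0.5,
          top_k=5,
          top_p=0.95,
          repetition_penalty=1.2,
          do_sample=True,
          # Set explicitly so generation stops at EOS and padding is well defined
          eos_token_id=tokenizer.eos_token_id,
          pad_token_id=tokenizer.pad_token_id
      )
      
      # Generate with the adapter-wrapped model
      response = peft_loaded_model.generate(**inputs, generation_config=config)
      # The decoded text contains the prompt followed by the generated answer
      output = tokenizer.decode(response[0], skip_special_tokens=True)
      return output
  
  # Example usage (no context supplied, so the model answers from the question alone)
  question = "Explain the process of photosynthesis."
  response = generate_response(question)
  print(response)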
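
Because the decoded output includes the prompt as well as the completion, it can help to strip everything up to the `[Answer]` marker used in the template. The helper below is a minimal sketch of that post-processing step; `extract_answer` is a hypothetical name, and it assumes the marker appears in the text exactly as written in the prompt.

```python
ANSWER_MARKER = "[Answer]"


def extract_answer(generated_text: str) -> str:
    """Return the text after the first '[Answer]' marker, or the whole text if absent."""
    _, marker, tail = generated_text.partition(ANSWER_MARKER)
    return tail.strip() if marker else generated_text.strip()


# Stand-in string for illustration; real input would be the decoded model output
sample = "[Question] What do plants need?\n[Answer]\n  Light, water, and carbon dioxide.\n"
print(extract_answer(sample))
```

In practice this would be applied to the string returned by `generate_response`.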