metadata
library_name: transformers
tags: []

Model Description

Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model has been fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient for deployment on resource-constrained hardware.

Model Architecture

Base Model: Llama-3.2-1B
Parameters: Approximately 1 Billion
Quantization: 4-bit using the bitsandbytes library
Fine-tuning Method: PEFT with LoRA
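
As a rough illustration of why 4-bit quantization suits resource-constrained hardware, weight memory can be estimated from the parameter count and the bits stored per weight. This is a back-of-the-envelope sketch only: it assumes exactly 1e9 parameters (the card says "approximately 1 billion"), it ignores quantization constants, activations, and the KV cache, and `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumption: round the "approximately 1 billion" parameters to exactly 1e9.
NUM_PARAMS = 1e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # half-precision weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"FP16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the FP16 footprint; real memory use will be higher once runtime overhead is included.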

Training Data

The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset. This dataset is a subset focusing on general question-answering tasks, derived from the larger Databricks Dolly 15k dataset.
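
The card does not say how the subset was derived. One plausible reading, shown here purely as an assumption, is a filter on the `category` field of Databricks Dolly 15k records (which carry `instruction`, `context`, `response`, and `category`). The rows below are made-up toy records, not real dataset entries, and `select_general_qa` is a hypothetical helper.

```python
def select_general_qa(records):
    """Keep only rows whose `category` field equals "general_qa"."""
    return [row for row in records if row.get("category") == "general_qa"]

# Toy rows for illustration only; they are NOT taken from the dataset.
toy_records = [
    {"instruction": "Why is the sky blue?", "context": "",
     "response": "(toy answer)", "category": "general_qa"},
    {"instruction": "Summarize the passage.", "context": "(toy passage)",
     "response": "(toy summary)", "category": "summarization"},
]

subset = select_general_qa(toy_records)
print(len(subset), subset[0]["instruction"])
```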

Training Procedure

Fine-tuning Configuration:
    LoRA Rank (r): 8
    LoRA Alpha: 16
    LoRA Dropout: 0.5
    Number of Epochs: 30
    Batch Size: 2 (per device)
    Learning Rate: 2e-5
    Evaluation Strategy: Evaluated at each epoch
    Optimizer: AdamW
    Mixed Precision: FP16
Hardware Used: Not specified
Libraries:
    transformers
    datasets
    peft
    bitsandbytes
    trl
    evaluate
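
To make the LoRA settings above more concrete, the sketch below derives two quantities from them: the scaling factor applied to the low-rank update (`lora_alpha / r` in standard LoRA) and the number of trainable parameters added per adapted weight matrix. The field names mirror those of `peft.LoraConfig`, but `LoraSettings` and `lora_params_per_matrix` are hypothetical helpers. The 2048 x 2048 matrix shape is an assumption for illustration, since the card does not list which modules were adapted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoraSettings:
    r: int = 8                 # LoRA rank from the card
    lora_alpha: int = 16       # LoRA alpha from the card
    lora_dropout: float = 0.5  # LoRA dropout from the card

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.r

def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one weight matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

settings = LoraSettings()
HIDDEN_SIZE = 2048  # assumption: a square projection of this width

print(settings.scaling)                                                # 2.0
print(lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, settings.r))    # 32768
```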

Intended Use

The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.

Limitations and Biases

Knowledge Cutoff: The model's knowledge is limited to the data it was trained on. It may not have information on events or developments that occurred after the dataset was created.
Accuracy: While the model strives to provide accurate answers, it may occasionally produce incorrect or nonsensical responses. Always verify critical information from reliable sources.
Biases: The model may inherit biases present in the training data. Users should be cautious and critically evaluate the model's outputs, especially in sensitive contexts.

Acknowledgements

Base Model: Meta AI's Llama-3.2-1B
Dataset: Databricks Dolly 15k Subset for General QA
Libraries Used:

  • Transformers
  • PEFT
  • TRL
  • BitsAndBytes

How to Use

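      from transformers import AutoModelForCausalLM, AutoTokenizer
      from peft import PeftModel, PeftConfig

      peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"

      # Read the adapter config to find out which base model the adapter was trained on.
      config = PeftConfig.from_pretrained(peft_model_id)

      # Load the base model; device_map='auto' places it on the available device(s).
      model = AutoModelForCausalLM.from_pretrained(
          config.base_model_name_or_path,
          device_map='auto',
          return_dict=True
      )

      # Load the tokenizer and reuse the end-of-sequence token for padding.
      tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
      tokenizer.pad_token = tokenizer.eos_token

      # Attach the LoRA adapter weights and switch to inference mode.
      peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
      peft_loaded_model.eval()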
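
The loading code above does not pass a quantization config explicitly. If the base model should be loaded in 4-bit with bitsandbytes, as listed under Quantization, a sketch along the following lines may work. It is a configuration fragment, not the author's confirmed setup: the NF4 quantization type and the FP16 compute dtype are assumptions, `load_4bit_peft_model` is a hypothetical helper, and running it needs a CUDA GPU with `bitsandbytes` installed.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

def load_4bit_peft_model(peft_model_id: str, base_model_id: str):
    """Load the base model with 4-bit weights, then attach the LoRA adapter."""
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumption: not stated in the card
        bnb_4bit_compute_dtype=torch.float16,  # assumption: mirrors the FP16 training setting
    )
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    return PeftModel.from_pretrained(base_model, peft_model_id)
```

The function would be called with the adapter ID and `config.base_model_name_or_path` from the snippet above.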
    

Running Inference

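      from transformers import GenerationConfig

      def create_chat_template(question, context=""):
          # Build the prompt in the [Instruction] / [Question] / [Related Reviews] / [Answer] layout.
          text = (
              "[Instruction] You are a question-answering agent which answers the question based on the related reviews. "
              "If related reviews are not provided, you can generate the answer based on the question.\n"
              f"[Question] {question}\n"
              f"[Related Reviews] {context}\n"
              "[Answer] "
          )
          return text

      def generate_response(question, context=""):
          text = create_chat_template(question, context)
          # Tokenize and move the inputs to the same device as the model.
          inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(peft_loaded_model.device)

          # Sampling settings; max_length counts the prompt tokens as well.
          generation_config = GenerationConfig(
              max_length=256,
              temperature=0.5,
              top_k=5,
              top_p=0.95,
              repetition_penalty=1.2,
              do_sample=True,
              pad_token_id=tokenizer.pad_token_id
          )

          # Generate with the adapter-wrapped model and decode the first sequence.
          response = peft_loaded_model.generate(**inputs, generation_config=generation_config)
          output = tokenizer.decode(response[0], skip_special_tokens=True)
          return output

      # Example usage (no related reviews supplied, so the context stays empty)
      question = "Explain the process of photosynthesis."
      response = generate_response(question)
      print(response)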
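
Because a causal language model returns the prompt followed by its continuation, the decoded output above will normally include the full `[Instruction] ... [Answer]` prompt. A minimal sketch for keeping only the answer part is shown below. `extract_answer` is a hypothetical helper, and the sample string is a made-up stand-in, not real model output.

```python
def extract_answer(generated_text: str, marker: str = "[Answer]") -> str:
    """Return the text after the first occurrence of `marker`, stripped of whitespace.

    If the marker is missing, the whole text is returned (stripped).
    """
    _, found, tail = generated_text.partition(marker)
    return tail.strip() if found else generated_text.strip()

# Made-up stand-in for a decoded generation (not real model output).
sample_output = (
    "[Instruction] ...\n"
    "[Question] What is 2 + 2?\n"
    "[Related Reviews] \n"
    "[Answer] 4"
)
print(extract_answer(sample_output))
```

Splitting on the first marker rather than the last keeps any later `[Answer]` text the model might emit as part of the returned continuation.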