---
library_name: transformers
datasets:
- bergr7f/databricks-dolly-15k-subset-general_qa
language:
- en
base_model:
- meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
---

## Model Description

Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model has been fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient for deployment on resource-constrained hardware.

## Model Architecture

- Base Model: Llama-3.2-1B
- Parameters: Approximately 1 billion
- Quantization: 4-bit using the bitsandbytes library
- Fine-tuning Method: PEFT with LoRA
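
As a rough, back-of-the-envelope illustration of why 4-bit quantization suits constrained hardware, the sketch below estimates the storage needed for the weights alone at several precisions. It assumes the round figure of 1 billion parameters given above and ignores quantization constants, activations, and the KV cache, so actual memory use will be higher. `weight_memory_gib` is a hypothetical helper written for this example, not part of any library.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate storage for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


# Assumption: the "approximately 1 billion" figure from this card.
NUM_PARAMS = 1_000_000_000

for label, bits in [("FP32", 32), ("FP16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, the 4-bit weights take a quarter of the space of FP16 weights and an eighth of FP32 weights.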

## Training Data

The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset. This dataset is a subset focusing on general question-answering tasks, derived from the larger Databricks Dolly 15k dataset.

### Training Procedure

- Fine-tuning Configuration:
  - LoRA Rank (r): 8
  - LoRA Alpha: 16
  - LoRA Dropout: 0.5
  - Number of Epochs: 30
  - Batch Size: 2 (per device)
  - Learning Rate: 2e-5
  - Evaluation Strategy: Evaluated at each epoch
  - Optimizer: AdamW
  - Mixed Precision: FP16
- Hardware Used: [Specify hardware if known, e.g., "Single NVIDIA A100 GPU"]
- Libraries:
  - transformers
  - datasets
  - peft
  - bitsandbytes
  - trl
  - evaluate
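
To make the LoRA settings above more concrete, here is a minimal sketch of the arithmetic behind a rank-8 adapter with alpha 16. In standard LoRA, a frozen weight matrix W is augmented with a low-rank update scaled by alpha / r, and only the two small factor matrices are trained. The 2048 x 2048 matrix shape is an assumption chosen for illustration (this card does not state which layers were adapted or their shapes), and `lora_stats` is a hypothetical helper.

```python
def lora_stats(d_in: int, d_out: int, r: int, alpha: int) -> dict:
    """Parameter counts and scaling factor for one LoRA-adapted weight matrix."""
    full_params = d_in * d_out          # frozen weight matrix W (d_out x d_in)
    adapter_params = r * (d_in + d_out)  # trainable factors A (r x d_in) and B (d_out x r)
    return {
        "scaling": alpha / r,            # multiplier applied to the low-rank update B @ A
        "full_params": full_params,
        "adapter_params": adapter_params,
        "fraction": adapter_params / full_params,
    }


# r and alpha come from the configuration above; the matrix shape is an assumption.
stats = lora_stats(d_in=2048, d_out=2048, r=8, alpha=16)
print(f"scaling factor:   {stats['scaling']}")
print(f"adapter params:   {stats['adapter_params']:,}")
print(f"share of full W:  {stats['fraction']:.2%}")
```

For a matrix of this assumed size, the adapter trains well under 1% of the parameters in the frozen weight, which is what keeps fine-tuning cheap.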

## Intended Use

The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.

## Limitations and Biases

- Knowledge Cutoff: The model's knowledge is limited to the data it was trained on. It may not have information on events or developments that occurred after the dataset was created.
- Accuracy: While the model strives to provide accurate answers, it may occasionally produce incorrect or nonsensical responses. Always verify critical information from reliable sources.
- Biases: The model may inherit biases present in the training data. Users should be cautious and critically evaluate the model's outputs, especially in sensitive contexts.


## Acknowledgements

- Base Model: [Meta AI's Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- Dataset: [Databricks Dolly 15k Subset for General QA](https://huggingface.co/datasets/bergr7f/databricks-dolly-15k-subset-general_qa)
- Libraries Used:
  - Transformers
  - PEFT
  - TRL
  - BitsAndBytes


## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"

# Read the adapter config to find out which base model the adapter expects
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model and let Accelerate place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',
    return_dict=True
)

# Reuse the EOS token for padding so that padded batches can be tokenized
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Attach the fine-tuned LoRA adapter weights to the base model
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
```

## Running Inference
```python
from transformers import GenerationConfig

def create_chat_template(question, context=""):
    # Build the prompt in the instruction format shown in this card
    text = f"""
        [Instruction] You are a question-answering agent which answers the question based on the related reviews. 
        If related reviews are not provided, you can generate the answer based on the question.\n
        [Question] {question}\n
        [Related Reviews] {context}\n
        [Answer]
    """
    return text

def generate_response(question, context=""):
    text = create_chat_template(question, context)
    # Tokenize and move the inputs to the device the model was placed on
    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(peft_loaded_model.device)

    config = GenerationConfig(
        max_length=256,  # counts prompt tokens plus generated tokens
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True,
        penalty_alpha=0.6  # only used by contrastive search (do_sample=False); ignored when sampling
    )

    # Generate with the adapter-loaded model rather than the bare base model
    response = peft_loaded_model.generate(**inputs, generation_config=config)
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output

# Example usage (no context supplied, so the model answers from the question alone)
question = "Explain the process of photosynthesis."
response = generate_response(question)
print(response)
```