---
license: mit
datasets:
- mlabonne/guanaco-llama2-1k
language:
- en
metrics:
- bleu
tags:
- text-generation-inference
pipeline_tag: text-generation
---

# Deployed Model
AjayMukundS/Llama-2-7b-chat-finetune

## Model Description
This is a Llama 2 model with 7 billion parameters, fine-tuned on the dataset **mlabonne/guanaco-llama2**. The training data consists of chats between a human and an assistant, in which the human poses queries and the assistant responds to them.
In the case of Llama 2, the chat models use the following chat template:

`<s>[INST] <<SYS>>`

`SYSTEM PROMPT`

`<</SYS>>`

`User Prompt [/INST] Model Answer </s>`

System Prompt (optional) --> guides the model's behavior

User Prompt (required) --> gives the instruction / user query

Model Answer (required) --> the response the model learns to produce
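A minimal sketch of how a single-turn prompt could be assembled in this template; the helper name and the example prompts are illustrative, not part of the training code:

```python
def build_llama2_prompt(user_prompt: str, system_prompt: str = "") -> str:
    """Assemble a single-turn prompt in the Llama 2 chat template."""
    if system_prompt:
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_prompt} [/INST]"
        )
    return f"<s>[INST] {user_prompt} [/INST]"

# Example (hypothetical prompts); during training, the model's answer is
# appended after [/INST] and the turn is closed with </s>.
print(build_llama2_prompt(
    user_prompt="What is a large language model?",
    system_prompt="You are a helpful assistant.",
))
```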

## Training Data
The instruction dataset is reformatted to follow the above Llama 2 template.

**Original Dataset** --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco

**Reformatted Dataset with 1K Samples** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

**Complete Reformatted Dataset** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2

To see how this dataset was created, check this notebook --> https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing
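For reference, the 1K-sample dataset can be loaded directly with the `datasets` library; a minimal sketch (the `text` field name is how the reformatted dataset stores each chat):

```python
from datasets import load_dataset

# Pull the reformatted 1K-sample dataset from the Hugging Face Hub
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Each row holds one complete chat, already formatted in the Llama 2 template
print(dataset[0]["text"])
```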

To drastically reduce VRAM usage, the model must be fine-tuned in 4-bit precision, which is why QLoRA is used here. The GPU on which the model was fine-tuned was an **L4 (Google Colab Pro)**.
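A sketch of the 4-bit bitsandbytes configuration commonly used for QLoRA; the NF4 quantization type and fp16 compute dtype shown here are typical defaults, not necessarily the exact values used for this model:

```python
import torch
from transformers import BitsAndBytesConfig

# Quantize the base weights to 4-bit NF4 and run compute in fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
```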

## Process
1) Load the dataset as defined above.
2) Configure bitsandbytes for 4-bit quantization.
3) Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro), along with the corresponding tokenizer.
4) Load the QLoRA configuration and the regular training parameters, and pass everything to the SFTTrainer.
5) Start fine-tuning (a condensed sketch of steps 3-5 follows this list).
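Continuing from the sketches above (reusing `bnb_config` and `dataset`), a condensed outline of steps 3-5. The base checkpoint name and all hyperparameters are illustrative assumptions, and the `SFTTrainer` argument names match older `trl` releases (newer versions moved several of them into `SFTConfig`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"  # assumed base checkpoint

# Step 3: load the base model in 4-bit precision with its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Step 4: QLoRA adapter configuration (illustrative hyperparameters)
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,
)

# Step 5: hand everything to the SFTTrainer and start fine-tuning
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```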