metadata
license: mit
datasets:
  - mlabonne/guanaco-llama2-1k
language:
  - en
metrics:
  - bleu
tags:
  - text-generation-inference
pipeline_tag: text-generation

Deployed Model

AjayMukundS/Llama-2-7b-chat-finetune

Model Description

This is a Llama 2 model with 7 billion parameters, fine-tuned on the mlabonne/guanaco-llama2 dataset. The training data consists of chats between a Human and an Assistant: the Human poses queries and the Assistant responds to them appropriately. Llama 2 chat models use the following chat template:

<s>[INST] <<SYS>>
SYSTEM PROMPT
<</SYS>>

User Prompt [/INST] Model Answer </s>

System Prompt (optional) --> guides the model's behavior

User Prompt (required) --> the instruction / user query

Model Answer (required) --> the response the model learns to produce
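As a minimal sketch (not code from this model's training run), the Python function below assembles one sample in the template above. The name `format_llama2_prompt` is hypothetical, and the `<<SYS>>` block is emitted only when a system prompt is supplied, since that part is optional.

```python
from typing import Optional


def format_llama2_prompt(user_prompt: str, model_answer: str,
                         system_prompt: Optional[str] = None) -> str:
    """Build one sample in the Llama 2 chat template (hypothetical helper).

    The system block is included only when a system prompt is given,
    because the system prompt is optional.
    """
    if system_prompt:
        instruction = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt}"
    else:
        instruction = user_prompt
    return f"<s>[INST] {instruction} [/INST] {model_answer} </s>"


print(format_llama2_prompt("What is QLoRA?", "A 4-bit fine-tuning method."))
# <s>[INST] What is QLoRA? [/INST] A 4-bit fine-tuning method. </s>
```

Passing a third argument, e.g. `format_llama2_prompt("Hi", "Hello!", "Be brief.")`, places the `<<SYS>>` block before the user prompt inside the same `[INST]` segment.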

Training Data

The instruction dataset is reformatted to follow the Llama 2 template above.

Original Dataset --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco

Reformatted Dataset with 1K Samples --> https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

Complete Reformatted Dataset --> https://huggingface.co/datasets/mlabonne/guanaco-llama2

To see how this dataset was created, check this notebook --> https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing
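The linked notebook contains the actual conversion. Purely as an illustration, the sketch below assumes each original sample is a single string of the form `### Human: ... ### Assistant: ...` and rewrites a single-turn sample into the Llama 2 template without a system prompt. `guanaco_to_llama2` is a hypothetical helper, not taken from the notebook.

```python
import re


def guanaco_to_llama2(text: str) -> str:
    """Rewrite one single-turn Guanaco-style sample into the Llama 2 template.

    Assumption: the sample looks like '### Human: <query>### Assistant: <answer>'.
    No system prompt is added.
    """
    match = re.match(r"\s*### Human:(.*?)### Assistant:(.*)", text, re.DOTALL)
    if match is None:
        raise ValueError("sample does not follow the assumed Human/Assistant format")
    user_prompt, model_answer = (part.strip() for part in match.groups())
    # The lazy first group stops at the first Assistant marker, so any later
    # Human turn would end up inside the answer; reject such samples.
    if "### Human:" in model_answer:
        raise ValueError("multi-turn samples are not handled by this sketch")
    return f"<s>[INST] {user_prompt} [/INST] {model_answer} </s>"


print(guanaco_to_llama2("### Human: What is 2+2?### Assistant: 2+2 equals 4."))
# <s>[INST] What is 2+2? [/INST] 2+2 equals 4. </s>
```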

To drastically reduce VRAM usage, the model is fine-tuned in 4-bit precision, which is why QLoRA is used here. Fine-tuning was done on an L4 GPU (Google Colab Pro).
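A back-of-the-envelope estimate shows why 4-bit precision helps. The sketch below counts only the model weights, assuming a nominal 7 billion parameters; it ignores activations, LoRA adapters, optimizer states, and quantization overhead, so real VRAM usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # nominal size of Llama 2 7B; the exact count is slightly lower

for label, bits in [("fp16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
# fp16: ~14.0 GB
# 4-bit: ~3.5 GB
```

Storing each weight in 4 bits instead of 16 cuts the weight footprint by a factor of four, which leaves room on a single GPU for the rest of the training state.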

Process

  1. Load the reformatted dataset.
  2. Configure bitsandbytes for 4-bit quantization.
  3. Load the Llama 2 model in 4-bit precision on a GPU (L4, Google Colab Pro) along with the corresponding tokenizer.
  4. Load the QLoRA configuration and the regular training parameters, then pass everything to the SFTTrainer.
  5. Start fine-tuning.
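The card does not list the QLoRA hyperparameters used in step 4. To illustrate why LoRA keeps the trainable part small, the sketch below counts the parameters LoRA adds to frozen weight matrices. The rank (64) and target modules (query and value projections) are hypothetical choices for illustration, not this model's confirmed configuration; the hidden size (4096) and layer count (32) are those of Llama 2 7B.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in); the update B @ A has the
    full matrix shape, but only rank * (d_out + d_in) values are trained.
    """
    return rank * (d_out + d_in)


# Hypothetical setup, for illustration only: rank-64 adapters on the query and
# value projections (4096 x 4096 each) in all 32 decoder layers of Llama 2 7B.
HIDDEN, LAYERS, RANK = 4096, 32, 64
per_matrix = lora_trainable_params(HIDDEN, HIDDEN, RANK)
total = per_matrix * 2 * LAYERS  # 2 target matrices per layer
print(per_matrix, total)
# 524288 33554432
```

Under these assumptions about 33.6 million parameters are trained, roughly 0.5% of the nominal 7 billion, while the 4-bit base weights stay frozen.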