Artples committed · verified
Commit f99d488 · 1 Parent(s): ee2b52a

Upload Finetuning_NoteBook(2).ipynb

Files changed (1)
  1. Finetuning_NoteBook(2).ipynb +624 -0
Finetuning_NoteBook(2).ipynb ADDED
@@ -0,0 +1,624 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "25db6bb6",
+ "metadata": {},
+ "source": [
+ "# Installing Required Libraries!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0378e956",
+ "metadata": {},
+ "source": [
+ "Installing required libraries, including trl, transformers, accelerate, peft, datasets, and bitsandbytes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bfdba870",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Checks if PyTorch is installed and installs it if not.\n",
+ "try:\n",
+ "    import torch\n",
+ "    print(\"PyTorch is installed!\")\n",
+ "except ImportError:\n",
+ "    print(\"PyTorch is not installed.\")\n",
+ "    !pip install -q torch\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "538a911b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "!pip install -q --upgrade \"transformers==4.38.2\"\n",
+ "!pip install -q --upgrade \"datasets==2.16.1\"\n",
+ "!pip install -q --upgrade \"accelerate==0.26.1\"\n",
+ "!pip install -q --upgrade \"evaluate==0.4.1\"\n",
+ "!pip install -q --upgrade \"bitsandbytes==0.42.0\"\n",
+ "!pip install -q --upgrade \"trl==0.7.11\"\n",
+ "!pip install -q --upgrade \"peft==0.8.2\"\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cb6eeaf2",
+ "metadata": {},
+ "source": [
+ "## Installing Flash Attention"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cdd64478",
+ "metadata": {},
+ "source": [
+ "Installing Flash Attention to reduce the memory and runtime cost of the attention layer and improve the performance of model training. Learn more at [FlashAttention](https://github.com/Dao-AILab/flash-attention/tree/main). Building flash-attn from source can take quite a bit of time (several minutes)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9d59ace4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Flash Attention 2 requires an Ampere GPU or newer (compute capability >= 8.0).\n",
+ "import torch; assert torch.cuda.get_device_capability()[0] >= 8, 'Hardware not supported for Flash Attention'\n",
+ "\n",
+ "!pip install ninja packaging\n",
+ "!MAX_JOBS=4 pip install -q flash-attn --no-build-isolation --upgrade\n",
+ " "
+ ]
+ },
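+ {
+ "cell_type": "markdown",
+ "id": "a1f00001",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check (an optional sketch, not part of the original generated notebook), we can confirm that `flash_attn` built and imports correctly before loading the model with `attn_implementation='flash_attention_2'`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a1f00002",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Optional sanity check: confirm flash-attn built and imports correctly.\n",
+ "try:\n",
+ "    import flash_attn\n",
+ "    print(f\"flash-attn version: {flash_attn.__version__}\")\n",
+ "except ImportError:\n",
+ "    print(\"flash-attn missing; loading with attn_implementation='flash_attention_2' will fail.\")\n"
+ ]
+ },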
+ {
+ "cell_type": "markdown",
+ "id": "f9c1ff52",
+ "metadata": {},
+ "source": [
+ "# Load and Prepare the Dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "100e0966",
+ "metadata": {},
+ "source": [
+ "The dataset is already formatted in a conversational format, which is supported by [trl](https://huggingface.co/docs/trl/index/), and ready for supervised finetuning."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ca04a539",
+ "metadata": {},
+ "source": [
+ "\n",
+ "**Conversational format:**\n",
+ "\n",
+ "```json\n",
+ "{\"messages\": [{\"role\": \"system\", \"content\": \"You are...\"}, {\"role\": \"user\", \"content\": \"...\"}, {\"role\": \"assistant\", \"content\": \"...\"}]}\n",
+ "{\"messages\": [{\"role\": \"system\", \"content\": \"You are...\"}, {\"role\": \"user\", \"content\": \"...\"}, {\"role\": \"assistant\", \"content\": \"...\"}]}\n",
+ "{\"messages\": [{\"role\": \"system\", \"content\": \"You are...\"}, {\"role\": \"user\", \"content\": \"...\"}, {\"role\": \"assistant\", \"content\": \"...\"}]}\n",
+ "```\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ec40616b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "from datasets import load_dataset\n",
+ " \n",
+ "# Load dataset from the hub\n",
+ "dataset = load_dataset(\"HuggingFaceH4/ultrachat_200k\", split=\"train_sft\")\n",
+ " \n",
+ "dataset = dataset.shuffle(seed=42)\n",
+ " "
+ ]
+ },
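+ {
+ "cell_type": "markdown",
+ "id": "b2f00001",
+ "metadata": {},
+ "source": [
+ "To make the conversational format above concrete, here is a small illustrative cell (an addition, not part of the original generated notebook) that prints the roles and truncated contents of the first record's `messages`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b2f00002",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Illustrative: inspect one training record to see the conversational format.\n",
+ "sample = dataset[0]\n",
+ "for message in sample[\"messages\"]:\n",
+ "    print(f\"{message['role']}: {message['content'][:80]}...\")\n"
+ ]
+ },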
+ {
+ "cell_type": "markdown",
+ "id": "805c2975",
+ "metadata": {},
+ "source": [
+ "# Load **mistralai/Mistral-7B-v0.1** for Finetuning"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8248708e",
+ "metadata": {},
+ "source": [
+ "\n",
+ "This process involves two key steps:\n",
+ "\n",
+ "1. **LLM Quantization:**\n",
+ "   - We first load the selected large language model (LLM).\n",
+ "   - We then use the `bitsandbytes` library to quantize the model, which can significantly reduce its memory footprint.\n",
+ "\n",
+ "> **Note:** The memory requirements of the model scale with its size. For instance, a 7B parameter model may require \n",
+ "a 24GB GPU for fine-tuning. \n",
+ "\n",
+ "2. **Chat Model Preparation:**\n",
+ "   - To train a model for chat/conversational tasks, we need to prepare both the model and its tokenizer.\n",
+ "   \n",
+ "   - This involves adding special tokens to the tokenizer and the model itself. These tokens help the model \n",
+ "   understand the different roles within a conversation. \n",
+ "   \n",
+ "   - The **trl** library provides a convenient method called `setup_chat_format` for this purpose. This method performs the \n",
+ "   following actions: \n",
+ "   \n",
+ "     * Adds special tokens to the tokenizer, such as `<|im_start|>` and `<|im_end|>`, to mark the beginning and \n",
+ "     end of a conversation. \n",
+ "   \n",
+ "     * Resizes the model's embedding layer to accommodate the new tokens.\n",
+ "   \n",
+ "     * Sets the tokenizer's chat template, which defines the format used to convert input data into a chat-like \n",
+ "     structure. The default template is `chatml` from OpenAI.\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5612b641",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "import torch\n",
+ "from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig\n",
+ "from trl import setup_chat_format\n",
+ "\n",
+ "# Hugging Face model id\n",
+ "model_id = \"mistralai/Mistral-7B-v0.1\"\n",
+ "\n",
+ "# BitsAndBytesConfig: 4-bit NF4 quantization. Note the bnb_4bit_* options\n",
+ "# only take effect with load_in_4bit=True, not load_in_8bit.\n",
+ "bnb_config = BitsAndBytesConfig(\n",
+ "    load_in_4bit=True, bnb_4bit_use_double_quant=True,\n",
+ "    bnb_4bit_quant_type=\"nf4\", bnb_4bit_compute_dtype=torch.bfloat16\n",
+ ")\n",
+ "\n",
+ "# Load model and tokenizer\n",
+ "model = AutoModelForCausalLM.from_pretrained(\n",
+ "    model_id,\n",
+ "    device_map=\"auto\",\n",
+ "    trust_remote_code=True,\n",
+ "    attn_implementation='flash_attention_2',\n",
+ "    torch_dtype=torch.bfloat16,\n",
+ "    quantization_config=bnb_config\n",
+ ")\n",
+ "\n",
+ "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
+ "tokenizer.padding_side = \"left\"\n",
+ "\n",
+ "\n",
+ "# Set chat template to OAI chatML\n",
+ "model, tokenizer = setup_chat_format(model, tokenizer)\n",
+ "\n",
+ " "
+ ]
+ },
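+ {
+ "cell_type": "markdown",
+ "id": "c3f00001",
+ "metadata": {},
+ "source": [
+ "To see what `setup_chat_format` actually did, the optional sketch below (an addition; the toy messages are hypothetical) renders a conversation with the tokenizer's new `chatml` template."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c3f00002",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Illustrative: render a toy conversation with the chatml template set above.\n",
+ "messages = [\n",
+ "    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
+ "    {\"role\": \"user\", \"content\": \"What is LoRA?\"}\n",
+ "]\n",
+ "print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))\n"
+ ]
+ },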
+ {
+ "cell_type": "markdown",
+ "id": "25713c3a",
+ "metadata": {},
+ "source": [
+ "## Setting LoRA Config"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a990077",
+ "metadata": {},
+ "source": [
+ "The `SFTTrainer` provides native integration with `peft`, simplifying the process of efficiently tuning \n",
+ " Large Language Models (LLMs) using techniques such as [LoRA](\n",
+ " https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms). The only requirement is to create \n",
+ " the `LoraConfig` and pass it to the `SFTTrainer`. \n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3aef033e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "from peft import LoraConfig\n",
+ "\n",
+ "peft_config = LoraConfig(\n",
+ "    lora_alpha=8,                 # scaling factor for the LoRA updates\n",
+ "    lora_dropout=0.05,            # dropout applied to the LoRA layers\n",
+ "    r=6,                          # rank of the low-rank update matrices\n",
+ "    bias=\"none\",                  # do not train bias terms\n",
+ "    target_modules=\"all-linear\",  # attach adapters to every linear layer\n",
+ "    task_type=\"CAUSAL_LM\"\n",
+ ")\n",
+ " "
+ ]
+ },
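+ {
+ "cell_type": "markdown",
+ "id": "d4f00001",
+ "metadata": {},
+ "source": [
+ "For a rough sense of why LoRA is cheap, the sketch below (an illustrative addition; the 4096x4096 layer shape is an assumption matching Mistral-7B's hidden size) counts the parameters LoRA adds to a single linear layer: two factors A (r x k) and B (d x r), i.e. r*(d + k) trainable weights versus d*k frozen ones."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d4f00002",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Illustrative arithmetic: trainable LoRA weights for one d x k linear layer.\n",
+ "r = peft_config.r   # rank, 6 here\n",
+ "d, k = 4096, 4096   # assumed layer shape (Mistral-7B hidden size)\n",
+ "print(f\"frozen: {d * k:,} weights, LoRA-trainable: {r * (d + k):,} weights\")\n"
+ ]
+ },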
+ {
+ "cell_type": "markdown",
+ "id": "78dc9315",
+ "metadata": {},
+ "source": [
+ "## Setting the TrainingArguments"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "02e9452a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Installing tensorboard to report the metrics\n",
+ "!pip install -q tensorboard\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4cb748d1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "from transformers import TrainingArguments\n",
+ "\n",
+ "args = TrainingArguments(\n",
+ "    output_dir=\"temp_/tmp/model\",\n",
+ "    num_train_epochs=100,\n",
+ "    per_device_train_batch_size=3,\n",
+ "    gradient_accumulation_steps=2,\n",
+ "    gradient_checkpointing=True,\n",
+ "    gradient_checkpointing_kwargs={'use_reentrant': False},\n",
+ "    optim=\"adamw_torch_fused\",\n",
+ "    logging_steps=10,\n",
+ "    save_strategy='epoch',\n",
+ "    learning_rate=0.075,\n",
+ "    bf16=True,\n",
+ "    max_grad_norm=0.3,\n",
+ "    warmup_ratio=0.1,\n",
+ "    lr_scheduler_type='cosine',\n",
+ "    report_to='tensorboard',\n",
+ "    max_steps=-1,\n",
+ "    seed=42,\n",
+ "    overwrite_output_dir=True,\n",
+ "    remove_unused_columns=True\n",
+ ")\n",
+ " "
+ ]
+ },
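+ {
+ "cell_type": "markdown",
+ "id": "e5f00001",
+ "metadata": {},
+ "source": [
+ "One derived quantity worth checking before training (an illustrative addition, assuming a single GPU): the effective batch size per optimizer step is `per_device_train_batch_size * gradient_accumulation_steps`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e5f00002",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Effective batch size per optimizer step on a single GPU: 3 * 2 = 6.\n",
+ "print(args.per_device_train_batch_size * args.gradient_accumulation_steps)\n"
+ ]
+ },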
+ {
+ "cell_type": "markdown",
+ "id": "afad0f24",
+ "metadata": {},
+ "source": [
+ "## Setting the Supervised Finetuning Trainer (`SFTTrainer`)\n",
+ " \n",
+ "The `SFTTrainer` is a wrapper around the `transformers.Trainer` class and inherits all of its attributes and methods.\n",
+ "The trainer takes care of properly initializing the `PeftModel`. \n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4786995f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "from trl import SFTTrainer\n",
+ "\n",
+ "trainer = SFTTrainer(\n",
+ "    model=model,\n",
+ "    args=args,\n",
+ "    train_dataset=dataset,\n",
+ "    peft_config=peft_config,\n",
+ "    max_seq_length=2048,\n",
+ "    tokenizer=tokenizer,\n",
+ "    packing=True,\n",
+ "    dataset_kwargs={'add_special_tokens': False, 'append_concat_token': False}\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a32f64b",
+ "metadata": {},
+ "source": [
+ "### Starting Training and Saving Model/Tokenizer\n",
+ "\n",
+ "We start training the model by calling the `train()` method on the trainer instance. This will start the training \n",
+ "loop and train the model for `100 epochs`. The model will be automatically saved to the output directory (**'temp_/tmp/model'**);\n",
+ "later cells push it to the hub under **'User//tmp/model'**. \n",
+ " \n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1a722966",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "\n",
+ "# Disable the KV cache; it is incompatible with gradient checkpointing\n",
+ "model.config.use_cache = False\n",
+ "\n",
+ "# start training\n",
+ "trainer.train()\n",
+ "\n",
+ "# save the peft model\n",
+ "trainer.save_model()\n"
+ ]
+ },
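+ {
+ "cell_type": "markdown",
+ "id": "f6f00001",
+ "metadata": {},
+ "source": [
+ "Since `report_to='tensorboard'` is set, training metrics are written under the output directory (the Trainer default places them in a `runs/` subfolder). This optional cell (an addition, assuming a Jupyter environment with the TensorBoard extension) lets you browse them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f6f00002",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Browse the training metrics logged by the Trainer.\n",
+ "%load_ext tensorboard\n",
+ "%tensorboard --logdir temp_/tmp/model\n"
+ ]
+ },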
+ {
+ "cell_type": "markdown",
+ "id": "5d72635c",
+ "metadata": {},
+ "source": [
+ "### Free the GPU Memory to Prepare for Merging the `LoRA` Adapters with the Base Model\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "131b1b16",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "\n",
+ "# Free the GPU memory\n",
+ "del model\n",
+ "del trainer\n",
+ "torch.cuda.empty_cache()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2ea238ed",
+ "metadata": {},
+ "source": [
+ "## Merging LoRA Adapters into the Original Model\n",
+ "\n",
+ "While utilizing `LoRA`, we focus on training the adapters rather than the entire model. Consequently, during the \n",
+ "model saving process, only the `adapter weights` are preserved, not the complete model. If we wish to save the \n",
+ "entire model for easier usage with Text Generation Inference, we can incorporate the adapter weights into the model \n",
+ "weights. This can be achieved using the `merge_and_unload` method. Following this, the model can be saved using the \n",
+ "`save_pretrained` method. The result is a standalone model that is ready for inference.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0f1dc2a9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "import torch\n",
+ "from peft import AutoPeftModelForCausalLM\n",
+ "\n",
+ "# Load Peft model on CPU\n",
+ "model = AutoPeftModelForCausalLM.from_pretrained(\n",
+ "    \"temp_/tmp/model\",\n",
+ "    torch_dtype=torch.float16,\n",
+ "    low_cpu_mem_usage=True\n",
+ ")\n",
+ " \n",
+ "# Merge LoRA with the base model and save\n",
+ "merged_model = model.merge_and_unload()\n",
+ "merged_model.save_pretrained(\"/tmp/model\", safe_serialization=True, max_shard_size=\"2GB\")\n",
+ "tokenizer.save_pretrained(\"/tmp/model\")\n"
+ ]
+ },
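+ {
+ "cell_type": "markdown",
+ "id": "a7f00001",
+ "metadata": {},
+ "source": [
+ "Before pushing, it can be worth smoke-testing the merged checkpoint. This optional sketch (an addition; the prompt is hypothetical, and it assumes enough free GPU or CPU memory to reload the model) runs one generation through a `text-generation` pipeline."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a7f00002",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Optional smoke test: one short generation from the merged model.\n",
+ "from transformers import pipeline\n",
+ "\n",
+ "pipe = pipeline(\"text-generation\", model=\"/tmp/model\", torch_dtype=torch.float16, device_map=\"auto\")\n",
+ "chat = [{\"role\": \"user\", \"content\": \"Say hello in one sentence.\"}]\n",
+ "prompt = pipe.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)\n",
+ "print(pipe(prompt, max_new_tokens=64)[0][\"generated_text\"])\n"
+ ]
+ },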
+ {
+ "cell_type": "markdown",
+ "id": "41cfdd2c",
+ "metadata": {},
+ "source": [
+ "### Copy all result folders from 'temp_/tmp/model' to '/tmp/model'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a115b861",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "import os\n",
+ "import shutil\n",
+ "\n",
+ "source_folder = \"temp_/tmp/model\"\n",
+ "destination_folder = \"/tmp/model\"\n",
+ "os.makedirs(destination_folder, exist_ok=True)\n",
+ "for item in os.listdir(source_folder):\n",
+ "    item_path = os.path.join(source_folder, item)\n",
+ "    if os.path.isdir(item_path):\n",
+ "        destination_path = os.path.join(destination_folder, item)\n",
+ "        shutil.copytree(item_path, destination_path)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "427b8a54",
+ "metadata": {},
+ "source": [
+ "### Generating a model card (README.md)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bb89c11b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "card = '''\n",
+ "---\n",
+ "license: apache-2.0\n",
+ "tags:\n",
+ "- generated_from_trainer\n",
+ "- mistralai/Mistral\n",
+ "- PyTorch\n",
+ "- transformers\n",
+ "- trl\n",
+ "- peft\n",
+ "- tensorboard\n",
+ "base_model: mistralai/Mistral-7B-v0.1\n",
+ "widget:\n",
+ "  - example_title: Pirate!\n",
+ "    messages:\n",
+ "      - role: system\n",
+ "        content: You are a pirate chatbot who always responds with Arr!\n",
+ "      - role: user\n",
+ "        content: \"There's a llama on my lawn, how can I get rid of him?\"\n",
+ "    output:\n",
+ "      text: >-\n",
+ "        Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare\n",
+ "        sight, but I've got a plan that might help ye get rid of 'im. Ye'll need\n",
+ "        to gather some carrots and hay, and then lure the llama away with the\n",
+ "        promise of a tasty treat. Once he's gone, ye can clean up yer lawn and\n",
+ "        enjoy the peace and quiet once again. But beware, me hearty, for there\n",
+ "        may be more llamas where that one came from! Arr!\n",
+ "model-index:\n",
+ "- name: /tmp/model\n",
+ "  results: []\n",
+ "datasets:\n",
+ "- HuggingFaceH4/ultrachat_200k\n",
+ "language:\n",
+ "- en\n",
+ "pipeline_tag: text-generation\n",
+ "---\n",
+ "\n",
+ "# Model Card for /tmp/model:\n",
+ "\n",
+ "**/tmp/model** is a language model trained to act as a helpful assistant. It is a finetuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) that was trained using `SFTTrainer` on the publicly available dataset [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).\n",
+ "\n",
+ "## Training Procedure:\n",
+ "\n",
+ "The training code used to create this model was generated by [Menouar/LLM-FineTuning-Notebook-Generator](https://huggingface.co/spaces/Menouar/LLM-FineTuning-Notebook-Generator).\n",
+ "\n",
+ "## Training hyperparameters\n",
+ "\n",
+ "The following hyperparameters were used during training:\n",
+ "\n",
+ "'''\n",
+ "\n",
+ "with open(\"/tmp/model/README.md\", \"w\") as f:\n",
+ "    f.write(card)\n",
+ "\n",
+ "args_dict = vars(args)\n",
+ "\n",
+ "with open(\"/tmp/model/README.md\", \"a\") as f:\n",
+ "    for k, v in args_dict.items():\n",
+ "        f.write(f\"- {k}: {v}\")\n",
+ "        f.write(\"\\n \\n\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "12c5ab30",
+ "metadata": {},
+ "source": [
+ "## Login to HF"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "10117bb9",
+ "metadata": {},
+ "source": [
+ "Replace `HF_TOKEN` with a valid token in order to push **'/tmp/model'** to `huggingface_hub`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8e0697a8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Install huggingface_hub\n",
+ "!pip install -q huggingface_hub\n",
+ " \n",
+ "from huggingface_hub import login\n",
+ " \n",
+ "login(\n",
+ "    token='HF_TOKEN',\n",
+ "    add_to_git_credential=True\n",
+ ")\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f176ddac",
+ "metadata": {},
+ "source": [
+ "## Pushing '/tmp/model' to the Hugging Face Hub"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7a6b3c9f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "from huggingface_hub import HfApi, HfFolder\n",
+ "\n",
+ "# Instantiate the HfApi class\n",
+ "api = HfApi()\n",
+ "\n",
+ "# Our Hugging Face repository\n",
+ "repo_name = \"/tmp/model\"\n",
+ "\n",
+ "# Create a repository on the Hugging Face Hub\n",
+ "repo = api.create_repo(token=HfFolder.get_token(), repo_type=\"model\", repo_id=repo_name)\n",
+ "\n",
+ "api.upload_folder(\n",
+ "    folder_path=\"/tmp/model\",\n",
+ "    repo_id=repo.repo_id\n",
+ ")\n"
+ ]
+ }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }