Spaces:

JayosChaos
/

Fine_tuning_my_first_model_or_agent

Running

File size: 19,032 Bytes

00e4fdb

# -*- coding: utf-8 -*-
"""bonus-unit1.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/#fileId=https%3A//huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb

# Bonus Unit 1: Fine-Tuning a model for Function-Calling

In this tutorial, **we're going to Fine-Tune an LLM for Function Calling.**

This notebook is part of the <a href="https://www.hf.co/learn/agents-course/unit1/introduction">Hugging Face Agents Course</a>, a free Course from beginner to expert, where you learn to build Agents.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png" alt="Agent Course"/>

## Prerequisites 🏗️

Before diving into the notebook, you need to:

🔲 📚 **Study [What is Function-Calling](https://www.hf.co/learn/agents-course/bonus-unit1/what-is-function-calling) Section**

🔲 📚 **Study [Fine-Tune your Model and what are LoRAs](https://www.hf.co/learn/agents-course/bonus-unit1/fine-tuning) Section**

# Step 0: Ask to Access Gemma on Hugging Face

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/gemma.png" alt="Gemma"/>


To access Gemma on Hugging Face:

1. **Make sure you're signed in** to your Hugging Face Account

2. Go to https://huggingface.co/google/gemma-2-2b-it

3. Click on **Acknowledge license** and fill the form.

Alternatively you can use another model, and modify the code accordingly (it can be a good exercise for you to be sure you know how to fine-tune for Function-Calling).

You can use for instance:

- [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)

- [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)

## Step 1: Set the GPU 💪

If you're on Colab:

- To **accelerate the fine-tuning training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg" alt="GPU Step 1"/>

- `Hardware Accelerator > GPU`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg" alt="GPU Step 2"/>


### Important

For this Unit, **with the free-tier of Colab** it will take around **6h to train**.

You have three solutions if you want to make it faster:

1. Train on your computer if you have GPUs. It might take time but you have less risks of timeout.

2. Use a Google Colab Pro that allows you use to A100 GPU (15-20min training).

3. Just follow the code to learn how to do it without training.

## Step 2: Install dependencies 📚

We need multiple librairies:

- `bitsandbytes` for quantization
- `peft`for LoRA adapters
- `Transformers`for loading the model
- `datasets`for loading and using the fine-tuning dataset
- `trl`for the trainer class
"""

!pip install -q -U bitsandbytes
!pip install -q -U peft
!pip install -q -U trl
!pip install -q -U tensorboardX
!pip install -q wandb

"""## Step 3: Create your Hugging Face Token to push your model to the Hub

To be able to share your model with the community there are some more steps to follow:

1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join

2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.

- Create a new token (https://huggingface.co/settings/tokens) **with write role**

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/create_write_token.png" alt="Create HF Token" width="50%">

3️⃣ Store your token as an environment variable under the name "HF_TOKEN"
- **Be very carefull not to share it with others** !

## Step 4: Import the librairies

Don't forget to put your HF token.
"""

from enum import Enum
from functools import partial
import pandas as pd
import torch
import json

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig, TaskType

seed = 42
set_seed(seed)

import os

# Put your HF Token here
os.environ['HF_TOKEN']="hf_xxxxxxx" # the token should have write access

"""## Step 5: Processing the dataset into inputs

In order to train the model, we need to **format the inputs into what we want the model to learn**.

For this tutorial, I enhanced a popular dataset for function calling "NousResearch/hermes-function-calling-v1" by adding some new **thinking** step computer from **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.

But in order for the model to learn, we need **to format the conversation correctly**. If you followed Unit 1, you know that going from a list of messages to a prompt is handled by the **chat_template**, or, the default chat_template of gemma-2-2B does not contain tool calls. So we will need to modify it !

This is the role of our **preprocess** function. To go from a list of messages, to a prompt that the model can understand.

"""

model_name = "google/gemma-2-2b-it"
dataset_name = "Jofthomas/hermes-function-calling-thinking-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"


def preprocess(sample):
      messages = sample["messages"]
      first_message = messages[0]

      # Instead of adding a system message, we merge the content into the first user message
      if first_message["role"] == "system":
          system_message_content = first_message["content"]
          # Merge system content with the first user message
          messages[1]["content"] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]["content"]
          # Remove the system message from the conversation
          messages.pop(0)

      return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}



dataset = load_dataset(dataset_name)
dataset = dataset.rename_column("conversations", "messages")

"""## Step 6: A Dedicated Dataset for This Unit

For this Bonus Unit, we created a custom dataset based on [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1), which is considered a **reference** when it comes to function-calling datasets.

While the original dataset is excellent, it does **not** include a **“thinking”** step.

In Function-Calling, such a step is optional, but recent work—like the **deepseek** model or the paper ["Test-Time Compute"](https://huggingface.co/papers/2408.03314)—suggests that giving an LLM time to “think” before it answers (or in this case, **before** taking an action) can **significantly improve** model performance.

I, decided to then compute a subset of this dataset and to give it to [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) in order to compute some thinking tokens `<think>` before any function call. Which resulted in the following dataset :
![Input Dataset](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/dataset_function_call.png)

"""

dataset = dataset.map(preprocess, remove_columns="messages")
dataset = dataset["train"].train_test_split(0.1)
print(dataset)

"""## Step 7: Checking the inputs

Let's manually look at what an input looks like !

In this example we have :

1. A *User message* containing the **necessary information with the list of available tools** inbetween `<tools></tools>` then the user query, here:  `"Can you get me the latest news headlines for the United States?"`

2. An *Assistant message* here called "model" to fit the criterias from gemma models containing two new phases, a **"thinking"** phase contained in `<think></think>` and an **"Act"** phase contained in `<tool_call></<tool_call>`.

3. If the model contains a `<tools_call>`, we will append the result of this action in a new **"Tool"** message containing a `<tool_response></tool_response>` with the answer from the tool.
"""

# Let's look at how we formatted the dataset
print(dataset["train"][8]["text"])

# Sanity check
print(tokenizer.pad_token)
print(tokenizer.eos_token)

"""## Step 8: Let's Modify the Tokenizer

Indeed, as we saw in Unit 1, the tokenizer splits text into sub-words by default. This is **not** what we want for our new special tokens!

While we segmented our example using `<think>`, `<tool_call>`, and `<tool_response>`, the tokenizer does **not** yet treat them as whole tokens—it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, we must **add these tokens** to our tokenizer.

Additionally, since we changed the `chat_template` in our **preprocess** function to format conversations as messages within a prompt, we also need to modify the `chat_template` in the tokenizer to reflect these changes.
"""

class ChatmlSpecialTokens(str, Enum):
    tools = "<tools>"
    eotools = "</tools>"
    think = "<think>"
    eothink = "</think>"
    tool_call="<tool_call>"
    eotool_call="</tool_call>"
    tool_response="<tool_reponse>"
    eotool_response="</tool_reponse>"
    pad_token = "<pad>"
    eos_token = "<eos>"
    @classmethod
    def list(cls):
        return [c.value for c in cls]

tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        pad_token=ChatmlSpecialTokens.pad_token.value,
        additional_special_tokens=ChatmlSpecialTokens.list()
    )
tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             attn_implementation='eager',
                                             device_map="auto")
model.resize_token_embeddings(len(tokenizer))
model.to(torch.bfloat16)

"""## Step 9: Let's configure the LoRA

This is we are going to define the parameter of our adapter. Those a the most important parameters in LoRA as they define the size and importance of the adapters we are training.
"""

from peft import LoraConfig

# TODO: Configure LoRA parameters
# r: rank dimension for LoRA update matrices (smaller = more compression)
rank_dimension = 16
# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 64
# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(r=rank_dimension,
                         lora_alpha=lora_alpha,
                         lora_dropout=lora_dropout,
                         target_modules=["gate_proj","q_proj","lm_head","o_proj","k_proj","embed_tokens","down_proj","up_proj","v_proj"], # wich layer in the transformers do we target ?
                         task_type=TaskType.CAUSAL_LM)

"""## Step 10: Let's define the Trainer and the Fine-Tuning hyperparameters

In this step, we define the Trainer, the class that we use to fine-tune our model and the hyperparameters.
"""

username="Jofthomas"# REPLCAE with your Hugging Face username
output_dir = "gemma-2-2B-it-thinking-function_calling-V0" # The directory where the trained model checkpoints, logs, and other artifacts will be saved. It will also be the default name of the model when pushed to the hub if not redefined later.
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 4
logging_steps = 5
learning_rate = 1e-4 # The initial learning rate for the optimizer.

max_grad_norm = 1.0
num_train_epochs=1
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
max_seq_length = 1500

training_arguments = SFTConfig(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    save_strategy="no",
    eval_strategy="epoch",
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    max_grad_norm=max_grad_norm,
    weight_decay=0.1,
    warmup_ratio=warmup_ratio,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
    bf16=True,
    hub_private_repo=False,
    push_to_hub=False,
    num_train_epochs=num_train_epochs,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    packing=True,
    max_seq_length=max_seq_length,
)

"""As Trainer, we use the `SFTTrainer` which is a Supervised Fine-Tuning Trainer."""

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
    peft_config=peft_config,
)

"""Here, we launch the training 🔥. Perfect time for you to pause and grab a coffee ☕."""

trainer.train()
trainer.save_model()

"""## Step 11: Let's push the Model and the Tokenizer to the Hub

Let's push our model and out tokenizer to the Hub ! The model will be pushed under your username + the output_dir that we specified earlier.
"""

trainer.push_to_hub(f"{username}/{output_dir}")

"""Since we also modified the **chat_template** Which is contained in the tokenizer, let's also push the tokenizer with the model."""

tokenizer.eos_token = "<eos>"
# push the tokenizer to hub ( replace with your username and your previously specified
tokenizer.push_to_hub(f"{username}/{output_dir}", token=True)

"""## Step 12: Let's now test our model !

To so, we will :

1. Load the adapter from the hub !
2. Load the base model : **"google/gemma-2-2b-it"** from the hub
3. Resize the model to with the new tokens we introduced !
"""

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from datasets import load_dataset
import torch

bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True,
        )

peft_model_id = f"{username}/{output_dir}" # replace with your newly trained adapter
device = "auto"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             device_map="auto",
                                             )
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, peft_model_id)
model.to(torch.bfloat16)
model.eval()

print(dataset["test"][8]["text"])

"""### Testing the model 🚀

In that case, we will take the start of one of the samples from the test set and hope that it will generate the expected output.

Since we want to test the function-calling capacities of our newly fine-tuned model, the input will be a user message with the available tools, a


### Disclaimer ⚠️

The dataset we’re using **does not contain sufficient training data** and is purely for **educational purposes**. As a result, **your trained model’s outputs may differ** from the examples shown in this course. **Don’t be discouraged** if your results vary—our primary goal here is to illustrate the core concepts rather than produce a fully optimized or production-ready model.

"""

#this prompt is a sub-sample of one of the test set examples. In this example we start the generation after the model generation starts.
prompt="""<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>
<start_of_turn>model
<think>"""

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
inputs = {k: v.to("cuda") for k,v in inputs.items()}
outputs = model.generate(**inputs,
                         max_new_tokens=300,# Adapt as necessary
                         do_sample=True,
                         top_p=0.95,
                         temperature=0.01,
                         repetition_penalty=1.0,
                         eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))

"""## Congratulations
Congratulations on finishing this first Bonus Unit 🥳

You've just **mastered what Function-Calling is and how to fine-tune your model to do Function-Calling**!

If it's the first time you do this, it's normal that you're feeling puzzled. Take time to check the documentation and understand each part of the code and why we did it this way.

Also, don't hesitate to try to **fine-tune different models**. The **best way to learn is by trying.**

### Keep Learning, Stay Awesome 🤗
"""