File size: 19,032 Bytes
00e4fdb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
# -*- coding: utf-8 -*-
"""bonus-unit1.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/#fileId=https%3A//huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb

# Bonus Unit 1: Fine-Tuning a model for Function-Calling

In this tutorial, **we're going to Fine-Tune an LLM for Function Calling.**

This notebook is part of the <a href="https://www.hf.co/learn/agents-course/unit1/introduction">Hugging Face Agents Course</a>, a free Course from beginner to expert, where you learn to build Agents.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png" alt="Agent Course"/>

## Prerequisites 🏗️

Before diving into the notebook, you need to:

🔲 📚 **Study [What is Function-Calling](https://www.hf.co/learn/agents-course/bonus-unit1/what-is-function-calling) Section**

🔲 📚 **Study [Fine-Tune your Model and what are LoRAs](https://www.hf.co/learn/agents-course/bonus-unit1/fine-tuning) Section**

# Step 0: Ask to Access Gemma on Hugging Face

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/gemma.png" alt="Gemma"/>


To access Gemma on Hugging Face:

1. **Make sure you're signed in** to your Hugging Face Account

2. Go to https://huggingface.co/google/gemma-2-2b-it

3. Click on **Acknowledge license** and fill the form.

Alternatively you can use another model, and modify the code accordingly (it can be a good exercise for you to be sure you know how to fine-tune for Function-Calling).

You can use for instance:

- [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)

- [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)

## Step 1: Set the GPU 💪

If you're on Colab:

- To **accelerate the fine-tuning training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg" alt="GPU Step 1"/>

- `Hardware Accelerator > GPU`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg" alt="GPU Step 2"/>


### Important

For this Unit, **with the free-tier of Colab** it will take around **6h to train**.

You have three solutions if you want to make it faster:

1. Train on your computer if you have GPUs. It might take time but you have less risks of timeout.

2. Use a Google Colab Pro that allows you use to A100 GPU (15-20min training).

3. Just follow the code to learn how to do it without training.

## Step 2: Install dependencies 📚

We need multiple librairies:

- `bitsandbytes` for quantization
- `peft`for LoRA adapters
- `Transformers`for loading the model
- `datasets`for loading and using the fine-tuning dataset
- `trl`for the trainer class
"""

!pip install -q -U bitsandbytes
!pip install -q -U peft
!pip install -q -U trl
!pip install -q -U tensorboardX
!pip install -q wandb

"""## Step 3: Create your Hugging Face Token to push your model to the Hub

To be able to share your model with the community there are some more steps to follow:

1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join

2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.

- Create a new token (https://huggingface.co/settings/tokens) **with write role**

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/create_write_token.png" alt="Create HF Token" width="50%">

3️⃣ Store your token as an environment variable under the name "HF_TOKEN"
- **Be very carefull not to share it with others** !

## Step 4: Import the librairies

Don't forget to put your HF token.
"""

from enum import Enum
from functools import partial
import pandas as pd
import torch
import json

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig, TaskType

seed = 42
set_seed(seed)

import os

# Put your HF Token here
os.environ['HF_TOKEN']="hf_xxxxxxx" # the token should have write access

"""## Step 5: Processing the dataset into inputs

In order to train the model, we need to **format the inputs into what we want the model to learn**.

For this tutorial, I enhanced a popular dataset for function calling "NousResearch/hermes-function-calling-v1" by adding some new **thinking** step computer from **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.

But in order for the model to learn, we need **to format the conversation correctly**. If you followed Unit 1, you know that going from a list of messages to a prompt is handled by the **chat_template**, or, the default chat_template of gemma-2-2B does not contain tool calls. So we will need to modify it !

This is the role of our **preprocess** function. To go from a list of messages, to a prompt that the model can understand.

"""

model_name = "google/gemma-2-2b-it"
dataset_name = "Jofthomas/hermes-function-calling-thinking-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"


def preprocess(sample):
      messages = sample["messages"]
      first_message = messages[0]

      # Instead of adding a system message, we merge the content into the first user message
      if first_message["role"] == "system":
          system_message_content = first_message["content"]
          # Merge system content with the first user message
          messages[1]["content"] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]["content"]
          # Remove the system message from the conversation
          messages.pop(0)

      return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}



dataset = load_dataset(dataset_name)
dataset = dataset.rename_column("conversations", "messages")

"""## Step 6: A Dedicated Dataset for This Unit

For this Bonus Unit, we created a custom dataset based on [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1), which is considered a **reference** when it comes to function-calling datasets.

While the original dataset is excellent, it does **not** include a **“thinking”** step.

In Function-Calling, such a step is optional, but recent work—like the **deepseek** model or the paper ["Test-Time Compute"](https://huggingface.co/papers/2408.03314)—suggests that giving an LLM time to “think” before it answers (or in this case, **before** taking an action) can **significantly improve** model performance.

I, decided to then compute a subset of this dataset and to give it to [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) in order to compute some thinking tokens `<think>` before any function call. Which resulted in the following dataset :
![Input Dataset](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/dataset_function_call.png)

"""

dataset = dataset.map(preprocess, remove_columns="messages")
dataset = dataset["train"].train_test_split(0.1)
print(dataset)

"""## Step 7: Checking the inputs

Let's manually look at what an input looks like !

In this example we have :

1. A *User message* containing the **necessary information with the list of available tools** inbetween `<tools></tools>` then the user query, here:  `"Can you get me the latest news headlines for the United States?"`

2. An *Assistant message* here called "model" to fit the criterias from gemma models containing two new phases, a **"thinking"** phase contained in `<think></think>` and an **"Act"** phase contained in `<tool_call></<tool_call>`.

3. If the model contains a `<tools_call>`, we will append the result of this action in a new **"Tool"** message containing a `<tool_response></tool_response>` with the answer from the tool.
"""

# Let's look at how we formatted the dataset
print(dataset["train"][8]["text"])

# Sanity check
print(tokenizer.pad_token)
print(tokenizer.eos_token)

"""## Step 8: Let's Modify the Tokenizer

Indeed, as we saw in Unit 1, the tokenizer splits text into sub-words by default. This is **not** what we want for our new special tokens!

While we segmented our example using `<think>`, `<tool_call>`, and `<tool_response>`, the tokenizer does **not** yet treat them as whole tokens—it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, we must **add these tokens** to our tokenizer.

Additionally, since we changed the `chat_template` in our **preprocess** function to format conversations as messages within a prompt, we also need to modify the `chat_template` in the tokenizer to reflect these changes.
"""

class ChatmlSpecialTokens(str, Enum):
    tools = "<tools>"
    eotools = "</tools>"
    think = "<think>"
    eothink = "</think>"
    tool_call="<tool_call>"
    eotool_call="</tool_call>"
    tool_response="<tool_reponse>"
    eotool_response="</tool_reponse>"
    pad_token = "<pad>"
    eos_token = "<eos>"
    @classmethod
    def list(cls):
        return [c.value for c in cls]

tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        pad_token=ChatmlSpecialTokens.pad_token.value,
        additional_special_tokens=ChatmlSpecialTokens.list()
    )
tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             attn_implementation='eager',
                                             device_map="auto")
model.resize_token_embeddings(len(tokenizer))
model.to(torch.bfloat16)

"""## Step 9: Let's configure the LoRA

This is we are going to define the parameter of our adapter. Those a the most important parameters in LoRA as they define the size and importance of the adapters we are training.
"""

from peft import LoraConfig

# TODO: Configure LoRA parameters
# r: rank dimension for LoRA update matrices (smaller = more compression)
rank_dimension = 16
# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 64
# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(r=rank_dimension,
                         lora_alpha=lora_alpha,
                         lora_dropout=lora_dropout,
                         target_modules=["gate_proj","q_proj","lm_head","o_proj","k_proj","embed_tokens","down_proj","up_proj","v_proj"], # wich layer in the transformers do we target ?
                         task_type=TaskType.CAUSAL_LM)

"""## Step 10: Let's define the Trainer and the Fine-Tuning hyperparameters

In this step, we define the Trainer, the class that we use to fine-tune our model and the hyperparameters.
"""

username="Jofthomas"# REPLCAE with your Hugging Face username
output_dir = "gemma-2-2B-it-thinking-function_calling-V0" # The directory where the trained model checkpoints, logs, and other artifacts will be saved. It will also be the default name of the model when pushed to the hub if not redefined later.
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 4
logging_steps = 5
learning_rate = 1e-4 # The initial learning rate for the optimizer.

max_grad_norm = 1.0
num_train_epochs=1
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
max_seq_length = 1500

training_arguments = SFTConfig(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    save_strategy="no",
    eval_strategy="epoch",
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    max_grad_norm=max_grad_norm,
    weight_decay=0.1,
    warmup_ratio=warmup_ratio,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
    bf16=True,
    hub_private_repo=False,
    push_to_hub=False,
    num_train_epochs=num_train_epochs,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    packing=True,
    max_seq_length=max_seq_length,
)

"""As Trainer, we use the `SFTTrainer` which is a Supervised Fine-Tuning Trainer."""

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
    peft_config=peft_config,
)

"""Here, we launch the training 🔥. Perfect time for you to pause and grab a coffee ☕."""

trainer.train()
trainer.save_model()

"""## Step 11: Let's push the Model and the Tokenizer to the Hub

Let's push our model and out tokenizer to the Hub ! The model will be pushed under your username + the output_dir that we specified earlier.
"""

trainer.push_to_hub(f"{username}/{output_dir}")

"""Since we also modified the **chat_template** Which is contained in the tokenizer, let's also push the tokenizer with the model."""

tokenizer.eos_token = "<eos>"
# push the tokenizer to hub ( replace with your username and your previously specified
tokenizer.push_to_hub(f"{username}/{output_dir}", token=True)

"""## Step 12: Let's now test our model !

To so, we will :

1. Load the adapter from the hub !
2. Load the base model : **"google/gemma-2-2b-it"** from the hub
3. Resize the model to with the new tokens we introduced !
"""

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from datasets import load_dataset
import torch

bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True,
        )

peft_model_id = f"{username}/{output_dir}" # replace with your newly trained adapter
device = "auto"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             device_map="auto",
                                             )
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, peft_model_id)
model.to(torch.bfloat16)
model.eval()

print(dataset["test"][8]["text"])

"""### Testing the model 🚀

In that case, we will take the start of one of the samples from the test set and hope that it will generate the expected output.

Since we want to test the function-calling capacities of our newly fine-tuned model, the input will be a user message with the available tools, a


### Disclaimer ⚠️

The dataset we’re using **does not contain sufficient training data** and is purely for **educational purposes**. As a result, **your trained model’s outputs may differ** from the examples shown in this course. **Don’t be discouraged** if your results vary—our primary goal here is to illustrate the core concepts rather than produce a fully optimized or production-ready model.

"""

#this prompt is a sub-sample of one of the test set examples. In this example we start the generation after the model generation starts.
prompt="""<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>
<start_of_turn>model
<think>"""

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
inputs = {k: v.to("cuda") for k,v in inputs.items()}
outputs = model.generate(**inputs,
                         max_new_tokens=300,# Adapt as necessary
                         do_sample=True,
                         top_p=0.95,
                         temperature=0.01,
                         repetition_penalty=1.0,
                         eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))

"""## Congratulations
Congratulations on finishing this first Bonus Unit 🥳

You've just **mastered what Function-Calling is and how to fine-tune your model to do Function-Calling**!

If it's the first time you do this, it's normal that you're feeling puzzled. Take time to check the documentation and understand each part of the code and why we did it this way.

Also, don't hesitate to try to **fine-tune different models**. The **best way to learn is by trying.**

### Keep Learning, Stay Awesome 🤗
"""