	# -*- coding: utf-8 -*-
	"""bonus-unit1.ipynb

	Automatically generated by Colab.

	Original file is located at
	https://colab.research.google.com/#fileId=https%3A//huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb

	# Bonus Unit 1: Fine-Tuning a model for Function-Calling

	In this tutorial, we're going to Fine-Tune an LLM for Function Calling.

	This notebook is part of the <a href="https://www.hf.co/learn/agents-course/unit1/introduction">Hugging Face Agents Course</a>, a free Course from beginner to expert, where you learn to build Agents.

	<img src="https://huggingface.co./datasets/agents-course/course-images/resolve/main/en/communication/share.png" alt="Agent Course"/>

	## Prerequisites 🏗️

	Before diving into the notebook, you need to:

	🔲 📚 Study [What is Function-Calling](https://www.hf.co/learn/agents-course/bonus-unit1/what-is-function-calling) Section

	🔲 📚 Study [Fine-Tune your Model and what are LoRAs](https://www.hf.co/learn/agents-course/bonus-unit1/fine-tuning) Section

	# Step 0: Ask to Access Gemma on Hugging Face

	<img src="https://huggingface.co./datasets/agents-course/course-images/resolve/main/en/bonus-unit1/gemma.png" alt="Gemma"/>


	To access Gemma on Hugging Face:

	1. Make sure you're signed in to your Hugging Face Account

	2. Go to https://huggingface.co./google/gemma-2-2b-it

	3. Click on Acknowledge license and fill in the form.

	Alternatively, you can use another model and modify the code accordingly (a good exercise to check that you know how to fine-tune for Function-Calling).

	You can use for instance:

	- [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co./HuggingFaceTB/SmolLM2-1.7B-Instruct)

	- [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co./meta-llama/Llama-3.2-3B-Instruct)

	## Step 1: Set the GPU 💪

	If you're on Colab:

	- To accelerate the fine-tuning training, we'll use a GPU. To do that, go to `Runtime > Change Runtime type`

	<img src="https://huggingface.co./datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg" alt="GPU Step 1"/>

	- `Hardware Accelerator > GPU`

	<img src="https://huggingface.co./datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg" alt="GPU Step 2"/>


	### Important

	For this Unit, training on the free tier of Colab takes around 6 hours.

	You have three options if you want to make it faster:

	1. Train on your own computer if you have GPUs. It might take time, but you run less risk of a timeout.

	2. Use Google Colab Pro, which gives you access to an A100 GPU (15-20 min of training).

	3. Just follow the code to learn how it works, without training.

	## Step 2: Install dependencies 📚

	We need multiple libraries:

	- `bitsandbytes` for quantization
	- `peft` for LoRA adapters
	- `transformers` for loading the model
	- `datasets` for loading and using the fine-tuning dataset
	- `trl` for the trainer class
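
Before running the install cell, it can help to see which of these libraries are already present in the environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the package list simply mirrors the bullets above.

```python
from importlib.util import find_spec

# Import names of the libraries listed above (they match the pip package names).
REQUIRED = ["bitsandbytes", "peft", "transformers", "datasets", "trl"]


def missing_packages(names):
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("All required libraries are available.")
```
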
	"""

	!pip install -q -U bitsandbytes
	!pip install -q -U peft
	!pip install -q -U trl
	!pip install -q -U tensorboardX
	!pip install -q wandb

	"""## Step 3: Create your Hugging Face Token to push your model to the Hub

	To be able to share your model with the community there are some more steps to follow:

	1️⃣ (If it's not already done) create an account on HF ➡ https://huggingface.co./join

	2️⃣ Sign in, then get an authentication token from the Hugging Face website.

	- Create a new token (https://huggingface.co./settings/tokens) with write role

	<img src="https://huggingface.co./datasets/agents-course/course-images/resolve/main/en/bonus-unit1/create_write_token.png" alt="Create HF Token" width="50%">

	3️⃣ Store your token as an environment variable under the name "HF_TOKEN"
	- Be very careful not to share it with others!
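
One way to avoid pasting the token into the notebook itself is to read it from the environment and fail early if it is missing. This is a minimal sketch; `get_hf_token` and `mask_token` are hypothetical helpers, not part of any Hugging Face library.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    # Read the token from the environment instead of hardcoding it.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; create a write token and export it first.")
    return token


def mask_token(token, visible=4):
    # Show only the first few characters so the full token never lands in logs.
    return token[:visible] + "*" * max(len(token) - visible, 0)
```

For example, `print(mask_token(get_hf_token()))` confirms that a token is loaded without revealing it.
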

	## Step 4: Import the libraries

	Don't forget to put in your HF token.
	"""

	from enum import Enum
	from functools import partial
	import pandas as pd
	import torch
	import json

	from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
	from datasets import load_dataset
	from trl import SFTConfig, SFTTrainer
	from peft import LoraConfig, TaskType

	seed = 42
	set_seed(seed)

	import os

	# Put your HF Token here
	os.environ['HF_TOKEN']="hf_xxxxxxx" # the token should have write access

	"""## Step 5: Processing the dataset into inputs

	In order to train the model, we need to format the inputs into what we want the model to learn.

	For this tutorial, I enhanced a popular function-calling dataset, "NousResearch/hermes-function-calling-v1", by adding a new thinking step computed by deepseek-ai/DeepSeek-R1-Distill-Qwen-32B.

	But for the model to learn, we need to format the conversation correctly. If you followed Unit 1, you know that going from a list of messages to a prompt is handled by the chat_template. However, the default chat_template of gemma-2-2B does not handle tool calls, so we need to modify it!

	This is the role of our preprocess function: to go from a list of messages to a prompt that the model can understand.
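
To make the target format concrete, here is a plain-Python approximation of what the preprocessing step produces: the system message is folded into the first user turn, then every turn is wrapped in Gemma's turn markers. The function names are hypothetical, and the notebook itself relies on `tokenizer.apply_chat_template` with a Jinja template rather than on this code.

```python
THINK_INSTRUCTION = (
    "Also, before making a call to a function take the time to plan the function to take. "
    "Make that thinking process between <think>{your thoughts}</think>\n\n"
)


def merge_system_into_first_user(messages):
    # Work on a copy so the caller's messages are left untouched.
    messages = [dict(m) for m in messages]
    if len(messages) > 1 and messages[0]["role"] == "system":
        system_content = messages.pop(0)["content"]
        messages[0]["content"] = system_content + THINK_INSTRUCTION + messages[0]["content"]
    return messages


def render_gemma_prompt(messages, bos_token="<bos>", add_generation_prompt=False):
    # Each turn is wrapped in <start_of_turn> ... <end_of_turn><eos>, content trimmed.
    prompt = bos_token
    for message in messages:
        prompt += "<start_of_turn>" + message["role"] + "\n"
        prompt += message["content"].strip() + "<end_of_turn><eos>\n"
    if add_generation_prompt:
        prompt += "<start_of_turn>model\n"
    return prompt


conversation = [
    {"role": "system", "content": "You are a function calling AI model. "},
    {"role": "human", "content": "Convert 500 USD to EUR."},
    {"role": "model", "content": "<think>I should call convert_currency.</think>"},
]
print(render_gemma_prompt(merge_system_into_first_user(conversation)))
```
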

	"""

	model_name = "google/gemma-2-2b-it"
	dataset_name = "Jofthomas/hermes-function-calling-thinking-V1"
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"


	def preprocess(sample):
	    messages = sample["messages"]
	    first_message = messages[0]

	    # Instead of adding a system message, we merge the content into the first user message
	    if first_message["role"] == "system":
	        system_message_content = first_message["content"]
	        # Merge system content with the first user message
	        messages[1]["content"] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]["content"]
	        # Remove the system message from the conversation
	        messages.pop(0)

	    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}



	dataset = load_dataset(dataset_name)
	dataset = dataset.rename_column("conversations", "messages")

	"""## Step 6: A Dedicated Dataset for This Unit

	For this Bonus Unit, we created a custom dataset based on [NousResearch/hermes-function-calling-v1](https://huggingface.co./datasets/NousResearch/hermes-function-calling-v1), which is considered a reference when it comes to function-calling datasets.

	While the original dataset is excellent, it does not include a “thinking” step.

	In Function-Calling, such a step is optional, but recent work—like the deepseek model or the paper ["Test-Time Compute"](https://huggingface.co./papers/2408.03314)—suggests that giving an LLM time to “think” before it answers (or in this case, before taking an action) can significantly improve model performance.

	I then took a subset of this dataset and gave it to [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co./deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) to generate some thinking tokens inside `<think>` before any function call, which resulted in the following dataset:
	![Input Dataset](https://huggingface.co./datasets/agents-course/course-images/resolve/main/en/bonus-unit1/dataset_function_call.png)

	"""

	dataset = dataset.map(preprocess, remove_columns="messages")
	dataset = dataset["train"].train_test_split(0.1)
	print(dataset)

	"""## Step 7: Checking the inputs

	Let's manually look at what an input looks like !

	In this example we have :

	1. A User message containing the necessary information, with the list of available tools between `<tools></tools>`, followed by the user query, here: `"Can you get me the latest news headlines for the United States?"`

	2. An Assistant message, here called "model" to fit the conventions of Gemma models, containing two new phases: a "thinking" phase contained in `<think></think>` and an "Act" phase contained in `<tool_call></tool_call>`.

	3. If the model's output contains a `<tool_call>`, we append the result of this action in a new "Tool" message containing a `<tool_response></tool_response>` with the answer from the tool.
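
Once the model produces text in this format, the calling code has to pull the two phases apart. The sketch below does that with regular expressions; `parse_model_turn` and `parse_payload` are hypothetical helpers, the tool name in the example is only illustrative, and the code assumes the payload inside `<tool_call>` is either JSON or a Python-style dict literal.

```python
import ast
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_payload(raw):
    # Try strict JSON first, then fall back to a Python literal (single-quoted dicts).
    raw = raw.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ast.literal_eval(raw)


def parse_model_turn(text):
    # Separate the "thinking" phase from the "act" phase of a model turn.
    think = THINK_RE.search(text)
    calls = [parse_payload(m) for m in TOOL_CALL_RE.findall(text)]
    return {
        "thoughts": think.group(1).strip() if think else None,
        "tool_calls": calls,
    }


example = (
    "<think>The user wants US headlines, so get_news_headlines fits.</think>"
    "<tool_call>\n"
    "{'name': 'get_news_headlines', 'arguments': {'country': 'United States'}}\n"
    "</tool_call>"
)
parsed = parse_model_turn(example)
print(parsed["tool_calls"][0]["name"])
```
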
	"""

	# Let's look at how we formatted the dataset
	print(dataset["train"][8]["text"])

	# Sanity check
	print(tokenizer.pad_token)
	print(tokenizer.eos_token)

	"""## Step 8: Let's Modify the Tokenizer

	Indeed, as we saw in Unit 1, the tokenizer splits text into sub-words by default. This is not what we want for our new special tokens!

	While we segmented our example using `<think>`, `<tool_call>`, and `<tool_response>`, the tokenizer does not yet treat them as whole tokens—it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, we must add these tokens to our tokenizer.

	Additionally, since we changed the `chat_template` in our preprocess function to format conversations as messages within a prompt, we also need to modify the `chat_template` in the tokenizer to reflect these changes.
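
The effect of registering special tokens can be illustrated with a toy splitter. This is not how Gemma's tokenizer works internally (it uses a learned sub-word vocabulary); `toy_tokenize` is a hypothetical function that only shows the difference between treating a tag as ordinary text and treating it as one atomic token.

```python
import re

# Toy rule: runs of word characters, or any single non-space symbol (spaces are dropped).
WORD_OR_SYMBOL = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_ ]")


def toy_tokenize(text, special_tokens=()):
    # Without special tokens, tags fall apart into symbols and word pieces.
    if not special_tokens:
        return WORD_OR_SYMBOL.findall(text)

    # With special tokens, split on them first and keep each one whole.
    # Longer tokens go first so that no tag is shadowed by a shorter alternative.
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
    pieces = []
    for chunk in re.split(pattern, text):
        if chunk in special_tokens:
            pieces.append(chunk)
        else:
            pieces.extend(WORD_OR_SYMBOL.findall(chunk))
    return pieces


sample = "<think>plan the call</think>"
print(toy_tokenize(sample))
print(toy_tokenize(sample, special_tokens=("<think>", "</think>")))
```
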
	"""

	class ChatmlSpecialTokens(str, Enum):
	    tools = "<tools>"
	    eotools = "</tools>"
	    think = "<think>"
	    eothink = "</think>"
	    tool_call = "<tool_call>"
	    eotool_call = "</tool_call>"
	    # Spelled to match the <tool_response> tag used in the dataset
	    tool_response = "<tool_response>"
	    eotool_response = "</tool_response>"
	    pad_token = "<pad>"
	    eos_token = "<eos>"

	    @classmethod
	    def list(cls):
	        return [c.value for c in cls]

	tokenizer = AutoTokenizer.from_pretrained(
	    model_name,
	    pad_token=ChatmlSpecialTokens.pad_token.value,
	    additional_special_tokens=ChatmlSpecialTokens.list()
	)
	tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    attn_implementation='eager',
	    device_map="auto",
	)
	model.resize_token_embeddings(len(tokenizer))
	model.to(torch.bfloat16)

	"""## Step 9: Let's configure the LoRA

	This is where we define the parameters of our adapter. These are the most important parameters in LoRA, as they define the size and importance of the adapters we are training.
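
The arithmetic behind these parameters is short enough to write out. For a frozen weight matrix of shape `d_out x d_in`, LoRA trains two small matrices of shapes `d_out x r` and `r x d_in`, and scales their product by `lora_alpha / r`. The sketch below uses the rank and alpha chosen in this step; the layer dimensions are illustrative assumptions, not values read from the Gemma configuration.

```python
def lora_param_count(d_in, d_out, r):
    # A is (r x d_in) and B is (d_out x r), so the adapter adds r * (d_in + d_out) weights.
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, r):
    # The low-rank update B @ A is multiplied by this factor before being added to W.
    return lora_alpha / r


rank_dimension = 16
lora_alpha = 64
d_in, d_out = 2304, 2048  # assumed sizes for a single projection layer

full = d_in * d_out
adapter = lora_param_count(d_in, d_out, rank_dimension)
print(f"full layer: {full} weights, LoRA adapter: {adapter} weights")
print(f"trainable fraction for this layer: {adapter / full:.2%}")
print(f"scaling factor: {lora_scaling(lora_alpha, rank_dimension)}")
```

Doubling the rank doubles the adapter size, while raising `lora_alpha` at a fixed rank only changes how strongly the learned update is weighted.
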
	"""

	from peft import LoraConfig

	# TODO: Configure LoRA parameters
	# r: rank dimension for LoRA update matrices (smaller = more compression)
	rank_dimension = 16
	# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
	lora_alpha = 64
	# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
	lora_dropout = 0.05

	peft_config = LoraConfig(
	    r=rank_dimension,
	    lora_alpha=lora_alpha,
	    lora_dropout=lora_dropout,
	    # Which layers of the transformer do we target?
	    target_modules=["gate_proj","q_proj","lm_head","o_proj","k_proj","embed_tokens","down_proj","up_proj","v_proj"],
	    task_type=TaskType.CAUSAL_LM,
	)

	"""## Step 10: Let's define the Trainer and the Fine-Tuning hyperparameters

	In this step, we define the Trainer, the class that we use to fine-tune our model and the hyperparameters.
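
Two of the hyperparameters used in this step are easy to work out by hand: the effective batch size, and the learning rate at a given step under linear warmup followed by cosine decay. The following is a minimal sketch, assuming a single GPU and a hypothetical total of 1,000 optimizer steps (the real number depends on the dataset size and on packing); it approximates the schedule rather than reproducing the library's implementation.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    # Gradients from several small batches are accumulated before each optimizer step.
    return per_device_batch * grad_accum_steps * num_devices


def lr_at_step(step, total_steps, peak_lr, warmup_ratio):
    # Linear warmup from 0 to peak_lr, then cosine decay from peak_lr down to 0.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; not taken from the notebook
print(effective_batch_size(per_device_batch=1, grad_accum_steps=4))
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps, peak_lr=1e-4, warmup_ratio=0.1))
```
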
	"""

	username = "Jofthomas"  # REPLACE with your Hugging Face username
	output_dir = "gemma-2-2B-it-thinking-function_calling-V0" # The directory where the trained model checkpoints, logs, and other artifacts will be saved. It will also be the default name of the model when pushed to the hub if not redefined later.
	per_device_train_batch_size = 1
	per_device_eval_batch_size = 1
	gradient_accumulation_steps = 4
	logging_steps = 5
	learning_rate = 1e-4 # The initial learning rate for the optimizer.

	max_grad_norm = 1.0
	num_train_epochs=1
	warmup_ratio = 0.1
	lr_scheduler_type = "cosine"
	max_seq_length = 1500

	training_arguments = SFTConfig(
	    output_dir=output_dir,
	    per_device_train_batch_size=per_device_train_batch_size,
	    per_device_eval_batch_size=per_device_eval_batch_size,
	    gradient_accumulation_steps=gradient_accumulation_steps,
	    save_strategy="no",
	    eval_strategy="epoch",
	    logging_steps=logging_steps,
	    learning_rate=learning_rate,
	    max_grad_norm=max_grad_norm,
	    weight_decay=0.1,
	    warmup_ratio=warmup_ratio,
	    lr_scheduler_type=lr_scheduler_type,
	    report_to="tensorboard",
	    bf16=True,
	    hub_private_repo=False,
	    push_to_hub=False,
	    num_train_epochs=num_train_epochs,
	    gradient_checkpointing=True,
	    gradient_checkpointing_kwargs={"use_reentrant": False},
	    packing=True,
	    max_seq_length=max_seq_length,
	)

	"""As Trainer, we use the `SFTTrainer` which is a Supervised Fine-Tuning Trainer."""

	trainer = SFTTrainer(
	    model=model,
	    args=training_arguments,
	    train_dataset=dataset["train"],
	    eval_dataset=dataset["test"],
	    processing_class=tokenizer,
	    peft_config=peft_config,
	)

	"""Here, we launch the training 🔥. Perfect time for you to pause and grab a coffee ☕."""

	trainer.train()
	trainer.save_model()

	"""## Step 11: Let's push the Model and the Tokenizer to the Hub

	Let's push our model and our tokenizer to the Hub! The model will be pushed under your username + the output_dir that we specified earlier.
	"""

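	# Note: the first positional argument of Trainer.push_to_hub is the commit message;
	# the target repository defaults to output_dir under your account.
	trainer.push_to_hub(f"{username}/{output_dir}")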

	"""Since we also modified the chat_template Which is contained in the tokenizer, let's also push the tokenizer with the model."""

	tokenizer.eos_token = "<eos>"
	# Push the tokenizer to the Hub (uses your username and the previously specified output_dir)
	tokenizer.push_to_hub(f"{username}/{output_dir}", token=True)

	"""## Step 12: Let's now test our model !

	To do so, we will:

	1. Load the adapter from the Hub.
	2. Load the base model, "google/gemma-2-2b-it", from the Hub.
	3. Resize the model's embeddings to account for the new tokens we introduced.
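
The resize step matters because the adapter was trained against a vocabulary that includes the new special tokens, so the base model's embedding table needs the same number of rows before the adapter weights are attached. The toy sketch below uses plain Python lists instead of tensors; `resize_embeddings` is a hypothetical stand-in for the idea, not the `transformers` implementation, and the tiny vocabulary is made up for illustration.

```python
def resize_embeddings(matrix, new_vocab_size, init_value=0.0):
    # Keep existing rows and append freshly initialised rows for the added tokens.
    if new_vocab_size < len(matrix):
        raise ValueError("this sketch only grows the table")
    hidden = len(matrix[0]) if matrix else 0
    extra = [[init_value] * hidden for _ in range(new_vocab_size - len(matrix))]
    return [row[:] for row in matrix] + extra


base_vocab = ["<bos>", "<eos>", "hello"]
added_tokens = ["<think>", "</think>"]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

resized = resize_embeddings(embeddings, len(base_vocab) + len(added_tokens))
print(len(embeddings), "->", len(resized))
```
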
	"""

	from peft import PeftModel, PeftConfig
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
	from datasets import load_dataset
	import torch

	# Optional 4-bit quantization config (defined here but not passed to from_pretrained below)
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.bfloat16,
	    bnb_4bit_use_double_quant=True,
	)

	peft_model_id = f"{username}/{output_dir}" # replace with your newly trained adapter
	device = "auto"
	config = PeftConfig.from_pretrained(peft_model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    config.base_model_name_or_path,
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
	# The embedding table must match the tokenizer size before the adapter is attached
	model.resize_token_embeddings(len(tokenizer))
	model = PeftModel.from_pretrained(model, peft_model_id)
	model.to(torch.bfloat16)
	model.eval()

	print(dataset["test"][8]["text"])

	"""### Testing the model 🚀

	In that case, we will take the start of one of the samples from the test set and hope that it will generate the expected output.

	Since we want to test the function-calling capacities of our newly fine-tuned model, the input will be a user message with the available tools, a


	### Disclaimer ⚠️

	The dataset we’re using does not contain sufficient training data and is purely for educational purposes. As a result, your trained model’s outputs may differ from the examples shown in this course. Don’t be discouraged if your results vary—our primary goal here is to illustrate the core concepts rather than produce a fully optimized or production-ready model.

	"""

	#this prompt is a sub-sample of one of the test set examples. In this example we start the generation after the model generation starts.
	prompt="""<bos><start_of_turn>human
	You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
	<tool_call>
	{tool_call}
	</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

	Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>
	<start_of_turn>model
	<think>"""

	inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
	inputs = {k: v.to("cuda") for k, v in inputs.items()}
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=300,  # Adapt as necessary
	    do_sample=True,
	    top_p=0.95,
	    temperature=0.01,
	    repetition_penalty=1.0,
	    eos_token_id=tokenizer.eos_token_id,
	)
	print(tokenizer.decode(outputs[0]))

	"""## Congratulations
	Congratulations on finishing this first Bonus Unit 🥳

	You've just mastered what Function-Calling is and how to fine-tune your model to do Function-Calling!

	If it's the first time you do this, it's normal that you're feeling puzzled. Take time to check the documentation and understand each part of the code and why we did it this way.

	Also, don't hesitate to try to fine-tune different models. The best way to learn is by trying.

	### Keep Learning, Stay Awesome 🤗
	"""