diff --git "a/bonus-unit1/bonus-unit1.ipynb" "b/bonus-unit1/bonus-unit1.ipynb" new file mode 100644--- /dev/null +++ "b/bonus-unit1/bonus-unit1.ipynb" @@ -0,0 +1,1592 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "43b502c1-9548-4580-84ad-1cbac158edb8", + "metadata": { + "id": "43b502c1-9548-4580-84ad-1cbac158edb8" + }, + "source": [ + "# Bonus Unit 1: Fine-Tuning a model for Function-Calling\n", + "\n", + "In this tutorial, **we're going to Fine-Tune an LLM for Function Calling.**\n", + "\n", + "This notebook is part of the Hugging Face Agents Course, a free Course from beginner to expert, where you learn to build Agents.\n", + "\n", + "\"Agent\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Prerequisites πŸ—οΈ\n", + "\n", + "Before diving into the notebook, you need to:\n", + "\n", + "πŸ”² πŸ“š **Study [What is Function-Calling](https://www.hf.co/learn/agents-course/bonus-unit1/what-is-function-calling) Section**\n", + "\n", + "πŸ”² πŸ“š **Study [Fine-Tune your Model and what are LoRAs](https://www.hf.co/learn/agents-course/bonus-unit1/fine-tuning) Section**" + ], + "metadata": { + "id": "gWR4Rvpmjq5T" + }, + "id": "gWR4Rvpmjq5T" + }, + { + "cell_type": "markdown", + "source": [ + "# Step 0: Ask to Access Gemma on Hugging Face\n", + "\n", + "\"Gemma\"/\n", + "\n", + "\n", + "To access Gemma on Hugging Face:\n", + "\n", + "1. **Make sure you're signed in** to your Hugging Face Account\n", + "\n", + "2. Go to https://huggingface.co./google/gemma-2-2b-it\n", + "\n", + "3. Click on **Acknowledge license** and fill the form.\n", + "\n", + "Alternatively you can use another model, and modify the code accordingly (it can be a good exercise for you to be sure you know how to fine-tune for Function-Calling).\n", + "\n", + "You can use for instance:\n", + "\n", + "- [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co./HuggingFaceTB/SmolLM2-1.7B-Instruct)\n", + "\n", + "- [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co./meta-llama/Llama-3.2-3B-Instruct)" + ], + "metadata": { + "id": "1rZXU_1wkEPu" + }, + "id": "1rZXU_1wkEPu" + }, + { + "cell_type": "markdown", + "source": [ + "## Step 1: Set the GPU πŸ’ͺ\n", + "\n", + "If you're on Colab:\n", + "\n", + "- To **accelerate the fine-tuning training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n", + "\n", + "\"GPU\n", + "\n", + "- `Hardware Accelerator > GPU`\n", + "\n", + "\"GPU\n", + "\n", + "\n", + "### Important\n", + "\n", + "For this Unit, **with the free-tier of Colab** it will take around **6h to train**.\n", + "\n", + "You have three solutions if you want to make it faster:\n", + "\n", + "1. Train on your computer if you have GPUs. It might take time but you have less risks of timeout.\n", + "\n", + "2. Use a Google Colab Pro that allows you use to A100 GPU (15-20min training).\n", + "\n", + "3. Just follow the code to learn how to do it without training." + ], + "metadata": { + "id": "5hjyx9nJlvKG" + }, + "id": "5hjyx9nJlvKG" + }, + { + "cell_type": "markdown", + "source": [ + "## Step 2: Install dependencies πŸ“š\n", + "\n", + "We need multiple librairies:\n", + "\n", + "- `bitsandbytes` for quantization\n", + "- `peft`for LoRA adapters\n", + "- `Transformers`for loading the model\n", + "- `datasets`for loading and using the fine-tuning dataset\n", + "- `trl`for the trainer class" + ], + "metadata": { + "id": "5Thjsc9fj6Ej" + }, + "id": "5Thjsc9fj6Ej" + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "e63f4962-c644-491e-aa91-50e453e953a4", + "metadata": { + "tags": [], + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e63f4962-c644-491e-aa91-50e453e953a4", + "outputId": "3c1563d4-74fe-46f0-adcd-3e935261a89d" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[2K \u001b[90m━━━━━━━━━━━���━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m69.7/69.7 MB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m363.4/363.4 MB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.8/13.8 MB\u001b[0m \u001b[31m54.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m24.6/24.6 MB\u001b[0m \u001b[31m23.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m883.7/883.7 kB\u001b[0m \u001b[31m41.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m664.8/664.8 MB\u001b[0m \u001b[31m2.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m211.5/211.5 MB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.3/56.3 MB\u001b[0m \u001b[31m10.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m127.9/127.9 MB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.5/207.5 MB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.1/21.1 MB\u001b[0m \u001b[31m51.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.0/44.0 kB\u001b[0m \u001b[31m3.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.0/10.0 MB\u001b[0m \u001b[31m93.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m342.1/342.1 kB\u001b[0m \u001b[31m19.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m484.9/484.9 kB\u001b[0m \u001b[31m27.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m143.5/143.5 kB\u001b[0m \u001b[31m12.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.8/194.8 kB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.1/44.1 kB\u001b[0m \u001b[31m2.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m365.7/365.7 kB\u001b[0m \u001b[31m21.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.0/10.0 MB\u001b[0m \u001b[31m71.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m54.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m101.7/101.7 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h" + ] + } + ], + "source": [ + "!pip install -q -U bitsandbytes\n", + "!pip install -q -U transformers\n", + "!pip install -q -U peft\n", + "!pip install -q -U accelerate\n", + "!pip install -q -U datasets\n", + "!pip install -q trl==0.12.2\n", + "!pip install -q -U tensorboardX\n", + "!pip install -q wandb" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Step 3: Create your Hugging Face Token to push your model to the Hub\n", + "\n", + "To be able to share your model with the community there are some more steps to follow:\n", + "\n", + "1️⃣ (If it's not already done) create an account to HF ➑ https://huggingface.co./join\n", + "\n", + "2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.\n", + "\n", + "- Create a new token (https://huggingface.co./settings/tokens) **with write role**\n", + "\n", + "\"Create\n", + "\n", + "3️⃣ Store your token as an environment variable under the name \"HF_TOKEN\"\n", + "- **Be very carefull not to share it with others** !" + ], + "metadata": { + "id": "UWNoZzi1urSZ" + }, + "id": "UWNoZzi1urSZ" + }, + { + "cell_type": "markdown", + "source": [ + "## Step 4: Import the librairies\n", + "\n", + "Don't forget to put your HF token." + ], + "metadata": { + "id": "vBAkwg9zu6A1" + }, + "id": "vBAkwg9zu6A1" + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7ad2e4c2-593e-463e-9692-8d674c541d76", + "metadata": { + "tags": [], + "colab": { + "base_uri": "https://localhost:8080/", + "height": 382 + }, + "id": "7ad2e4c2-593e-463e-9692-8d674c541d76", + "outputId": "5004413d-0202-4031-85e3-d1897fb8eba5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Exception ignored in: \n", + "Traceback (most recent call last):\n", + " File \"/usr/local/lib/python3.11/dist-packages/jax/_src/lib/__init__.py\", line 96, in _xla_gc_callback\n", + " def _xla_gc_callback(*args):\n", + " \n", + "KeyboardInterrupt: \n" + ] + }, + { + "output_type": "error", + "ename": "KeyboardInterrupt", + "evalue": "", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtransformers\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mAutoModelForCausalLM\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mAutoTokenizer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mTrainingArguments\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mBitsAndBytesConfig\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mset_seed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mdatasets\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mload_dataset\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtrl\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mSFTTrainer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mpeft\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mget_peft_model\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mLoraConfig\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mTaskType\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_handle_fromlist\u001b[0;34m(module, fromlist, import_, recursive)\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/trl/import_utils.py\u001b[0m in \u001b[0;36m__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 90\u001b[0m \u001b[0mmodule\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 91\u001b[0;31m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodule\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 92\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mAttributeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"module {self.__name__} has no attribute {name}\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/trl/import_utils.py\u001b[0m in \u001b[0;36m__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 88\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 90\u001b[0;31m \u001b[0mmodule\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 91\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodule\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/trl/import_utils.py\u001b[0m in \u001b[0;36m_get_module\u001b[0;34m(self, module_name)\u001b[0m\n\u001b[1;32m 98\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 99\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 100\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mimportlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mimport_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\".\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 101\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 102\u001b[0m raise RuntimeError(\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/__init__.py\u001b[0m in \u001b[0;36mimport_module\u001b[0;34m(name, package)\u001b[0m\n\u001b[1;32m 124\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 125\u001b[0m \u001b[0mlevel\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 126\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_bootstrap\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_gcd_import\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpackage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 127\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mdatasets\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuilder\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mDatasetGenerationError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 27\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mhuggingface_hub\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mutils\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_deprecation\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0m_deprecate_arguments\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 28\u001b[0;31m from transformers import (\n\u001b[0m\u001b[1;32m 29\u001b[0m \u001b[0mAutoModelForCausalLM\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 30\u001b[0m \u001b[0mAutoTokenizer\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_handle_fromlist\u001b[0;34m(module, fromlist, import_, recursive)\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py\u001b[0m in \u001b[0;36m__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 1764\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mPlaceholder\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1765\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1766\u001b[0;31m \u001b[0mmodule\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1767\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodule\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1768\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_modules\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py\u001b[0m in \u001b[0;36m_get_module\u001b[0;34m(self, module_name)\u001b[0m\n\u001b[1;32m 1776\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1777\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1778\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mimportlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mimport_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\".\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1779\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1780\u001b[0m raise RuntimeError(\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/__init__.py\u001b[0m in \u001b[0;36mimport_module\u001b[0;34m(name, package)\u001b[0m\n\u001b[1;32m 124\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 125\u001b[0m \u001b[0mlevel\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 126\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_bootstrap\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_gcd_import\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpackage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 127\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/trainer.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 40\u001b[0m \u001b[0;31m# Integrations must be imported before ML frameworks:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[0;31m# isort: off\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 42\u001b[0;31m from .integrations import (\n\u001b[0m\u001b[1;32m 43\u001b[0m \u001b[0mget_reporting_integration_callbacks\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 44\u001b[0m \u001b[0mhp_params\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_handle_fromlist\u001b[0;34m(module, fromlist, import_, recursive)\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py\u001b[0m in \u001b[0;36m__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 1764\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mPlaceholder\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1765\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1766\u001b[0;31m \u001b[0mmodule\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1767\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodule\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1768\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_modules\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py\u001b[0m in \u001b[0;36m_get_module\u001b[0;34m(self, module_name)\u001b[0m\n\u001b[1;32m 1776\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1777\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1778\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mimportlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mimport_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\".\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1779\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1780\u001b[0m raise RuntimeError(\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/__init__.py\u001b[0m in \u001b[0;36mimport_module\u001b[0;34m(name, package)\u001b[0m\n\u001b[1;32m 124\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 125\u001b[0m \u001b[0mlevel\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 126\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_bootstrap\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_gcd_import\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpackage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 127\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/integrations/integration_utils.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 34\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpackaging\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mversion\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 35\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 36\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0;34m.\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mPreTrainedModel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mTFPreTrainedModel\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 37\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0;34m.\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0m__version__\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mversion\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 38\u001b[0m from ..utils import (\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_handle_fromlist\u001b[0;34m(module, fromlist, import_, recursive)\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py\u001b[0m in \u001b[0;36m__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 1764\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mPlaceholder\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1765\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1766\u001b[0;31m \u001b[0mmodule\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_class_to_module\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1767\u001b[0m \u001b[0mvalue\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodule\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1768\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_modules\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py\u001b[0m in \u001b[0;36m_get_module\u001b[0;34m(self, module_name)\u001b[0m\n\u001b[1;32m 1776\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1777\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1778\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mimportlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mimport_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\".\"\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1779\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1780\u001b[0m raise RuntimeError(\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/__init__.py\u001b[0m in \u001b[0;36mimport_module\u001b[0;34m(name, package)\u001b[0m\n\u001b[1;32m 124\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 125\u001b[0m \u001b[0mlevel\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 126\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_bootstrap\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_gcd_import\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpackage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 127\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/modeling_tf_utils.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 36\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 37\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mDataCollatorWithPadding\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mDefaultDataCollator\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 38\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mactivations_tf\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mget_tf_activation\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 39\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mconfiguration_utils\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mPretrainedConfig\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 40\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mdynamic_module_utils\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mcustom_object_save\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/transformers/activations_tf.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 22\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mtf_keras\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mkeras\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 23\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mModuleNotFoundError\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mImportError\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 24\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mkeras\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/__init__.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;34m\"\"\"AUTOGENERATED. DO NOT EDIT.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0m__internal__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mactivations\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mapplications\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/__internal__/__init__.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;34m\"\"\"AUTOGENERATED. DO NOT EDIT.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__internal__\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mbackend\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__internal__\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mlayers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__internal__\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mlosses\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/__internal__/backend/__init__.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;34m\"\"\"AUTOGENERATED. DO NOT EDIT.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackend\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0m_initialize_variables\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0minitialize_variables\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackend\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mtrack_variable\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/src/__init__.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 19\u001b[0m \"\"\"\n\u001b[1;32m 20\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 21\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mapplications\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 22\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mdistribute\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 23\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mlayers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/src/applications/__init__.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 18\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapplications\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconvnext\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mConvNeXtBase\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 19\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapplications\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconvnext\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mConvNeXtLarge\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapplications\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconvnext\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mConvNeXtSmall\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/src/applications/convnext.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mbackend\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0minitializers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 30\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mlayers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 31\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mutils\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 32\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapplications\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mimagenet_utils\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/src/layers/__init__.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnormalization\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayer_normalization\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mLayerNormalization\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 119\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnormalization\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munit_normalization\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mUnitNormalization\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 120\u001b[0;31m from tf_keras.src.layers.normalization.spectral_normalization import (\n\u001b[0m\u001b[1;32m 121\u001b[0m \u001b[0mSpectralNormalization\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 122\u001b[0m ) # noqa: E501\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/src/layers/normalization/spectral_normalization.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minitializers\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mTruncatedNormal\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 19\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrnn\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mWrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 20\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0;31m# isort: off\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/src/layers/rnn/__init__.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mtensorflow\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcompat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mv2\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 19\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrnn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mabstract_rnn_cell\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mAbstractRNNCell\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 20\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0;31m# Recurrent layers.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.11/dist-packages/tf_keras/src/layers/rnn/abstract_rnn_cell.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mbase_layer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 19\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mtf_keras\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrnn\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mrnn_utils\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 20\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0;31m# isort: off\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_find_and_load\u001b[0;34m(name, import_)\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_find_and_load_unlocked\u001b[0;34m(name, import_)\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_load_unlocked\u001b[0;34m(spec)\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap_external.py\u001b[0m in \u001b[0;36mexec_module\u001b[0;34m(self, module)\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap_external.py\u001b[0m in \u001b[0;36mget_code\u001b[0;34m(self, fullname)\u001b[0m\n", + "\u001b[0;32m/usr/lib/python3.11/importlib/_bootstrap_external.py\u001b[0m in \u001b[0;36m_code_to_timestamp_pyc\u001b[0;34m(code, mtime, source_size)\u001b[0m\n", + "\u001b[0;31mKeyboardInterrupt\u001b[0m: " + ] + } + ], + "source": [ + "from enum import Enum\n", + "from functools import partial\n", + "import pandas as pd\n", + "import torch\n", + "import json\n", + "\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig, set_seed\n", + "from datasets import load_dataset\n", + "from trl import SFTTrainer\n", + "from peft import get_peft_model, LoraConfig, TaskType\n", + "\n", + "seed = 42\n", + "set_seed(seed)\n", + "\n", + "import os\n", + "\n", + "# Put your HF Token here\n", + "os.environ['HF_TOKEN']=\"hf_xxx\"" + ] + }, + { + "cell_type": "markdown", + "id": "44f30b2c-2cc0-48e0-91ca-4633e6444105", + "metadata": { + "id": "44f30b2c-2cc0-48e0-91ca-4633e6444105" + }, + "source": [ + "## Step 5: Processing the dataset into inputs\n", + "\n", + "In order to train the model, we need to **format the inputs into what we want the model to learn**.\n", + "\n", + "For this tutorial, I enhanced a popular dataset for function calling \"NousResearch/hermes-function-calling-v1\" by adding some new **thinking** step computer from **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.\n", + "\n", + "But in order for the model to learn, we need **to format the conversation correctly**. If you followed Unit 1, you know that going from a list of messages to a prompt is handled by the **chat_template**, or, the default chat_template of gemma-2-2B does not contain tool calls. So we will need to modify it !\n", + "\n", + "This is the role of our **preprocess** function. To go from a list of messages, to a prompt that the model can understand.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29da85c8-33bf-4864-aed7-733cbe703512", + "metadata": { + "tags": [], + "id": "29da85c8-33bf-4864-aed7-733cbe703512" + }, + "outputs": [], + "source": [ + "model_name = \"google/gemma-2-2b-it\"\n", + "dataset_name = \"Jofthomas/hermes-function-calling-thinking-V1\"\n", + "tokenizer = AutoTokenizer.from_pretrained(model_name)\n", + "\n", + "tokenizer.chat_template = \"{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '' + message['role'] + '\\n' + message['content'] | trim + '\\n' }}{% endfor %}{% if add_generation_prompt %}{{'model\\n'}}{% endif %}\"\n", + "\n", + "\n", + "def preprocess(samples):\n", + " batch = []\n", + " for conversations in zip(samples[\"conversations\"]):\n", + " conversation = conversations[0]\n", + "\n", + " # Instead of adding a system message, we merge the content into the first user message\n", + " if conversation[0][\"role\"] == \"system\":\n", + " system_message_content = conversation[0][\"content\"]\n", + " # Merge system content with the first user message\n", + " conversation[1][\"content\"] = system_message_content + \"Also, before making a call to a function take the time to plan the function to take. Make that thinking process between {your thoughts}\\n\\n\" + conversation[1][\"content\"]\n", + " # Remove the system message from the conversation\n", + " conversation.pop(0)\n", + "\n", + " batch.append(tokenizer.apply_chat_template(conversation, tokenize=False))\n", + "\n", + " return {\"content\": batch}\n", + "\n", + "dataset = load_dataset(dataset_name)\n" + ] + }, + { + "cell_type": "markdown", + "id": "dc8736d5-d64b-4c5c-9738-be08421d3f95", + "metadata": { + "id": "dc8736d5-d64b-4c5c-9738-be08421d3f95" + }, + "source": [ + "## Step 6: A Dedicated Dataset for This Unit\n", + "\n", + "For this Bonus Unit, we created a custom dataset based on [NousResearch/hermes-function-calling-v1](https://huggingface.co./datasets/NousResearch/hermes-function-calling-v1), which is considered a **reference** when it comes to function-calling datasets.\n", + "\n", + "While the original dataset is excellent, it does **not** include a **β€œthinking”** step.\n", + "\n", + "In Function-Calling, such a step is optional, but recent workβ€”like the **deepseek** model or the paper [\"Test-Time Compute\"](https://huggingface.co./papers/2408.03314)β€”suggests that giving an LLM time to β€œthink” before it answers (or in this case, **before** taking an action) can **significantly improve** model performance.\n", + "\n", + "I, decided to then compute a subset of this dataset and to give it to [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co./deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) in order to compute some thinking tokens `` before any function call. Which resulted in the following dataset :\n", + "![Input Dataset](https://huggingface.co./datasets/agents-course/course-images/resolve/main/en/bonus-unit1/dataset_function_call.png)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b63d4832-d92e-482d-9fe6-6e9dbfee377a", + "metadata": { + "tags": [], + "id": "b63d4832-d92e-482d-9fe6-6e9dbfee377a", + "outputId": "bda88f48-ca5d-4f47-b887-48d4ea5a53aa" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "DatasetDict({\n", + " train: Dataset({\n", + " features: ['content'],\n", + " num_rows: 3213\n", + " })\n", + " test: Dataset({\n", + " features: ['content'],\n", + " num_rows: 357\n", + " })\n", + "})\n" + ] + } + ], + "source": [ + "dataset = dataset.map(\n", + " preprocess,\n", + " batched=True,\n", + " remove_columns=dataset[\"train\"].column_names\n", + ")\n", + "dataset = dataset[\"train\"].train_test_split(0.1)\n", + "print(dataset)" + ] + }, + { + "cell_type": "markdown", + "id": "67724a23-f298-4247-b002-2cf370b03897", + "metadata": { + "id": "67724a23-f298-4247-b002-2cf370b03897" + }, + "source": [ + "## Step 7: Checking the inputs\n", + "\n", + "Let's manually look at what an input looks like !\n", + "\n", + "In this example we have :\n", + "\n", + "1. A *User message* containing the **necessary information with the list of available tools** inbetween `` then the user query, here: `\"Can you get me the latest news headlines for the United States?\"`\n", + "\n", + "2. An *Assistant message* here called \"model\" to fit the criterias from gemma models containing two new phases, a **\"thinking\"** phase contained in `` and an **\"Act\"** phase contained in ``.\n", + "\n", + "3. If the model contains a ``, we will append the result of this action in a new **\"Tool\"** message containing a `` with the answer from the tool." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc60da04-9411-487a-b629-2c59024a20c0", + "metadata": { + "tags": [], + "id": "dc60da04-9411-487a-b629-2c59024a20c0", + "outputId": "6709e478-17b8-4769-865f-2cd025727ad4" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "human\n", + "You are a function calling AI model. You are provided with function signatures within XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools: [{'type': 'function', 'function': {'name': 'get_news_headlines', 'description': 'Get the latest news headlines', 'parameters': {'type': 'object', 'properties': {'country': {'type': 'string', 'description': 'The country for which headlines are needed'}}, 'required': ['country']}}}, {'type': 'function', 'function': {'name': 'search_recipes', 'description': 'Search for recipes based on ingredients', 'parameters': {'type': 'object', 'properties': {'ingredients': {'type': 'array', 'items': {'type': 'string'}, 'description': 'The list of ingredients'}}, 'required': ['ingredients']}}}] Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within XML tags as follows:\n", + "\n", + "{tool_call}\n", + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between {your thoughts}\n", + "\n", + "Can you get me the latest news headlines for the United States?\n", + "model\n", + "Okay, so the user asked for the latest news headlines in the United States. I need to figure out how to respond appropriately using the tools available.\n", + "\n", + "Looking at the tools provided, there's a function called get_news_headlines. That seems perfect for this situation. It requires a country code, which the user specified as the United States.\n", + "\n", + "Since the user is asking specifically about the US, I'll use the country parameter as 'United States' in the function call. This should fetch the relevant news for them.\n", + "\n", + "I should make sure to format the response correctly as a JSON object with the function name and arguments.\n", + "\n", + "{'name': 'get_news_headlines', 'arguments': {'country': 'United States'}}\n", + "\n", + "tool\n", + "\n", + "{'headlines': ['US President signs new bill into law', 'Major earthquake hits California', 'NASA announces new Mars mission', 'Tech giant releases new smartphone', 'US economy shows signs of recovery']}\n", + "\n", + "model\n", + "Here are the latest news headlines for the United States:\n", + "1. US President signs new bill into law\n", + "2. Major earthquake hits California\n", + "3. NASA announces new Mars mission\n", + "4. Tech giant releases new smartphone\n", + "5. US economy shows signs of recovery\n", + "human\n", + "That's interesting. Can you get me the news for France now?\n", + "model\n", + "Okay, the user initially asked for the latest US news, so I used get_news_headlines with 'United States'. Now the user is asking for France, so I'll call the same function with 'France' as the country. That makes sense because the function is designed for any country.\n", + "\n", + "{'name': 'get_news_headlines', 'arguments': {'country': 'France'}}\n", + "\n", + "tool\n", + "\n", + "{'headlines': ['French President announces new environmental policy', 'Paris Fashion Week highlights', 'France wins World Cup qualifier', 'New culinary trend sweeps across France', 'French tech startup raises millions in funding']}\n", + "\n", + "model\n", + "Here are the latest news headlines for France:\n", + "1. French President announces new environmental policy\n", + "2. Paris Fashion Week highlights\n", + "3. France wins World Cup qualifier\n", + "4. New culinary trend sweeps across France\n", + "5. French tech startup raises millions in funding\n", + "\n" + ] + } + ], + "source": [ + "# Let's look at how we formatted the dataset\n", + "print(dataset[\"train\"][8][\"content\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "53a48281-2346-4dfb-ad60-cad85129ec9b", + "metadata": { + "tags": [], + "id": "53a48281-2346-4dfb-ad60-cad85129ec9b", + "outputId": "da4f7e33-227c-48bc-b3c3-24df31313a69" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n" + ] + } + ], + "source": [ + "# Sanity check\n", + "print(tokenizer.pad_token)\n", + "print(tokenizer.eos_token)" + ] + }, + { + "cell_type": "markdown", + "id": "d6864b36-6033-445a-b6e2-b6bb02e38e26", + "metadata": { + "id": "d6864b36-6033-445a-b6e2-b6bb02e38e26" + }, + "source": [ + "## Step 8: Let's Modify the Tokenizer\n", + "\n", + "Indeed, as we saw in Unit 1, the tokenizer splits text into sub-words by default. This is **not** what we want for our new special tokens!\n", + "\n", + "While we segmented our example using ``, ``, and ``, the tokenizer does **not** yet treat them as whole tokensβ€”it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, we must **add these tokens** to our tokenizer.\n", + "\n", + "Additionally, since we changed the `chat_template` in our **preprocess** function to format conversations as messages within a prompt, we also need to modify the `chat_template` in the tokenizer to reflect these changes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "833ba5d6-4c1e-4689-9fed-22cc03d2a63a", + "metadata": { + "tags": [], + "colab": { + "referenced_widgets": [ + "fc7eee07d5824955adb7b9bbd025c297" + ] + }, + "id": "833ba5d6-4c1e-4689-9fed-22cc03d2a63a", + "outputId": "2ef794e9-3e8e-4dec-d31c-b9242386c2d0" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fc7eee07d5824955adb7b9bbd025c297", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading checkpoint shards: 0%| | 0/2 [00:00\"\n", + " eotools = \"\"\n", + " think = \"\"\n", + " eothink = \"\"\n", + " tool_call=\"\"\n", + " eotool_call=\"\"\n", + " tool_response=\"\"\n", + " eotool_response=\"\"\n", + " pad_token = \"\"\n", + " eos_token = \"\"\n", + " @classmethod\n", + " def list(cls):\n", + " return [c.value for c in cls]\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(\n", + " model_name,\n", + " pad_token=ChatmlSpecialTokens.pad_token.value,\n", + " additional_special_tokens=ChatmlSpecialTokens.list()\n", + " )\n", + "tokenizer.chat_template = \"{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '' + message['role'] + '\\n' + message['content'] | trim + '\\n' }}{% endfor %}{% if add_generation_prompt %}{{'model\\n'}}{% endif %}\"\n", + "\n", + "model = AutoModelForCausalLM.from_pretrained(model_name,\n", + " attn_implementation='eager',\n", + " device_map=\"auto\")\n", + "model.resize_token_embeddings(len(tokenizer))\n", + "model.to(torch.bfloat16)\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Step 9: Let's configure the LoRA\n", + "\n", + "ADD COMMENTS JOFFREY" + ], + "metadata": { + "id": "X6DBY8AqxFLL" + }, + "id": "X6DBY8AqxFLL" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "482d36ab-e326-4fd7-bc59-425abcca55e7", + "metadata": { + "tags": [], + "id": "482d36ab-e326-4fd7-bc59-425abcca55e7" + }, + "outputs": [], + "source": [ + "from peft import LoraConfig\n", + "\n", + "# TODO: Configure LoRA parameters\n", + "# r: rank dimension for LoRA update matrices (smaller = more compression)\n", + "rank_dimension = 16\n", + "# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)\n", + "lora_alpha = 64\n", + "# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)\n", + "lora_dropout = 0.05\n", + "\n", + "peft_config = LoraConfig(r=rank_dimension,\n", + " lora_alpha=lora_alpha,\n", + " lora_dropout=lora_dropout,\n", + " target_modules=[\"gate_proj\",\"q_proj\",\"lm_head\",\"o_proj\",\"k_proj\",\"embed_tokens\",\"down_proj\",\"up_proj\",\"v_proj\"], # wich layer in the transformers do we target ?\n", + " task_type=TaskType.CAUSAL_LM)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Step 10: Let's define the Trainer and the Fine-Tuning hyperparameters\n", + "\n", + "In this step, we define the Trainer, the class that we use to fine-tune our model and the hyperparameters." + ], + "metadata": { + "id": "zdDR9hzgxPu2" + }, + "id": "zdDR9hzgxPu2" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3598b688-5a6f-437f-95ac-4794688cd38f", + "metadata": { + "tags": [], + "id": "3598b688-5a6f-437f-95ac-4794688cd38f", + "outputId": "515f019f-87b6-40cb-9344-f4c19645e077" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/user/miniconda/lib/python3.9/site-packages/transformers/training_args.py:1568: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of πŸ€— Transformers. Use `eval_strategy` instead\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "output_dir = \"gemma-2-2B-it-thinking-function_calling\"\n", + "per_device_train_batch_size = 1\n", + "per_device_eval_batch_size = 1\n", + "gradient_accumulation_steps = 4\n", + "logging_steps = 5\n", + "learning_rate = 1e-4\n", + "max_grad_norm = 1.0\n", + "num_train_epochs=1\n", + "warmup_ratio = 0.1\n", + "lr_scheduler_type = \"cosine\"\n", + "max_seq_length = 2048\n", + "\n", + "training_arguments = TrainingArguments(\n", + " output_dir=output_dir,\n", + " per_device_train_batch_size=per_device_train_batch_size,\n", + " per_device_eval_batch_size=per_device_eval_batch_size,\n", + " gradient_accumulation_steps=gradient_accumulation_steps,\n", + " save_strategy=\"no\",\n", + " evaluation_strategy=\"epoch\",\n", + " logging_steps=logging_steps,\n", + " learning_rate=learning_rate,\n", + " max_grad_norm=max_grad_norm,\n", + " weight_decay=0.1,\n", + " warmup_ratio=warmup_ratio,\n", + " lr_scheduler_type=lr_scheduler_type,\n", + " report_to=\"tensorboard\",\n", + " bf16=True,\n", + " hub_private_repo=False,\n", + " push_to_hub=False,\n", + " num_train_epochs=num_train_epochs,\n", + " gradient_checkpointing=True,\n", + " gradient_checkpointing_kwargs={\"use_reentrant\": False}\n", + ")" + ] + }, + { + "cell_type": "markdown", + "source": [ + "As Trainer, we use the `SFTTrainer` which is a Supervised Fine-Tuning Trainer." + ], + "metadata": { + "id": "59TTqmW2xmV2" + }, + "id": "59TTqmW2xmV2" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba0366b5-c9d0-4f7e-97e0-1f964cfad147", + "metadata": { + "tags": [], + "id": "ba0366b5-c9d0-4f7e-97e0-1f964cfad147", + "outputId": "8b2836b3-3a06-4c05-b046-6c7923911e40" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/user/miniconda/lib/python3.9/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': packing, dataset_text_field, max_seq_length, dataset_kwargs. Will not be supported from version '0.13.0'.\n", + "\n", + "Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.\n", + " warnings.warn(message, FutureWarning)\n", + "/home/user/miniconda/lib/python3.9/site-packages/transformers/training_args.py:1568: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of πŸ€— Transformers. Use `eval_strategy` instead\n", + " warnings.warn(\n", + "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:212: UserWarning: You passed a `packing` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", + " warnings.warn(\n", + "/home/user/miniconda/lib/python3.9/site-packages/peft/tuners/tuners_utils.py:543: UserWarning: Model with `tie_word_embeddings=True` and the tied_target_modules=['lm_head'] are part of the adapter. This can lead to complications, for example when merging the adapter or converting your model to formats other than safetensors. See for example https://github.com/huggingface/peft/issues/2018.\n", + " warnings.warn(\n", + "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", + " warnings.warn(\n", + "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:328: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", + " warnings.warn(\n", + "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:334: UserWarning: You passed a `dataset_kwargs` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.\n", + " warnings.warn(\n", + "/home/user/miniconda/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:403: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "trainer = SFTTrainer(\n", + " model=model,\n", + " args=training_arguments,\n", + " train_dataset=dataset[\"train\"],\n", + " eval_dataset=dataset[\"test\"],\n", + " tokenizer=tokenizer,\n", + " packing=True,\n", + " dataset_text_field=\"content\",\n", + " max_seq_length=max_seq_length,\n", + " peft_config=peft_config,\n", + " dataset_kwargs={\n", + " \"append_concat_token\": False,\n", + " \"add_special_tokens\": False,\n", + " },\n", + ")" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Here, we launch the training πŸ”₯. Perfect time for you to pause and grab a coffee β˜•." + ], + "metadata": { + "id": "MtHjukK9xviB" + }, + "id": "MtHjukK9xviB" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9e2df2e9-a82b-4540-aa89-1b40b70a7781", + "metadata": { + "tags": [], + "id": "9e2df2e9-a82b-4540-aa89-1b40b70a7781", + "outputId": "8ad7e555-678b-4904-aa6b-000b619c9341" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [389/389 14:14, Epoch 0/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
EpochTraining LossValidation Loss
00.2946000.289091

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/user/miniconda/lib/python3.9/site-packages/peft/utils/save_and_load.py:230: UserWarning: Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.\n", + " warnings.warn(\"Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.\")\n" + ] + } + ], + "source": [ + "trainer.train()\n", + "trainer.save_model()" + ] + }, + { + "cell_type": "markdown", + "id": "1d7ea3ab-7c8c-47ad-acd2-99fbe5b68393", + "metadata": { + "tags": [], + "id": "1d7ea3ab-7c8c-47ad-acd2-99fbe5b68393" + }, + "source": [ + "## Step 11: Let's push the Model and the Tokenizer to the Hub\n", + "\n", + "Let's push our model and out tokenizer to the Hub ! The model will be pushed under your username + the output_dir that we specified earlier." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "370af020-9319-4ff7-bea1-2842a4847caa", + "metadata": { + "tags": [], + "colab": { + "referenced_widgets": [ + "68b34e3d2eae4f24b83fa65cf5815738", + "15e7c632053c4ed88267061a8112d641", + "047ebf7fda8643c090129bc2b86a7e3e", + "ef5dab829e8b491581f6dae2b7718113", + "97ba6384a2f94db0880a17ff433a8ed9", + "ba0c0c53b23047ac9336fdbf8597f32f", + "8a033d5bd32b4a969fd5af612c550243", + "714d4a40a67a4763ba8f7ae029befbd7", + "52451f392b434fb39b882c9f094bd995", + "08c2ca8a719e4142bd877ae94d242f2e", + "02e660bd44a7414fae439751e9ffa1f2" + ] + }, + "id": "370af020-9319-4ff7-bea1-2842a4847caa", + "outputId": "f5797c01-306d-48ee-a009-3c115f5b1ca5" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "68b34e3d2eae4f24b83fa65cf5815738", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "adapter_model.safetensors: 0%| | 0.00/2.48G [00:00\"\n", + "# push the tokenizer to hub ( replace with your username and your previously specified\n", + "tokenizer.push_to_hub(f\"username/{output_dir}\", token=True)" + ] + }, + { + "cell_type": "markdown", + "id": "76d275ce-a3e6-4d30-8d8c-0ee274de5370", + "metadata": { + "id": "76d275ce-a3e6-4d30-8d8c-0ee274de5370" + }, + "source": [ + "## Step 12: Let's now test our model !\n", + "\n", + "To so, we will :\n", + "\n", + "1. Load the adapter from the hub !\n", + "2. Load the base model : **\"google/gemma-2-2b-it\"** from the hub\n", + "3." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56b89825-70ac-42c1-934c-26e2d54f3b7b", + "metadata": { + "tags": [], + "colab": { + "referenced_widgets": [ + "390c54434b6448b988ce015eeafe34c9", + "35b2fe2d357b46488ccef710f2a9bfd7", + "9c313149d4324bdaa9c8ddc373964d18" + ] + }, + "id": "56b89825-70ac-42c1-934c-26e2d54f3b7b", + "outputId": "a4cd00b8-61fa-4522-d563-c4ef7e18807d" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "390c54434b6448b988ce015eeafe34c9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "adapter_config.json: 0%| | 0.00/829 [00:00human\n", + "You are a function calling AI model. You are provided with function signatures within XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools: [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within XML tags as follows:\n", + "\n", + "{tool_call}\n", + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between {your thoughts}\n", + "\n", + "Hi, I need to convert 500 USD to Euros. Can you help me with that?\n", + "model\n", + "Alright, so I'm looking at this scenario where the user needs help converting 500 USD to Euros. Let's break it down step by step to understand how to approach this.\n", + "\n", + "First, the user has asked for conversion from USD to EUR. I know there's a specific tool available for currency conversion called `convert_currency`. This tool requires three parameters: amount, from_currency, and to_currency. In the user's message, they've provided all necessary detailsβ€”500 USD and the desire to convert to Euros. \n", + "\n", + "I don't need to make any assumptions here because the user has already provided the exact amount and the target currency. My next step is to prepare the function call with the correct parameters. So, I'll structure the request with the amount as 500, from_currency as USD, and to_currency as EUR. \n", + "\n", + "This will be sent in JSON format as specified, ensuring that the tool receives all the necessary information to perform the conversion accurately. Once the function is called, it should return the converted amount, which I can then present back to the user.\n", + "\n", + "{'name': 'convert_currency', 'arguments': {'amount': 500, 'from_currency': 'USD', 'to_currency': 'EUR'}}\n", + "\n", + "tool\n", + "\n", + "{'converted_amount': 420.5, 'from_currency': 'USD', 'to_currency': 'EUR'}\n", + "\n", + "model\n", + "Sure, 500 USD is approximately 420.50 Euros.\n", + "human\n", + "That's great! Now, can you convert 1000 Euros back to USD?\n", + "model\n", + "Okay, so I'm trying to figure out how to handle the next function call based on the conversation and the previous steps. Let me see...\n", + "\n", + "In the conversation, the user first asked to convert 500 USD to Euros, and the model used the `convert_currency` function with the parameters amount=500, from_currency=USD, to_currency=EUR. Then, after getting the response, the model provided the converted amount. Now, the user is asking to convert 1000 Euros back to USD. \n", + "\n", + "So, the user's request is clear: convert 1000 EUR to USD. Since the function `convert_currency` is designed to handle currency conversions, and it requires amount, from_currency, and to_currency as parameters, it makes sense to call this function again with the updated values. \n", + "\n", + "The amount is 1000, from_currency should be EUR because the user is converting from Euros, and to_currency should be USD because they want the result in US Dollars. I don't need to make any assumptions here because all the required information is provided by the user. \n", + "\n", + "I'll structure the function call with these parameters. This should give the user the accurate converted amount from Euros to USD. Once the tool processes this, the model can present the result back to the user.\n", + "\n", + "{'name': 'convert_currency', 'arguments': {'amount': 1000, 'from_currency': 'EUR', 'to_currency': 'USD'}}\n", + "\n", + "tool\n", + "\n", + "{'converted_amount': 1190.0, 'from_currency': 'EUR', 'to_currency': 'USD'}\n", + "\n", + "model\n", + "1000 Euros is approximately 1190.00 USD.\n", + "human\n", + "Thank you for your help!\n", + "model\n", + "You're welcome! If you have any other questions, feel free to ask.\n", + "\n" + ] + } + ], + "source": [ + "print(dataset[\"test\"][8][\"content\"])" + ] + }, + { + "cell_type": "markdown", + "id": "b47fd511-ea00-47ce-8618-6e78e25672b2", + "metadata": { + "id": "b47fd511-ea00-47ce-8618-6e78e25672b2" + }, + "source": [ + "### Testing the model πŸš€\n", + "\n", + "In that case, we will take the start of one of the samples from the test set and hope that it will generate the expected output.\n", + "\n", + "Since we want to test the function-calling capacities of our newly fine-tuned model, the input will be a user message with the available tools, a\n", + "\n", + "\n", + "### Disclaimer ⚠️\n", + "\n", + "The dataset we’re using **does not contain sufficient training data** and is purely for **educational purposes**. As a result, **your trained model’s outputs may differ** from the examples shown in this course. **Don’t be discouraged** if your results varyβ€”our primary goal here is to illustrate the core concepts rather than produce a fully optimized or production-ready model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37bf938d-08fa-4577-9966-0238339afcdb", + "metadata": { + "tags": [], + "id": "37bf938d-08fa-4577-9966-0238339afcdb", + "outputId": "e97e7a1e-5ab2-46a2-dc3a-f436964fe004" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "human\n", + "You are a function calling AI model. You are provided with function signatures within XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools: [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within XML tags as follows:\n", + "\n", + "{tool_call}\n", + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between {your thoughts}\n", + "\n", + "Hi, I need to convert 500 USD to Euros. Can you help me with that?\n", + "model\n", + "Okay, so the user is asking to convert 500 USD to Euros. I need to figure out how to respond. Looking at the available tools, there's a function called convert_currency which does exactly that. It takes an amount, the source currency, and the target currency. The user provided all the necessary details: 500, USD, and EUR. So, I should call convert_currency with these parameters. That should give the user the converted amount they need.\n", + "\n", + "{'name': 'convert_currency', 'arguments': {'amount': 500, 'from_currency': 'USD', 'to_currency': 'EUR'}}\n", + "\n" + ] + } + ], + "source": [ + "prompt=\"\"\"human\n", + "You are a function calling AI model. You are provided with function signatures within XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools: [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within XML tags as follows:\n", + "\n", + "{tool_call}\n", + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between {your thoughts}\n", + "\n", + "Hi, I need to convert 500 USD to Euros. Can you help me with that?\n", + "model\n", + "\"\"\"\n", + "\n", + "inputs = tokenizer(prompt, return_tensors=\"pt\", add_special_tokens=False)\n", + "inputs = {k: v.to(\"cuda\") for k,v in inputs.items()}\n", + "outputs = model.generate(**inputs,\n", + " max_new_tokens=300,\n", + " do_sample=True,\n", + " top_p=0.65,\n", + " temperature=0.01,\n", + " repetition_penalty=1.0,\n", + " eos_token_id=tokenizer.eos_token_id)\n", + "print(tokenizer.decode(outputs[0]))" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Congratulations on finishing this first Bonus Unit πŸ₯³\n", + "\n", + "You've just **mastered what Function-Calling is and how to fine-tune your model to do Function-Calling**!\n", + "\n", + "If it's the first time you do this, it's normal that you're feeling puzzled. Take time to check the documentation and understand each part of the code and why we did it this way.\n", + "\n", + "Also, don't hesitate to try to **fine-tune different models**. The **best way to learn is by trying.**\n", + "\n", + "### Keep Learning, Stay Awesome πŸ€—" + ], + "metadata": { + "id": "xWewPCZOyfJQ" + }, + "id": "xWewPCZOyfJQ" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.5" + }, + "colab": { + "provenance": [] + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file