AeNyoung committed on
Commit
8dcd01a
1 Parent(s): 074e71e

Upload Finetune_flan_t5_large_bnb_peft (1).ipynb

Finetune_flan_t5_large_bnb_peft (1).ipynb ADDED
+ {"cells":[{"cell_type":"markdown","metadata":{"id":"lw1cWgq-DI5k"},"source":["# Fine-tune FLAN-T5 using `bitsandbytes`, `peft` \u0026 `transformers` πŸ€—"]},{"cell_type":"markdown","metadata":{"id":"kBFPA3-aDT7H"},"source":["In this notebook we will see how to properly use `peft` , `transformers` \u0026 `bitsandbytes` to fine-tune `flan-t5-large` in a google colab!\n","\n","We will finetune the model on [`financial_phrasebank`](https://huggingface.co/datasets/financial_phrasebank) dataset, that consists of pairs of text-labels to classify financial-related sentences, if they are either `positive`, `neutral` or `negative`.\n","\n","Note that you could use the same notebook to fine-tune `flan-t5-xl` as well, but you would need to shard the models first to avoid CPU RAM issues on Google Colab, check [these weights](https://huggingface.co/ybelkada/flan-t5-xl-sharded-bf16)."]},{"cell_type":"markdown","metadata":{"id":"5TXx1vj8kJSu"},"source":["## TODO #1\n","\n","`google/flan-t5-large` λͺ¨λΈμ€ 무엇을 λͺ©ν‘œλ‘œ λ§Œλ“€μ–΄μ‘Œκ³  κΈ°λŒ€ν•  수 μžˆλŠ” κΈ°λŠ₯은 무엇인지 μ‘°μ‚¬ν•˜μ‹œμ˜€\n","\n","- λŒ€κ·œλͺ¨ μ–Έμ–΄ λͺ¨λΈμ˜ ν•œκ³„ 극볡(일반 GPUμ—μ„œ μ΄ˆκ±°λŒ€ LLM을 λ‘œλ”©, λ―Έμ„ΈνŠœλ‹ ν•˜λŠ” 것은 λΆˆκ°€λŠ₯)\n","- λ²ˆμ—­, μš”μ•½, CoLA, STSB μž‘μ—…μ„ μˆ˜ν–‰."]},{"cell_type":"markdown","metadata":{"id":"ShAuuHCDDkvk"},"source":["## Install requirements"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"background_save":true,"base_uri":"https://localhost:8080/"},"id":"DRQ4ZrJTDkSy"},"outputs":[{"name":"stdout","output_type":"stream","text":["\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m92.6/92.6 MB\u001b[0m \u001b[31m8.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m519.6/519.6 kB\u001b[0m \u001b[31m43.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m258.1/258.1 kB\u001b[0m \u001b[31m26.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m12.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m18.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m15.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m295.0/295.0 kB\u001b[0m \u001b[31m25.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25h Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n"," Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n"," Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n"," Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n"," Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n"," Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.8/3.8 MB\u001b[0m \u001b[31m33.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m42.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25h Building wheel for transformers (pyproject.toml) ... 
## Install requirements

```python
!pip install -q bitsandbytes datasets accelerate
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git@main
```

## Import model and tokenizer

```python
# Select CUDA device index
import os
import torch

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"

model = AutoModelForSeq2SeqLM.from_pretrained(model_name, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

## Prepare model for training

Some pre-processing needs to be done before training such an int8 model using `peft`, so let's import the utility function `prepare_model_for_int8_training`, which will:
- cast all the non-`int8` modules to full precision (`fp32`) for stability,
- add a `forward_hook` to the input embedding layer to enable gradient computation of the input hidden states,
- enable gradient checkpointing for more memory-efficient training.

```python
from peft import prepare_model_for_int8_training

model = prepare_model_for_int8_training(model)
```

## Load your `PeftModel`

Here we will use LoRA (Low-Rank Adapters) to train our model.

```python
from peft import LoraConfig, get_peft_model, TaskType


def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q", "v"], lora_dropout=0.05, bias="none", task_type="SEQ_2_SEQ_LM"
)


model = get_peft_model(model, lora_config)
print_trainable_parameters(model)
```

As you can see, here we are only training 0.6% of the parameters of the model! This is a huge memory gain that will let us fine-tune the model without running into memory issues.
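As a rough sanity check of that figure, the sketch below estimates how many LoRA parameters are added for `r=16` on the `q` and `v` projections. The layer counts, hidden size, and total parameter count are assumptions based on the published flan-t5-large configuration, not values printed by this notebook.

```python
# Back-of-the-envelope estimate of the LoRA parameter count for flan-t5-large.
# Assumed architecture: d_model = 1024, 24 encoder + 24 decoder blocks,
# decoder blocks carry both self-attention and cross-attention.
d_model = 1024
r = 16

encoder_qv = 24 * 2        # one q and one v projection per encoder block
decoder_qv = 24 * 2 * 2    # self-attention and cross-attention per decoder block
adapted_layers = encoder_qv + decoder_qv

# Each adapted Linear(d_model, d_model) gains two low-rank matrices:
# A of shape (r, d_model) and B of shape (d_model, r).
lora_params = adapted_layers * r * (d_model + d_model)

base_params = 783_150_080  # approximate flan-t5-large size (assumption)

print(lora_params)                                       # ~4.7M trainable parameters
print(100 * lora_params / (base_params + lora_params))   # roughly 0.6%
```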
## TODO #2

Briefly explain the principle by which the number of trainable parameters is reduced so drastically, to about 0.6%, as shown above.

- With LoRA the original model weights are frozen; only the small low-rank adapter matrices injected into the targeted `q` and `v` projections are trained.
- Each adapted weight matrix of shape `d_out x d_in` is updated through a rank-`r` product `B @ A` (here `r=16`), so only `r * (d_in + d_out)` extra parameters per layer are trainable instead of `d_in * d_out`, which is why the trainable fraction drops well below 1% (see the parameter-count sketch above).

## TODO #3

If you load the model without the `load_in_8bit=True` option, the original checkpoint is loaded. Compare the model structure in that case with the structure when `load_in_8bit=True` is used and describe the differences (a comparison sketch follows below).

- By default the model is loaded in 32-bit precision, with its attention and feed-forward projections as ordinary `torch.nn.Linear` modules holding `fp32` weights.
- `load_in_8bit=True`: the weights are loaded in 8-bit precision and the `Linear` modules are replaced by `bitsandbytes` 8-bit linear layers, which greatly reduces memory usage at the cost of a possible small accuracy loss.
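One way to see the difference described in TODO #3 is to print the class and dtype of the same submodule under both loading modes. A minimal sketch, assuming the submodule path below exists in flan-t5-large and enough RAM to hold two copies of the model; the dtypes in the comments are expectations, not captured output.

```python
# Compare how a query projection is represented with and without 8-bit loading.
import bitsandbytes as bnb
from transformers import AutoModelForSeq2SeqLM

name = "google/flan-t5-large"

fp32_model = AutoModelForSeq2SeqLM.from_pretrained(name)
int8_model = AutoModelForSeq2SeqLM.from_pretrained(name, load_in_8bit=True)

# Assumed module path: first encoder block, self-attention, query projection.
q_fp32 = fp32_model.encoder.block[0].layer[0].SelfAttention.q
q_int8 = int8_model.encoder.block[0].layer[0].SelfAttention.q

print(type(q_fp32), q_fp32.weight.dtype)            # expected: torch.nn.Linear, torch.float32
print(type(q_int8), q_int8.weight.dtype)            # expected: 8-bit linear module, torch.int8
print(isinstance(q_int8, bnb.nn.Linear8bitLt))      # expected: True
```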
## Load and process data

Here we will use the [`financial_phrasebank`](https://huggingface.co/datasets/financial_phrasebank) dataset to fine-tune our model on sentiment classification of financial sentences. We will load the `sentences_allagree` split, which, according to the dataset card, is the subset with 100% annotator agreement.

```python
# loading dataset
dataset = load_dataset("financial_phrasebank", "sentences_allagree")
dataset = dataset["train"].train_test_split(test_size=0.1)
dataset["validation"] = dataset["test"]
del dataset["test"]

classes = dataset["train"].features["label"].names
dataset = dataset.map(
    lambda x: {"text_label": [classes[label] for label in x["label"]]},
    batched=True,
    num_proc=1,
)
```

Let's also apply some pre-processing to the input data: the labels need to be pre-processed, and the tokens corresponding to `pad_token_id` need to be set to `-100` so that the `CrossEntropy` loss associated with the model correctly ignores these tokens.

```python
# data preprocessing
text_column = "sentence"
label_column = "text_label"
max_length = 128


def preprocess_function(examples):
    inputs = examples[text_column]
    targets = examples[label_column]
    model_inputs = tokenizer(inputs, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
    labels = tokenizer(targets, max_length=3, padding="max_length", truncation=True, return_tensors="pt")
    labels = labels["input_ids"]
    labels[labels == tokenizer.pad_token_id] = -100
    model_inputs["labels"] = labels
    return model_inputs


processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["validation"]
```

## TODO #4

Briefly describe the structure of the Hub dataset `financial_phrasebank` used in the loading/processing above and how it is used for fine-tuning (a quick inspection sketch follows below).

- `financial_phrasebank` is a sentiment dataset of financial news sentences, labelled positive, negative, or neutral.
- It is split into 90% training data and 10% validation data.
- The dataset labels are converted: a new field called `"text_label"` is added, and each example's `"label"` value is mapped to the corresponding class name.
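To make the structure in TODO #4 concrete, here is a quick inspection sketch using the `dataset` and `classes` objects defined in the cells above; the values in the comments are illustrative, not outputs captured from this notebook run.

```python
# Inspect the processed financial_phrasebank splits: label names, sizes,
# and one example with its mapped text label.
print(classes)                                             # e.g. ['negative', 'neutral', 'positive']
print(dataset)                                             # train / validation splits with 'sentence', 'label', 'text_label'
print(len(dataset["train"]), len(dataset["validation"]))   # roughly a 90% / 10% split

example = dataset["train"][0]
print(example["sentence"])
print(example["label"], "->", example["text_label"])       # integer label mapped to its class name
```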
## Train our model!

Let's now train our model by running the cells below. Note that for T5, since some layers are kept in `float32` for stability purposes, there is no need to call autocast on the trainer.

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    "temp",
    evaluation_strategy="epoch",
    learning_rate=1e-3,
    gradient_accumulation_steps=1,
    auto_find_batch_size=True,
    num_train_epochs=1,
    save_steps=100,
    save_total_limit=8,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
```

```python
trainer.train()
```

## Qualitatively test our model

Let's run a quick qualitative evaluation of the model by taking a sample from the dataset that corresponds to a positive label. Run your generation just as you would run the model from `transformers`:

```python
model.eval()
input_text = "In January-September 2009 , the Group 's net interest income increased to EUR 112.4 mn from EUR 74.3 mn in January-September 2008 ."
inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)

print("input sentence: ", input_text)
print(" output prediction: ", tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
```

## TODO #5

Create your own Hugging Face account and carry out the upload/download/verification steps in the Hub sections below using your own account.

Afterwards, write down the model id of the Hugging Face Hub repository you uploaded.
- Write your answer in Markdown style.

## Share your adapters on 🀗 Hub

Once you have trained your adapter, you can easily share it on the Hub using the `push_to_hub` method. Note that only the adapter weights and config will be pushed.

```python
from huggingface_hub import notebook_login

# notebook_login() opens an interactive login prompt; do not hard-code an access token here.
notebook_login()
```

```python
model.push_to_hub("ybelkada/flan-t5-large-financial-phrasebank-lora", use_auth_token=True)
```

## Load your adapter from the Hub

You can load the model together with the adapter in just a few lines of code! Check the snippet below to load the adapter from the Hub and run the example evaluation.

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

peft_model_id = "ybelkada/flan-t5-large-financial-phrasebank-lora"
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA model
model = PeftModel.from_pretrained(model, peft_model_id)
```

```python
model.eval()
input_text = "In January-September 2009 , the Group 's net interest income increased to EUR 112.4 mn from EUR 74.3 mn in January-September 2008 ."
inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)

print("input sentence: ", input_text)
print(" output prediction: ", tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
```
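Finally, a small sketch that ties back to TODO #5 and to the note that only the adapter weights and config get pushed: listing the files in the uploaded repo should show just the small adapter artifacts, not the full flan-t5-large checkpoint. The repo id below is the example one used above; replace it with your own, and note that the exact file names (e.g. `adapter_model.bin` vs `adapter_model.safetensors`) depend on the `peft` version.

```python
# List the contents of the pushed adapter repo on the Hub.
from huggingface_hub import list_repo_files

peft_model_id = "ybelkada/flan-t5-large-financial-phrasebank-lora"  # replace with your own repo id
print(list_repo_files(peft_model_id))  # expect adapter_config.json plus the adapter weights
```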