---
library_name: transformers
tags: []
---

# Fine-Tuning LLaMA-2-7b with QLoRA on a Custom Dataset

This repository provides a setup and script for fine-tuning the LLaMA-2-7b model using QLoRA (Quantized Low-Rank Adaptation) on custom datasets. The script is designed for efficient and flexible training of large language models (LLMs), leveraging techniques such as 4-bit quantization and LoRA.

## Overview

The script fine-tunes a pre-trained LLaMA-2-7b model on a custom dataset, applying QLoRA to optimize performance. It uses the `transformers`, `datasets`, `peft`, and `trl` libraries for model management, data processing, and training. The setup supports mixed-precision training, gradient checkpointing, and 4-bit quantization to make the fine-tuning process more efficient.

## Components

### 1. Dependencies

Ensure the following libraries are installed:

- `torch`
- `datasets`
- `transformers`
- `peft`
- `trl`
- `bitsandbytes` (required for 4-bit quantization)
- `accelerate`

Install them with pip if they are not already available:

```bash
pip install torch datasets transformers peft trl bitsandbytes accelerate
```

### 2. Model and Dataset

- **Model**: The base model is `LLaMA-2-7b`. The script loads it from a specified local directory.
- **Dataset**: The training data is loaded from a specified directory and must contain a `"text"` field holding the training examples.

### 3. QLoRA Configuration

The QLoRA parameters configure the quantization and adaptation process:

- **LoRA Attention Dimension (`lora_r`)**: 64
- **LoRA Alpha Parameter (`lora_alpha`)**: 16
- **LoRA Dropout Probability (`lora_dropout`)**: 0.1

### 4. BitsAndBytes Configuration

Quantization settings for the base model:

- **Use 4-bit Precision**: True
- **Compute Data Type**: `float16`
- **Quantization Type**: `nf4`
- **Nested Quantization**: False

### 5. Training Configuration

Training parameters are defined as follows:

- **Output Directory**: `./results`
- **Number of Epochs**: 300
- **Batch Size (per device)**: 4
- **Gradient Accumulation Steps**: 1
- **Learning Rate**: 2e-4
- **Weight Decay**: 0.001
- **Optimizer**: `paged_adamw_32bit`
- **Learning Rate Scheduler**: `cosine`
- **Gradient Clipping (`max_grad_norm`)**: 0.3
- **Warmup Ratio**: 0.03
- **Logging Steps**: 25
- **Save Steps**: 0

### 6. Training and Evaluation

The script preprocesses the dataset, initializes the model with QLoRA, and trains it with `SFTTrainer` from the `trl` library. Mixed-precision training and gradient checkpointing are supported to improve training efficiency.

### 7. Usage Instructions

1. **Update File Paths**: Adjust `model_name`, `dataset_name`, and `new_model` according to your environment.
2. **Run the Script**: Execute the script in your Python environment to start fine-tuning:

   ```bash
   python fine_tune_llama.py
   ```

3. **Monitor Training**: Use TensorBoard or a similar tool to monitor training progress.

### 8. Model Saving

After training, the model is saved to the directory given by `new_model`. The trained model can then be loaded for further evaluation or deployment.

## Example Configuration

The configuration below was used for fine-tuning:

- Base model: `NousResearch/Llama-2-7b-chat-hf`
- Dataset: `mlabonne/guanaco-llama2-1k`

Both were saved to the local machine and loaded from there, but you can also download them directly from Hugging Face (see the sketch that follows).
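For reference, here is a minimal, illustrative sketch of pulling both straight from the Hugging Face Hub instead of from local copies (this snippet is not part of the training script below):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the base model, its tokenizer, and the instruction dataset from the Hub.
model = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")
```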
The local-path configuration used in this repository:

```python
model_name = "/data/bio-eng-llm/llm_repo/NousResearch/Llama-2-7b-chat-hf"  # base model: NousResearch/Llama-2-7b-chat-hf
dataset_name = "/data/bio-eng-llm/llm_repo/mlabonne/guanaco-llama2-1k"     # dataset: mlabonne/guanaco-llama2-1k
new_model = "/data/bio-eng-llm/llm_repo/mlabonne/llama-2-7b-miniguanaco"

lora_r = 64
lora_alpha = 16
lora_dropout = 0.1

use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False

output_dir = "./results"
num_train_epochs = 300
fp16 = False
bf16 = False
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "cosine"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 0
logging_steps = 25
```

## The Complete Training Script

```python
import os
import sys

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

cwd = os.getcwd()
sys.path.append(cwd)


def setting_directory(depth):
    # Walk `depth` levels up from the current directory and add its parent to sys.path.
    current_dir = os.path.abspath(os.getcwd())
    root_dir = current_dir
    for i in range(depth):
        root_dir = os.path.abspath(os.path.join(root_dir, os.pardir))
    sys.path.append(os.path.dirname(root_dir))
    return root_dir


# The model to train, loaded from a local mirror of the Hugging Face repository
model_name = "/data/bio-eng-llm/llm_repo/NousResearch/Llama-2-7b-chat-hf"

# The instruction dataset to use
dataset_name = "/data/bio-eng-llm/llm_repo/mlabonne/guanaco-llama2-1k"

# Fine-tuned model name
new_model = "/data/bio-eng-llm/llm_repo/mlabonne/llama-2-7b-miniguanaco"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 300

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1
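
# With these defaults the effective batch size is
# per_device_train_batch_size * gradient_accumulation_steps = 4 * 1 = 4 per GPU;
# if memory is tight, increase gradient_accumulation_steps rather than the batch size.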

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with the same length
# (saves memory and speeds up training considerably)
group_by_length = True

# Save checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}

################################################################################

# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")
print(dataset[0].keys())  # print all the field names in the dataset

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # fix weird overflow issue with fp16 training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)


# Set supervised fine-tuning parameters
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)


tokenized_dataset = dataset.map(preprocess_function, batched=True)

trainer = SFTTrainer(
    model=model,
    train_dataset=tokenized_dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)
```
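
Note that `trainer.model.save_pretrained(new_model)` stores only the LoRA adapter weights. If you want a standalone checkpoint for inference, a minimal sketch of merging the adapter back into the base model is shown below, reusing `model_name` and `new_model` from the script above (the `-merged` output path is just an illustrative choice, and you need enough memory to hold the fp16 weights):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Reload the base model in half precision (without 4-bit quantization this time).
base_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Attach the trained adapter and fold its weights into the base model.
merged_model = PeftModel.from_pretrained(base_model, new_model).merge_and_unload()

# Save the merged model and tokenizer as a regular standalone checkpoint.
merged_model.save_pretrained(new_model + "-merged")
AutoTokenizer.from_pretrained(model_name).save_pretrained(new_model + "-merged")
```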

## Testing the Fine-Tuned Model on the Dataset

```python
import os
import sys
import json

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# The base model that was fine-tuned
base_model_name = "/data/bio-eng-llm/llm_repo/NousResearch/Llama-2-7b-chat-hf"

cwd = os.getcwd()
sys.path.append(cwd)


def setting_directory(depth):
    # Walk `depth` levels up from the current directory and add its parent to sys.path.
    current_dir = os.path.abspath(os.getcwd())
    root_dir = current_dir
    for i in range(depth):
        root_dir = os.path.abspath(os.path.join(root_dir, os.pardir))
    sys.path.append(os.path.dirname(root_dir))
    return root_dir


# The instruction dataset to use
dataset_name = "/data/bio-eng-llm/llm_repo/mlabonne/guanaco-llama2-1k"

# Fine-tuned model name
new_model = "/data/bio-eng-llm/llm_repo/mlabonne/llama-2-7b-miniguanaco"

################################################################################
# Loading the fine-tuned model
################################################################################

# Base model identifier (the model that was trained with PEFT)
base_model_name = "NousResearch/Llama-2-7b-chat-hf"

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Path to the directory containing adapter_config.json and adapter_model.safetensors
fine_tuned_model_path = "/data/bio-eng-llm/llm_repo/mlabonne/llama-2-7b-miniguanaco"

# Load the fine-tuned model (PEFT adapter)
model = PeftModel.from_pretrained(model, fine_tuned_model_path)
print(model)

################################################################################
# Evaluating the fine-tuned model on a small portion of the dataset
################################################################################

# Define paths
base_model_name = "/data/bio-eng-llm/llm_repo/NousResearch/Llama-2-7b-chat-hf"
fine_tuned_model_path = "/data/bio-eng-llm/llm_repo/mlabonne/llama-2-7b-miniguanaco"
dataset_name = "/data/bio-eng-llm/llm_repo/mlabonne/guanaco-llama2-1k"

# Load the dataset
dataset = load_dataset(dataset_name, split="train")

# Initialize the tokenizer and load the base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(base_model, fine_tuned_model_path)

# Set the model to evaluation mode
model.eval()


# Evaluate the model on a small portion of the dataset
def evaluate_model(dataset, tokenizer, model, sample_size=10, max_length=512, max_new_tokens=50):
    # Select a small portion of the dataset
    subset = dataset.select(range(min(sample_size, len(dataset))))
    results = []

    for example in subset:
        # Tokenize the input
        inputs = tokenizer(
            example['text'],
            return_tensors="pt",
            truncation=True,
            padding='max_length',
            max_length=max_length,
        )

        # Ensure no gradients are calculated during inference
        with torch.no_grad():
            # Generate responses
            outputs = model.generate(
                input_ids=inputs['input_ids'],
                attention_mask=inputs['attention_mask'],
                max_length=max_length + max_new_tokens,  # adjust max_length to allow for new tokens
                max_new_tokens=max_new_tokens            # allow generating up to `max_new_tokens`
            )

        # Decode the output
        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
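        # Note: for a causal LM, `outputs[0]` contains the (padded) prompt tokens followed by
        # the continuation, so `generated_text` echoes the input text; slice the output with
        # outputs[0][inputs['input_ids'].shape[1]:] if you only want the newly generated tokens.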

        # Append result
        results.append({
            'input_text': example['text'],
            'generated_text': generated_text
        })

    return results


# Evaluate the model on a small portion of the dataset (e.g., 10 samples)
evaluation_results = evaluate_model(dataset, tokenizer, model, sample_size=10)

# Print a few results
for result in evaluation_results:
    print(f"Input Text: {result['input_text']}")
    print(f"Generated Text: {result['generated_text']}")
    print("-" * 50)

# Optionally, save results to a file
with open('evaluation_results.json', 'w') as f:
    json.dump(evaluation_results, f, indent=4)
```

## Pushing the Model to the Hugging Face Hub

Everything was saved locally first and then pushed to Hugging Face. You need your Hugging Face ID and an access token; `Your-Huggingface-ID` and `Your-Huggingface-Token` below are placeholders.

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer, logging
from huggingface_hub import HfApi, Repository, login
from peft import LoraConfig, PeftModel

# Define paths
base_model_name = "/data/bio-eng-llm/llm_repo/NousResearch/Llama-2-7b-chat-hf"
fine_tuned_model_path = "/data/bio-eng-llm/llm_repo/mlabonne/llama-2-7b-miniguanaco"
save_directory = "./fine_tuned_model"  # local directory to save the model
repo_name = "Your-Huggingface-ID/llama-2-7b-miniguanaco"  # replace with your Hugging Face username and repo name

# Step 1: Log in to Hugging Face
print("Logging in to Hugging Face...")
login(token="Your-Huggingface-Token")

# Step 2: Load the tokenizer and model
print("Loading base model and fine-tuned adapters...")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(base_model, fine_tuned_model_path)

# Step 3: Save the tokenizer and the fine-tuned model locally
print(f"Saving the fine-tuned model to {save_directory}...")
os.makedirs(save_directory, exist_ok=True)
tokenizer.save_pretrained(save_directory)
model.save_pretrained(save_directory)

# Step 4: Push the model to the Hugging Face Hub
print(f"Pushing the model to the Hugging Face Hub: {repo_name}...")
model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)

print("Model pushed successfully!")
```

### Log file after pushing

```bash
Logging in to Hugging Face...
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /home/forootan/.cache/huggingface/token
Login successful
Loading base model and fine-tuned adapters...
Loading checkpoint shards:   0%|          | 0/2 [00:00
```
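
Once the adapter is on the Hub, it can be pulled back down for inference. The following is a minimal sketch, not part of the original scripts; it assumes the placeholder repo name above and uses the `[INST] ... [/INST]` chat format that the guanaco-llama2 data follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in half precision and attach the adapter pushed to the Hub.
base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map={"": 0}
)
model = PeftModel.from_pretrained(base, "Your-Huggingface-ID/llama-2-7b-miniguanaco")
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")

# Prompt in the Llama-2 chat format used by the training data.
prompt = "[INST] What is a large language model? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```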