
Fine-Tuning the LLaMA-2 Chat Model on a Medical QnA Dataset with QLoRA

This repository contains the code and configuration for fine-tuning the LLaMA-2 chat model on a Medical QnA dataset using the QLoRA technique. Only 2,000 examples were used for training due to constrained GPU resources.

Model and Dataset

  • Pre-trained Model: NousResearch/Llama-2-7b-chat-hf
  • Dataset for Fine-Tuning: randomani/MedicalQnA-llama2
  • Fine-Tuned Model Name: Llama-2-7b-Medchat-finetune

QLoRA Parameters

  • LoRA Attention Dimension (lora_r): 64
  • LoRA Scaling Alpha (lora_alpha): 16
  • LoRA Dropout Probability (lora_dropout): 0.1
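
These values map directly onto a peft LoraConfig. A minimal sketch (the bias and task_type settings are assumptions, since the card only lists the three values above):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,               # LoRA attention dimension (lora_r)
    lora_alpha=16,      # LoRA scaling alpha
    lora_dropout=0.1,   # LoRA dropout probability
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: causal LM task for Llama-2
)
```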

bitsandbytes Parameters

  • Use 4-bit Precision (use_4bit): True
  • 4-bit Compute Dtype (bnb_4bit_compute_dtype): float16
  • 4-bit Quantization Type (bnb_4bit_quant_type): nf4
  • Use Nested Quantization (use_nested_quant): False
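
These settings correspond to a transformers BitsAndBytesConfig applied when loading the base model. A minimal sketch, with device placement as an assumption (the card does not specify it):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_quant_type="nf4",             # 4-bit quantization type
    bnb_4bit_compute_dtype=torch.float16,  # 4-bit compute dtype
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",  # assumption: device placement not specified in the card
)
```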

Training Arguments

  • Number of Training Epochs (num_train_epochs): 1
  • Use fp16 (fp16): False
  • Use bf16 (bf16): False
  • Training Batch Size per GPU (per_device_train_batch_size): 4
  • Evaluation Batch Size per GPU (per_device_eval_batch_size): 4
  • Gradient Accumulation Steps (gradient_accumulation_steps): 1
  • Enable Gradient Checkpointing (gradient_checkpointing): True
  • Maximum Gradient Norm (max_grad_norm): 0.3
  • Initial Learning Rate (learning_rate): 2e-4
  • Weight Decay (weight_decay): 0.001
  • Optimizer (optim): paged_adamw_32bit
  • Learning Rate Scheduler Type (lr_scheduler_type): cosine
  • Maximum Training Steps (max_steps): -1
  • Warmup Ratio (warmup_ratio): 0.03
  • Group Sequences by Length (group_by_length): True
  • Save Checkpoints Every X Steps (save_steps): 0
  • Logging Steps (logging_steps): 25
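
The arguments above translate one-to-one into a transformers TrainingArguments object. A sketch (output_dir is an assumption, since the card does not name an output path):

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",  # assumption: output path not stated in the card
    num_train_epochs=1,
    fp16=False,
    bf16=False,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,        # -1: training length is controlled by epochs, not steps
    warmup_ratio=0.03,
    group_by_length=True,
    save_steps=0,        # no periodic checkpointing
    logging_steps=25,
)
```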

Supervised Fine-Tuning (SFT) Parameters

  • Maximum Sequence Length (max_seq_length): None
  • Packing Multiple Short Examples (packing): False
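
Putting the pieces together, the fine-tuning run can be driven by trl's SFTTrainer. This sketch assumes an older trl API (where dataset_text_field, max_seq_length, and packing are passed to the trainer directly; newer versions move them into SFTConfig), and assumes the dataset's formatted prompts live in a "text" column:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import SFTTrainer

# Assumption: the 2k-example subset is the first 2,000 rows;
# the card does not say how the subset was selected.
dataset = load_dataset("randomani/MedicalQnA-llama2", split="train[:2000]")

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token by default

trainer = SFTTrainer(
    model=model,                  # 4-bit base model from the snippet above
    train_dataset=dataset,
    peft_config=peft_config,      # LoRA config from the snippet above
    dataset_text_field="text",    # assumption: column holding formatted prompts
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)
trainer.train()
trainer.model.save_pretrained("Llama-2-7b-Medchat-finetune")
```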

References

For more details and access to the dataset, visit the randomani/MedicalQnA-llama2 dataset page on the Hugging Face Hub.
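
A minimal inference sketch, assuming the fine-tuned weights are published under randomani/Llama-2-7b-chat-Medchat-finetune and using the standard Llama-2 chat prompt format (the example question is illustrative):

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="randomani/Llama-2-7b-chat-Medchat-finetune",  # assumption: Hub id of the merged model
)

prompt = "<s>[INST] What are the common symptoms of diabetes? [/INST]"
print(pipe(prompt, max_new_tokens=200)[0]["generated_text"])
```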
