How to estimate the GPU memory needed to fine-tune the Whisper large model?

#124
by MonoLeon - opened

I recently used AdaLoRA to fine-tune whisper-large-v3. Based on my understanding, the most memory-consuming parts are the model parameters, the gradients, and the optimizer states.
For full fine-tuning, the estimated GPU memory needed is around 14.44 GB. However, when I use PEFT, the memory usage is far above this number. I have per_device_train_batch_size=1, gradient_accumulation_steps=2, num_workers=1. What important factors are missing from this estimation?

- Model parameters (FP16): ~2.89 GB (1550M × 2 bytes)
- AdamW optimizer states: ~8.66 GB (3 copies of the parameters)
- Gradients: ~2.89 GB
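
For reference, here is the arithmetic behind my estimate as a quick Python sketch. The ~1550M parameter count and the "3 copies for AdamW" assumption are the ones from the breakdown above; activations, CUDA context, and framework overhead are not included.

```python
# Back-of-the-envelope memory estimate for full FP16 fine-tuning.
# Assumptions: ~1550M parameters (whisper-large-v3), 2 bytes per value,
# and 3 extra parameter-sized copies kept by AdamW.
GiB = 1024 ** 3
n_params = 1550e6

params_fp16 = n_params * 2        # model weights in FP16
grads_fp16 = n_params * 2         # gradients in FP16
adamw_states = n_params * 2 * 3   # optimizer states (assumed 3 copies)

total = params_fp16 + grads_fp16 + adamw_states
print(f"params:    {params_fp16 / GiB:.2f} GiB")
print(f"gradients: {grads_fp16 / GiB:.2f} GiB")
print(f"optimizer: {adamw_states / GiB:.2f} GiB")
print(f"total:     {total / GiB:.2f} GiB")  # ~14.44 GiB
```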

