Making ProtGPT2-medium and ProtGPT2-small available?
@nferruz Hi Noelia,
Would you consider also making ProtGPT2-medium and ProtGPT2-small available, please?
This would be of great help to people who want to debug or who don't have large-capacity GPU machines.
Currently, with the parameters below, the program crashes with a CUDA out-of-memory error when running on an AWS p3.16xlarge.
The AWS EC2 p3.16xlarge instance type is powered by 8 NVIDIA Tesla V100 GPUs, each with 16 GB of GPU memory, so the instance provides 128 GB of GPU memory in total.
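This is easy to confirm from the instance itself with a quick nvidia-smi query (exact figures vary slightly by driver version):
# Print total memory per visible GPU; a p3.16xlarge should list
# eight V100s at roughly 16 GB each.
nvidia-smi --query-gpu=index,name,memory.total --format=csv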
Do you have any suggestions for parameters I can use to avoid that? For reference, here is the command I ran:
TRAINING_FILE="data/ha_filtered_108k.train.gpt2_format.txt" # 80K lines
VALIDATION_FILE="data/ha_filtered_108k.validation.gpt2_format.txt" # 20K lines
MODEL_OUTPUT_DIR="gpt2_model/ha_filtered_108k"
python run_clm.py --model_name_or_path nferruz/ProtGPT2 \
--train_file ${TRAINING_FILE} \
--validation_file ${VALIDATION_FILE} \
--tokenizer_name nferruz/ProtGPT2 \
--do_train \
--do_eval \
--output_dir ${MODEL_OUTPUT_DIR} \
--overwrite_output_dir \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps=16 \
--fp16 \
--learning_rate 1e-06
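For context, the effective (global) batch size this command targets, assuming all 8 GPUs are visible, is:
# per_device_train_batch_size x num_gpus x gradient_accumulation_steps
echo $(( 1 * 8 * 16 ))   # 128 sequences per optimizer step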
Sincerely,
Littleworth
Hi Littleworth,
I'm afraid I never trained a small or medium version. I did train another model, but it is even bigger.
Sorry I don't have better news for now!
@nferruz Hi Noelia,
Thanks. I finally managed to get it running with the help of DeepSpeed.
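For anyone reproducing this, DeepSpeed was installed into the same Python environment that runs run_clm.py (a minimal sketch; exact package versions are not recorded here):
# DeepSpeed compiles its CUDA ops against the local toolkit, so install
# it inside the environment used for training.
pip install deepspeed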
Here is the full code:
#!/bin/bash
export LC_ALL=C
TRAINING_FILE="data/ha_filtered_108k.train.gpt2_format.txt"
VALIDATION_FILE="data/ha_filtered_108k.validation.gpt2_format.txt"
MODEL_OUTPUT_DIR="gpt2_model/ha_filtered_108k"
DS_CONFIG_FILE="ds_config.json"
/home/ubuntu/storage1/conda_envs/py38/bin/deepspeed --num_gpus=8 run_clm.py --model_name_or_path nferruz/ProtGPT2 \
--train_file ${TRAINING_FILE} \
--validation_file ${VALIDATION_FILE} \
--tokenizer_name nferruz/ProtGPT2 \
--do_train \
--do_eval \
--output_dir ${MODEL_OUTPUT_DIR} \
--overwrite_output_dir \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps=16 \
--fp16 \
--learning_rate 1e-06 \
--deepspeed ${DS_CONFIG_FILE}
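If the launcher fails to start, DeepSpeed ships a ds_report utility that prints the detected torch/CUDA versions and op compatibility, which helps diagnose environment problems:
# Summarize the DeepSpeed installation and its CUDA op status.
ds_report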
The content of the ds_config.json file is:
{
"fp16": {
"enabled": true
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 2e8,
"contiguous_gradients": true
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": 1e-6,
"betas": [
0.9,
0.999
],
"eps": 1e-8,
"weight_decay": 0
}
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 1e-6,
"warmup_num_steps": "auto"
}
},
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 16,
"gradient_clipping": 1.0
}
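A quick sanity check before launching is to confirm the file parses as valid JSON (json.tool is part of the Python standard library):
# Pretty-prints the config on success; exits non-zero on a syntax error.
python -m json.tool ds_config.json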
Everything completes in less than 10 minutes on the p3.16xlarge.
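While it runs, it's easy to verify that all eight GPUs are actually in use:
# Refresh nvidia-smi every second; each V100 should show a run_clm.py
# worker process and non-trivial memory usage.
watch -n 1 nvidia-smi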
Hope this information will help others.
Regards,
littleworth
Hi, thank you for sharing all these tricks. May I ask: are you still using the 8x V100 GPUs with 16 GB each in this case with DeepSpeed? Thanks!