---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
datasets:
  - gbharti/finance-alpaca
  - sujet-ai/Sujet-Finance-Instruct-177k
tags:
  - krx
---

Qwen 2.5 7B Instruct Model Fine-tuning

This repository contains code for fine-tuning the Qwen 2.5 7B Instruct model using Amazon SageMaker. The project uses QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning of large language models.

Project Structure

```
.
├── scripts/
│   ├── train.py
│   ├── tokenization_qwen2.py
│   ├── requirements.txt
│   └── bootstrap.sh
├── sagemaker_train.py
└── README.md
```

Prerequisites

  • Amazon SageMaker access
  • Hugging Face account and access token
  • AWS credentials configured
  • Python 3.10+

Environment Setup

The project uses the following key dependencies:

  • PyTorch 2.1.0
  • Transformers (latest from main branch)
  • Accelerate >= 0.27.0
  • PEFT >= 0.6.0
  • BitsAndBytes >= 0.41.0

Model Configuration

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Training Method: QLoRA (4-bit quantization)
  • Instance Type: ml.p5.48xlarge
  • Distribution Strategy: PyTorch DDP
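
The configuration above can be sketched in code roughly as follows, assuming the standard transformers, peft, and bitsandbytes APIs. The LoRA rank, alpha, dropout, and target modules are illustrative assumptions rather than values taken from this repository's train.py.

```python
# Illustrative QLoRA setup (assumed, not copied from scripts/train.py):
# load the base model in 4-bit and attach LoRA adapters via PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Prepare the quantized model for k-bit training, then wrap it with LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                 # rank: assumed value
    lora_alpha=32,        # scaling: assumed value
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```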

Training Configuration

Hyperparameters

```python
{
    'epochs': 3,
    'per_device_train_batch_size': 4,
    'gradient_accumulation_steps': 8,
    'learning_rate': 1e-5,
    'max_steps': 1000,
    'bf16': True,
    'max_length': 2048,
    'gradient_checkpointing': True,
    'optim': 'adamw_torch',
    'lr_scheduler_type': 'cosine',
    'warmup_ratio': 0.1,
    'weight_decay': 0.01,
    'max_grad_norm': 0.3
}
```
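
For reference, here is a minimal sketch of how train.py might map these values onto transformers.TrainingArguments; the exact mapping inside the script is an assumption, and max_length is applied at tokenization time rather than here.

```python
# Illustrative mapping of the hyperparameters above onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/opt/ml/model",        # SageMaker's default model output path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    max_steps=1000,                    # overrides epochs when both are set
    bf16=True,
    gradient_checkpointing=True,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    max_grad_norm=0.3,
    logging_steps=10,                  # see "Training Process" below
    save_steps=50,
)
```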

Environment Variables

The training environment is configured with optimizations for distributed training and memory management:

  • CUDA device configuration
  • Memory optimization settings
  • EFA (Elastic Fabric Adapter) configuration for distributed training
  • Hugging Face token and cache settings
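
For illustration, the environment passed to the training job could look roughly like the dictionary below; the concrete variable values are assumptions and should be adapted to the actual cluster setup.

```python
# Illustrative environment settings (assumed values, not copied from sagemaker_train.py).
environment = {
    # Memory management: reduce CUDA allocator fragmentation during long runs.
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
    # EFA / NCCL settings for inter-GPU communication on p5 instances.
    "FI_PROVIDER": "efa",
    "NCCL_DEBUG": "INFO",
    # Hugging Face authentication and cache location (placeholders).
    "HF_TOKEN": "<your-hf-token>",
    "HF_HOME": "/tmp/hf_cache",
}
```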

Training Process

  1. Environment Preparation:

    • Creates requirements.txt with necessary dependencies
    • Generates bootstrap.sh for Transformers installation
    • Sets up SageMaker training configuration
  2. Model Loading:

    • Loads the base Qwen 2.5 7B model with 4-bit quantization
    • Configures BitsAndBytes for quantization
    • Prepares model for k-bit training
  3. Dataset Processing:

    • Uses the Sujet Finance dataset
    • Formats conversations in Qwen2 format
    • Applies tokenization with a maximum length of 2048 tokens (see the sketch after this list)
    • Implements data preprocessing with parallel processing
  4. Training:

    • Implements gradient checkpointing for memory efficiency
    • Uses cosine learning rate schedule with warmup
    • Saves checkpoints every 50 steps
    • Logs training metrics every 10 steps
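
A minimal sketch of the dataset step, assuming the tokenizer's built-in Qwen2 chat template; the column names and num_proc value are assumptions about sujet-ai/Sujet-Finance-Instruct-177k, not facts from this repository.

```python
# Illustrative preprocessing (column names are assumed).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
dataset = load_dataset("sujet-ai/Sujet-Finance-Instruct-177k", split="train")

def format_and_tokenize(example):
    # Render the sample with the Qwen2 chat template, then tokenize to max_length.
    messages = [
        {"role": "user", "content": example["user_prompt"]},    # assumed column name
        {"role": "assistant", "content": example["answer"]},    # assumed column name
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(
    format_and_tokenize,
    num_proc=8,                          # parallel preprocessing (assumed worker count)
    remove_columns=dataset.column_names,
)
```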

Monitoring and Metrics

The training process tracks the following metrics:

  • Training loss
  • Evaluation loss

Error Handling

The implementation includes comprehensive error handling and logging:

  • Environment validation
  • Dataset preparation verification
  • Training process monitoring
  • Detailed error messages and stack traces
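
As a rough illustration, the training entry point might wrap the run like this; the helper names and logging setup are assumptions.

```python
# Illustrative error handling around the training loop (structure assumed).
import logging
import traceback

logger = logging.getLogger(__name__)

def run_training(trainer, validate_environment, prepare_dataset):
    """Run training with validation and logging; all three arguments are hypothetical."""
    try:
        validate_environment()                 # e.g. check env vars, paths, GPU visibility
        trainer.train_dataset = prepare_dataset()
        trainer.train()
    except Exception:
        # Emit a full stack trace into the SageMaker job logs before failing the job.
        logger.error("Training failed:\n%s", traceback.format_exc())
        raise
```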

Usage

  1. Configure AWS credentials and SageMaker role
  2. Set up Hugging Face token
  3. Run the training script:
```bash
python sagemaker_train.py
```
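
sagemaker_train.py is expected to construct and launch a SageMaker estimator along these lines; the framework versions, role handling, and distribution settings below are assumptions to adjust for your account and SDK version.

```python
# Illustrative SageMaker launch (assumed versions and role; adjust to your account).
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()   # or pass an IAM role ARN when running locally

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="scripts",
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.36",        # assumed; bootstrap.sh installs Transformers from main
    pytorch_version="2.1",              # matches the PyTorch 2.1.0 dependency
    py_version="py310",
    hyperparameters={"epochs": 3, "learning_rate": 1e-5},   # abbreviated; see the full list above
    distribution={"torch_distributed": {"enabled": True}},  # PyTorch DDP via torchrun
    environment={"HF_TOKEN": "<your-hf-token>"},
)

estimator.fit()
```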

Custom Components

Custom Tokenizer

The project includes a custom implementation of the Qwen2 tokenizer (tokenization_qwen2.py) with:

  • Special token handling
  • Unicode normalization
  • Vocabulary management
  • Input preparation for model training
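
If the bundled module is used in place of the stock tokenizer, loading it might look like this; the class name mirrors the usual transformers Qwen2Tokenizer, but treat the import path and API as assumptions.

```python
# Illustrative use of the bundled tokenizer module (API assumed to match transformers' Qwen2Tokenizer).
from scripts.tokenization_qwen2 import Qwen2Tokenizer

tokenizer = Qwen2Tokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
print(tokenizer("Hello, Qwen!")["input_ids"])
```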

Notes

  • The training script is optimized for the ml.p5.48xlarge instance type
  • Uses PyTorch Distributed Data Parallel for training
  • Implements gradient checkpointing for memory optimization
  • Includes automatic retry mechanism for training failures

License

[Add License Information]