20231102-20_epochs_layoutlmv2-base-uncased_finetuned_docvqa

This model is a fine-tuned version of layoutlmv2-base-uncased (per the model name), trained on the 1.2k-sample example dataset released by DocVQA. It achieves the following results on the evaluation set:

  • Loss: 2.9087

Model description

This DocVQA model is built on LayoutLMv2 and is the first in a planned series of experimental models for document visual question answering. It is the "mini" version, trained for 20 epochs on a small dataset of 1.2k samples (1,000 for training and 200 for testing).

Training used mixed precision (fp16), a batch size of 16 with gradient accumulation, and a linear learning rate schedule with warmup steps and weight decay. No external reporting tools were used; evaluation relied on the metrics logged by the training run itself.

Medium (5k samples) and large (50k samples) versions are planned. This version serves as the baseline for those larger models.
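The learning rate schedule mentioned above (linear, with warmup) can be illustrated with a minimal sketch. The base learning rate of 5e-05 comes from this card. The warmup length is not stated anywhere in the card, so `WARMUP_STEPS` below is a hypothetical placeholder, and `TOTAL_STEPS` is only a rough estimate derived from the training log (100 steps correspond to about 3.51 epochs, so 20 epochs is roughly 570 steps). The function mirrors the usual "linear warmup, then linear decay to zero" shape and is not taken from the author's training script.

```python
def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 5e-5      # from the hyperparameter list in this card
WARMUP_STEPS = 50   # hypothetical: the card does not state the warmup length
TOTAL_STEPS = 570   # rough estimate: 20 epochs at about 28.5 steps per epoch

for step in (0, WARMUP_STEPS, 310, TOTAL_STEPS):
    print(step, linear_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS))
```

Under these assumed values the rate peaks at the end of warmup and reaches zero at the final step; a different warmup length shifts the peak but not the overall shape.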

Intended uses & limitations

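Experimental only. This model is an early baseline trained on a small dataset and is not intended for production use.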

Training and evaluation data

Based on the 1.2k-sample dataset released by DocVQA (1,000 samples for training and 200 for testing).

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
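The relationship between the batch settings above can be checked with a short sketch. The dictionary keys follow the parameter names used by `transformers.TrainingArguments`; the `fp16` and `report_to` entries are inferred from the model description (mixed precision, no external reporting tools) rather than from the list itself. Warmup steps and weight decay are omitted because their values are not given in this card. The single-device default is an assumption, chosen because 16 × 2 matches the reported total of 32.

```python
# Hyperparameters from this card, keyed by TrainingArguments-style names.
hparams = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 20,
    "fp16": True,          # inferred from the model description
    "report_to": "none",   # inferred from the model description
}


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


total = effective_batch_size(
    hparams["per_device_train_batch_size"],
    hparams["gradient_accumulation_steps"],
)
print(total)  # 32, matching total_train_batch_size
```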

Training results

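| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 4.3689        | 3.51  | 100  | 3.7775          |
| 3.2761        | 7.02  | 200  | 3.3707          |
| 2.6415        | 10.53 | 300  | 3.0807          |
| 2.2233        | 14.04 | 400  | 3.0120          |
| 1.9586        | 17.54 | 500  | 2.9087          |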
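To read the results above more easily, the following sketch picks the logged checkpoint with the lowest validation loss and computes the gap between validation and training loss at each logged step. The numbers are copied from the table; the helper names `best_checkpoint` and `generalization_gaps` are made up for this illustration.

```python
# Rows copied from the training results: (training_loss, epoch, step, validation_loss)
log = [
    (4.3689, 3.51, 100, 3.7775),
    (3.2761, 7.02, 200, 3.3707),
    (2.6415, 10.53, 300, 3.0807),
    (2.2233, 14.04, 400, 3.0120),
    (1.9586, 17.54, 500, 2.9087),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def generalization_gaps(rows):
    """Validation loss minus training loss at each logged step."""
    return [round(val - train, 4) for train, _, _, val in rows]


best = best_checkpoint(log)
gaps = generalization_gaps(log)

print(best[2], best[3])  # 500 2.9087
print(gaps)
```

Validation loss is still falling at step 500, which matches the loss reported at the top of this card. The gap between validation and training loss grows from about -0.59 at step 100 to about 0.95 at step 500, so the training loss is improving faster than the validation loss. That pattern is common on small datasets and may indicate the onset of overfitting.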

Framework versions

  • Transformers 4.34.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.10.1
  • Tokenizers 0.14.1