Model Card for trocr-base-handwritten_nj_biergarten_captcha_v2

This is a model for CAPTCHA OCR.

Model Details

Model Description

This is a simple model fine-tuned from microsoft/trocr-base-handwritten on phunc20/nj_biergarten_captcha_v2, a dataset I created.

Uses

Direct Use

import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Run on GPU when available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the fine-tuned processor (image preprocessor + tokenizer) and model from the Hub
hub_dir = "phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2"
processor = TrOCRProcessor.from_pretrained(hub_dir)
model = VisionEncoderDecoderModel.from_pretrained(hub_dir).to(device)

# TrOCR expects 3-channel RGB input; CAPTCHA files may be RGBA or grayscale
image = Image.open("/path/to/image").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Generate token ids, then decode them into the predicted CAPTCHA string
outputs = model.generate(pixel_values)
pred_str = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(pred_str)

Bias, Risks, and Limitations

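Although the model performs well on the test split of phunc20/nj_biergarten_captcha_v2 (496/500 exact matches), it does not generalize well to other CAPTCHA images: on the Kaggle CAPTCHA dataset listed under Testing Data, it gets only 69/1070 exact matches. In this respect, the model is still worse than a human reader.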

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code in the Direct Use section above to get started with the model.

Training Details

Training Data

As mentioned above, I trained this model on phunc20/nj_biergarten_captcha_v2. Specifically, I trained on the train split and evaluated on the validation split, leaving the test split untouched.

Training Procedure

Please refer to the notebook https://gitlab.com/phunc20/captchew/-/blob/main/colab_notebooks/train_from_pretrained_Seq2SeqTrainer_torchDataset.ipynb?ref_type=heads, which is adapted from the TrOCR fine-tuning tutorial at https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TrOCR/Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_Seq2SeqTrainer.ipynb
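One step worth knowing when reading the referenced IAM tutorial: each target string is tokenized with padding to a fixed maximum length, and the padding token ids are then replaced with -100, the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, so padded positions do not contribute to the loss. The sketch below shows only that masking step in plain Python. It assumes the captchew notebook keeps this step from the tutorial; `mask_padding` and `IGNORE_INDEX` are hypothetical names, and the token ids in the example are made up.

```python
from typing import List

# Default ignore_index of torch.nn.CrossEntropyLoss: positions labelled -100 are skipped
IGNORE_INDEX = -100


def mask_padding(label_ids: List[int], pad_token_id: int) -> List[int]:
    """Replace every padding id with IGNORE_INDEX so the loss ignores padded positions.

    Hypothetical helper; in practice pad_token_id would come from
    processor.tokenizer.pad_token_id.
    """
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in label_ids]


# Illustrative ids only: pretend 1 is the padding id of a sequence padded to length 6
print(mask_padding([0, 45, 87, 2, 1, 1], pad_token_id=1))
```

With the made-up ids above, the last two positions become -100 and the real tokens are left unchanged.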

Evaluation

Testing Data, Factors & Metrics

Testing Data

  1. The test split of phunc20/nj_biergarten_captcha_v2
  2. The Kaggle dataset https://www.kaggle.com/datasets/fournierp/captcha-version-2-images/data (referred to as kaggle_test_set in this model card)

Factors

[More Information Needed]

Metrics

Character error rate (CER), exact match, and average length difference. The first two are described in Hugging Face's documentation. The last one is a metric I find useful; it is straightforward, and its exact definition can be found in the source code: https://gitlab.com/phunc20/captchew/-/blob/v0.1/average_length_difference.py
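The three metrics can be sketched in plain Python as follows. CER is computed here at the corpus level (total Levenshtein edits divided by total reference characters), which is assumed to match the convention of Hugging Face's `cer` metric. Exact match is returned as a count, as in the results tables. The average length difference is assumed to be the mean absolute difference between predicted and reference lengths; the linked `average_length_difference.py` is the authoritative definition. All function names and the toy strings are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def _check(preds: Sequence[str], refs: Sequence[str]) -> None:
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and of equal length")


def cer(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance over total reference length (can exceed 1)."""
    _check(preds, refs)
    edits = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    return edits / sum(len(r) for r in refs)


def exact_match(preds: Sequence[str], refs: Sequence[str]) -> int:
    """Number of predictions identical to their reference string."""
    _check(preds, refs)
    return sum(p == r for p, r in zip(preds, refs))


def avg_len_diff(preds: Sequence[str], refs: Sequence[str]) -> float:
    """ASSUMED definition: mean |len(pred) - len(ref)| over all samples."""
    _check(preds, refs)
    return sum(abs(len(p) - len(r)) for p, r in zip(preds, refs)) / len(refs)


# Toy example: one exact match, one prediction missing a character
preds, refs = ["abc12", "xy9"], ["abc12", "xyz9"]
print(cer(preds, refs), f"{exact_match(preds, refs)}/{len(refs)}", avg_len_diff(preds, refs))
```

On the toy example there is 1 edit over 9 reference characters (CER of 1/9), 1/2 exact matches, and an average length difference of 0.5. Because insertions count as edits, CER can exceed 1, which is consistent with the 1.0112 reported for the base model on kaggle_test_set.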

Results

On the test split of phunc20/nj_biergarten_captcha_v2

| Model | CER | Exact match | Avg. length difference |
|---|---|---|---|
| phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.001333 | 496/500 | 1/500 |
| microsoft/trocr-base-handwritten | 0.9 | 5/500 | 2.4 |

On kaggle_test_set

| Model | CER | Exact match | Avg. length difference |
|---|---|---|---|
| phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.4381 | 69/1070 | 0.1289 |
| microsoft/trocr-base-handwritten | 1.0112 | 17/1070 | 2.4439 |
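Because the two test sets differ in size (500 vs 1070 images), the exact-match counts are easier to compare as rates. The short sketch below does that arithmetic on the counts copied from the two tables; the dictionary layout and the `exact_match_rate` name are illustrative only.

```python
# Exact-match counts copied from the two results tables: (correct, total)
EXACT_MATCH = {
    "nj_biergarten_captcha_v2 test split": {
        "fine-tuned": (496, 500),
        "microsoft/trocr-base-handwritten": (5, 500),
    },
    "kaggle_test_set": {
        "fine-tuned": (69, 1070),
        "microsoft/trocr-base-handwritten": (17, 1070),
    },
}


def exact_match_rate(correct: int, total: int) -> float:
    """Fraction of images whose predicted string equals the label exactly."""
    return correct / total


for dataset, rows in EXACT_MATCH.items():
    for model, (correct, total) in rows.items():
        print(f"{dataset:36} {model:34} {exact_match_rate(correct, total):.2%}")
```

This gives 99.2% vs 1.0% on the nj_biergarten test split and roughly 6.4% vs 1.6% on kaggle_test_set, which makes the generalization gap noted under Bias, Risks, and Limitations concrete.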

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]
Model size: 334M params (Safetensors, tensor type F32)