Model Card for trocr-base-handwritten_nj_biergarten_captcha_v2

This is a model for CAPTCHA OCR.

Model Details

Model Description

This model is fine-tuned from microsoft/trocr-base-handwritten on phunc20/nj_biergarten_captcha_v2, a dataset I created.

Uses

Direct Use

import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")


from transformers import TrOCRProcessor, VisionEncoderDecoderModel

hub_dir = "phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2"
processor = TrOCRProcessor.from_pretrained(hub_dir)
model = VisionEncoderDecoderModel.from_pretrained(hub_dir)
model = model.to(device)


from PIL import Image

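# Convert to RGB: the processor expects 3-channel input, and CAPTCHA images are often grayscale or RGBA
image = Image.open("/path/to/image").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(device)
outputs = model.generate(pixel_values)
# Decode the generated token IDs into the predicted CAPTCHA text
pred_str = processor.batch_decode(outputs, skip_special_tokens=True)[0]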

Bias, Risks, and Limitations

Although the model performs well on the dataset phunc20/nj_biergarten_captcha_v2, it does not perform as well on CAPTCHA images in general: exact match drops from 496/500 on that dataset's test split to 69/1070 on kaggle_test_set (see Results). In this respect, the model is worse than a human.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

The code under Direct Use above is enough to get started with the model.

Training Details

Training Data

As mentioned above, I trained this model on phunc20/nj_biergarten_captcha_v2. Specifically, I trained on the train split and evaluated on the validation split, without touching the test split.

Training Procedure

Please refer to https://gitlab.com/phunc20/captchew/-/blob/main/colab_notebooks/train_from_pretrained_Seq2SeqTrainer_torchDataset.ipynb?ref_type=heads, which is adapted from https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TrOCR/Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_Seq2SeqTrainer.ipynb.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Two test sets are used:

The test split of phunc20/nj_biergarten_captcha_v2
The Kaggle dataset https://www.kaggle.com/datasets/fournierp/captcha-version-2-images/data (referred to as kaggle_test_set in this model card)

Factors

[More Information Needed]

Metrics

CER (character error rate), exact match, and average length difference. The first two are described in Hugging Face's documentation. The last is a metric I find useful; it is easy to understand and, if need be, its definition can be read from the source code: https://gitlab.com/phunc20/captchew/-/blob/v0.1/average_length_difference.py
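To make the three metrics concrete, here is a minimal pure-Python sketch. It is not the code used to produce the numbers in this card. Two things are assumptions: CER is aggregated over the whole set (total edit distance divided by total reference characters), and average length difference is taken to be the mean absolute difference between predicted and reference string lengths. The linked source file is the authoritative definition of the latter. All function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Character error rate, aggregated over the whole set (assumption)."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def average_length_difference(preds: Sequence[str], refs: Sequence[str]) -> float:
    """ASSUMPTION: mean absolute length difference; the linked source may differ."""
    return sum(abs(len(p) - len(r)) for p, r in zip(preds, refs)) / len(refs)


# Toy data: the second prediction is missing one character.
preds = ["abc12", "xyz9"]
refs = ["abc12", "xyz99"]
print(cer(preds, refs))                        # 0.1
print(exact_match(preds, refs))                # 0.5
print(average_length_difference(preds, refs))  # 0.5
```

In the toy data, one missing character out of ten reference characters gives a CER of 0.1, while half of the predictions fail exact match. This shows why exact match is the stricter metric for CAPTCHA solving, where a single wrong character fails the whole challenge.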

Results

On the test split of phunc20/nj_biergarten_captcha_v2

Model	CER	Exact match	Avg. length difference
`phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2`	0.001333	496/500	1/500
`microsoft/trocr-base-handwritten`	0.9	5/500	2.4

On kaggle_test_set

Model	CER	Exact match	Avg. length difference
`phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2`	0.4381	69/1070	0.1289
`microsoft/trocr-base-handwritten`	1.0112	17/1070	2.4439
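The exact-match counts in the two tables are easier to compare as rates. The short sketch below only redoes the arithmetic on the counts reported above; the labels "fine-tuned" and "base" are shorthand introduced here for the two models.

```python
# Exact-match counts copied from the two results tables: (correct, total)
results = {
    ("test split", "fine-tuned"): (496, 500),
    ("test split", "base"): (5, 500),
    ("kaggle_test_set", "fine-tuned"): (69, 1070),
    ("kaggle_test_set", "base"): (17, 1070),
}

# Convert each count into an exact-match rate between 0 and 1
rates = {key: correct / total for key, (correct, total) in results.items()}

for (dataset, model), rate in rates.items():
    print(f"{dataset:16s} {model:10s} {rate:.2%}")
```

This gives 99.20% versus 1.00% on the test split and 6.45% versus 1.59% on kaggle_test_set. The fine-tuned model still beats the base model on the Kaggle data, but it is far from reliable there. A CER above 1, such as the base model's 1.0112, is possible because inserted characters count as errors, so a prediction much longer than its reference can contain more errors than the reference has characters.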

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]
