Model Card for trocr-base-handwritten_nj_biergarten_captcha_v2
This is a model for CAPTCHA OCR.
Model Details
Model Description
This is a simple model fine-tuned from microsoft/trocr-base-handwritten on a dataset I created, available at phunc20/nj_biergarten_captcha_v2.
Uses
Direct Use
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the processor and the fine-tuned model from the Hub
hub_dir = "phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2"
processor = TrOCRProcessor.from_pretrained(hub_dir)
model = VisionEncoderDecoderModel.from_pretrained(hub_dir)
model = model.to(device)

# Convert to RGB in case the file is grayscale or has an alpha channel
image = Image.open("/path/to/image").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(device)

# Generate token ids and decode them into the predicted CAPTCHA text
outputs = model.generate(pixel_values)
pred_str = processor.batch_decode(outputs, skip_special_tokens=True)[0]
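To run the model on several images at once, the single-image steps above can be wrapped in a small loop over mini-batches. This is only a sketch: `predict_batch` and its `batch_size` default are hypothetical names and values that are not part of this repository, and the helper expects the `processor`, `model` and `device` objects to be passed in.

```python
from typing import Any, List, Sequence


def predict_batch(
    images: Sequence[Any],
    processor: Any,
    model: Any,
    device: Any,
    batch_size: int = 16,
) -> List[str]:
    """Hypothetical helper: decode a sequence of PIL images in mini-batches."""
    predictions: List[str] = []
    for start in range(0, len(images), batch_size):
        # Convert to RGB in case an image is grayscale or has an alpha channel
        chunk = [img.convert("RGB") for img in images[start:start + batch_size]]
        # The processor accepts a list of images and returns one stacked tensor
        pixel_values = processor(images=chunk, return_tensors="pt").pixel_values
        outputs = model.generate(pixel_values.to(device))
        predictions.extend(
            processor.batch_decode(outputs, skip_special_tokens=True)
        )
    return predictions
```

With the objects created above, `predict_batch([Image.open(p) for p in paths], processor, model, device)` would return one string per image, where `paths` is a hypothetical list of image file paths.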
Bias, Risks, and Limitations
Although the model performs well on the dataset phunc20/nj_biergarten_captcha_v2, it does not perform nearly as well on all CAPTCHA images. For example, on kaggle_test_set (see Results) it gets only 69 of 1070 images exactly right. In this respect, the model is worse than a human.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
The code under Direct Use above loads the model and runs it on a single image.
Training Details
Training Data
As mentioned above, I trained this model on phunc20/nj_biergarten_captcha_v2. In particular, I trained on the train split and evaluated on the validation split, without touching the test split.
Training Procedure
Please refer to the training notebook at https://gitlab.com/phunc20/captchew/-/blob/main/colab_notebooks/train_from_pretrained_Seq2SeqTrainer_torchDataset.ipynb?ref_type=heads, which is adapted from https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TrOCR/Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_Seq2SeqTrainer.ipynb
Evaluation
Testing Data, Factors & Metrics
Testing Data
- The test split of phunc20/nj_biergarten_captcha_v2
- This Kaggle dataset: https://www.kaggle.com/datasets/fournierp/captcha-version-2-images/data (referred to as kaggle_test_set in this model card)
Factors
[More Information Needed]
Metrics
The metrics are CER (character error rate), exact match, and average length difference. The first two are described in Hugging Face's documentation. The last is a metric I care a little about. It is easy to understand and, if need be, an explanation can be found in the source code: https://gitlab.com/phunc20/captchew/-/blob/v0.1/average_length_difference.py
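For readers who want a feel for these three metrics, here is a minimal pure-Python sketch. It rests on two assumptions that are not confirmed by this model card: CER is computed as the total edit distance divided by the total number of reference characters, and average length difference is taken to be the mean absolute difference between predicted and reference lengths. The linked source file is the authoritative definition of the last metric, and all function names below are illustrative.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(preds: Sequence[str], refs: Sequence[str]) -> float:
    # ASSUMPTION: aggregate over the whole set, not a mean of per-sample rates
    total_edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match(preds: Sequence[str], refs: Sequence[str]) -> float:
    # Fraction of predictions identical to their reference
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def average_length_difference(preds: Sequence[str], refs: Sequence[str]) -> float:
    # ASSUMPTION: mean absolute length difference; see the linked source
    return sum(abs(len(p) - len(r)) for p, r in zip(preds, refs)) / len(refs)


# Made-up predictions and labels, for illustration only
preds = ["abc12", "xyz", "hello"]
refs = ["abc12", "xyzz", "hallo"]
print(cer(preds, refs))
print(exact_match(preds, refs))
print(average_length_difference(preds, refs))
```

Under this definition insertions count as errors, so CER can exceed 1 when predictions are longer than their labels, which is consistent with the 1.0112 reported for the base model on kaggle_test_set.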
Results
On the test split of phunc20/nj_biergarten_captcha_v2
Model | cer | exact match | avg len diff |
---|---|---|---|
phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.001333 | 496/500 | 1/500 |
microsoft/trocr-base-handwritten | 0.9 | 5/500 | 2.4 |
On kaggle_test_set
Model | cer | exact match | avg len diff |
---|---|---|---|
phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.4381 | 69/1070 | 0.1289 |
microsoft/trocr-base-handwritten | 1.0112 | 17/1070 | 2.4439 |
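Because the two test sets differ in size, the exact-match counts are easier to compare as rates. The short sketch below only redoes the arithmetic on the counts reported in the tables; the dictionary layout and the labels "fine-tuned" and "base" are illustrative.

```python
# Exact-match counts copied from the two tables: (correct, total)
exact_match_counts = {
    ("test split", "fine-tuned"): (496, 500),
    ("test split", "base"): (5, 500),
    ("kaggle_test_set", "fine-tuned"): (69, 1070),
    ("kaggle_test_set", "base"): (17, 1070),
}

# Convert each count pair into a rate between 0 and 1
rates = {
    key: correct / total
    for key, (correct, total) in exact_match_counts.items()
}

for (dataset, model_name), rate in rates.items():
    print(f"{dataset:16s} {model_name:10s} {rate:.2%}")
```

This gives about 99.20% and 1.00% on the test split, and about 6.45% and 1.59% on kaggle_test_set, for the fine-tuned and base models respectively.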
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]