Faster-RCNN model
Pretrained on DocArtefacts. The Faster-RCNN architecture was introduced in the paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" (Ren et al., 2015).
Model description
The authors' core idea is to unify region proposal with the detection module of Fast-RCNN, so that both stages share the same convolutional features.
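In the original paper, region proposal is performed by a Region Proposal Network that scores a fixed set of reference boxes ("anchors") at every feature-map location. The sketch below only illustrates how such anchors can be enumerated for one location. The function name `make_anchors` is hypothetical, and the default scales and aspect ratios are those reported in the paper, not necessarily the ones used by this checkpoint.

```python
import math


def make_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Enumerate (x1, y1, x2, y2) anchor boxes centered on one location.

    Illustrative only: defaults follow the original Faster R-CNN paper
    (3 scales x 3 aspect ratios), which may differ from this model's setup.
    """
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            # ratio is height / width; the box area stays equal to scale ** 2
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((
                center_x - width / 2,
                center_y - height / 2,
                center_x + width / 2,
                center_y + height / 2,
            ))
    return anchors
```

Calling `make_anchors(0, 0)` yields nine boxes, one per (scale, ratio) pair.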
Installation
Prerequisites
Python 3.6 (or higher) and pip are required to install docTR.
Latest stable release
You can install the latest stable release of the package from PyPI as follows:
pip install python-doctr[torch]
Developer mode
Alternatively, if you wish to use the latest features of the project that haven't made their way to a release yet, you can install the package from source (install Git first):
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.[torch]
Usage instructions
from PIL import Image
import torch
from torchvision.transforms import Compose, ConvertImageDtype, PILToTensor
from doctr.models.obj_detection.factory import from_hub

# Load the pretrained model from the hub and switch to evaluation mode
model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn").eval()

# path_to_an_image is a placeholder: replace it with the path to your image
img = Image.open(path_to_an_image).convert("RGB")

# Preprocessing: convert the PIL image to a float32 tensor
transform = Compose([
    PILToTensor(),
    ConvertImageDtype(torch.float32),
])
# Add a batch dimension
input_tensor = transform(img).unsqueeze(0)

# Inference
with torch.inference_mode():
    output = model(input_tensor)
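The snippet above stops at the raw model output. As a minimal sketch, assuming the model follows the torchvision detection convention (a list with one dict per image holding `boxes`, `labels` and `scores`), which this card does not state explicitly, low-confidence detections could be filtered as follows. The helper name `filter_detections` and the 0.5 threshold are hypothetical choices made for illustration.

```python
def _to_list(values):
    # Accept torch tensors (via .tolist()) as well as plain Python sequences
    return values.tolist() if hasattr(values, "tolist") else list(values)


def filter_detections(prediction, score_threshold=0.5):
    """Keep detections whose score is at least score_threshold.

    Assumption: `prediction` is a dict with "boxes" (N x 4, as x1, y1, x2, y2),
    "labels" (N) and "scores" (N), as in torchvision detection models.
    """
    boxes = _to_list(prediction["boxes"])
    labels = _to_list(prediction["labels"])
    scores = _to_list(prediction["scores"])

    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if score >= score_threshold:
            kept.append({
                "box": [float(v) for v in box],
                "label": int(label),
                "score": float(score),
            })
    return kept
```

Under that assumption, `filter_detections(output[0])` would return the retained detections for the first (and only) image in the batch.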
Citation
Original paper
@article{DBLP:journals/corr/RenHG015,
author = {Shaoqing Ren and
Kaiming He and
Ross B. Girshick and
Jian Sun},
title = {Faster {R-CNN:} Towards Real-Time Object Detection with Region Proposal
Networks},
journal = {CoRR},
volume = {abs/1506.01497},
year = {2015},
url = {http://arxiv.org/abs/1506.01497},
eprinttype = {arXiv},
eprint = {1506.01497},
timestamp = {Mon, 13 Aug 2018 16:46:02 +0200},
biburl = {https://dblp.org/rec/journals/corr/RenHG015.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Source of this implementation
@misc{doctr2021,
title={docTR: Document Text Recognition},
author={Mindee},
year={2021},
publisher = {GitHub},
howpublished = {\url{https://github.com/mindee/doctr}}
}