
# Model Card for ResNet-50 Text Detector
|
This model quickly classifies whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~70k images, of which 50% contain text and 50% contain no legible text.
|
|
|
# Model Details

## How to Get Started with the Model
|
```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "miguelcarv/resnet-50-text-detector",
)

# The image is resized manually below, so the processor skips its own resize step
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

url = "http://images.cocodataset.org/train2017/000000044520.jpg"
# Load the image as RGB and resize it to the 256x256 training resolution
image = Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((256, 256))

inputs = processor(image, return_tensors="pt").pixel_values

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(inputs)

logits_per_image = outputs.logits
probs = logits_per_image.softmax(dim=1)
print(probs)
# tensor([[0.1149, 0.8851]])
```
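The snippet above prints one probability per class. As a minimal sketch of how that output could be turned into a yes/no decision, the helper below applies a softmax to a pair of logits and thresholds one class. The names `softmax` and `has_text`, the 0.5 threshold, and in particular the choice of index 1 as the "contains text" class are assumptions for illustration, not something this card states; check `model.config.id2label` on the loaded model before relying on the index.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def has_text(logits, text_index=1, threshold=0.5):
    """Return True if the probability of the assumed 'text' class reaches the threshold.

    ASSUMPTION: text_index=1 is a guess at the label order; verify it against
    model.config.id2label. The threshold is likewise a hypothetical default.
    """
    return softmax(logits)[text_index] >= threshold


# Usage with the logits of a single image, e.g. outputs.logits[0].tolist()
print(has_text([0.0, 2.0]))
print(has_text([2.0, 0.0]))
```

Raising the threshold trades recall for precision, which may be preferable when the classifier is used as a cheap filter in front of a more expensive OCR step.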
|
# Training Details

- Trained for three epochs
- Resolution: 256x256
- Learning rate: 5e-5
- Optimizer: AdamW
- Batch size: 64
- Trained with FP32
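
For anyone trying to reproduce a similar run, the hyperparameters listed above can be collected in a small config object, which also gives a rough estimate of the number of optimizer steps. This is only a sketch: `TrainingConfig` and `total_optimizer_steps` are hypothetical names, and the estimate assumes that all ~70k images were used for training, that the last partial batch is kept, and that no gradient accumulation was used, none of which this card confirms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed in this card (the class itself is hypothetical)."""
    epochs: int = 3
    resolution: int = 256
    learning_rate: float = 5e-5
    optimizer: str = "AdamW"
    batch_size: int = 64
    precision: str = "fp32"


def total_optimizer_steps(num_images: int, config: TrainingConfig) -> int:
    """Estimate optimizer steps, assuming one step per batch and no dropped batch."""
    steps_per_epoch = math.ceil(num_images / config.batch_size)
    return steps_per_epoch * config.epochs


config = TrainingConfig()
# ASSUMPTION: the full ~70k images form the training set (no held-out split)
print(total_optimizer_steps(70_000, config))  # 3282
```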