Model Card for ResNet-50 Text Detector

This model classifies whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~70k images: 50% contain text and 50% contain no legible text.

Model Details

How to Get Started with the Model

import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "miguelcarv/phi-1_5-slimorca",
    trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained("microsoft/phi-1_5")


SYSTEM_PROMPT = "You are an AI assistant. You will be given a task. You must generate a detailed and long answer."
input_text = f"""{SYSTEM_PROMPT}

Instruction: Give me the first 5 prime numbers and explain what prime numbers are.
Output:"""

# Tokenize once so the attention mask is passed along with the input IDs
inputs = tokenizer(input_text, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=1024,
        num_beams=3,
        eos_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Epochs: 3
Learning rate: 5e-5
Optimizer: AdamW
Batch size: 64
Precision: FP32
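The hyperparameters above could be wired into a PyTorch training loop along these lines. This is a minimal sketch under stated assumptions: the tiny linear model and random tensors are stand-ins for the actual network and dataset, the cross-entropy loss is assumed (the card does not name the loss), and AdamW is left at its default weight decay.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 3            # from the card
LEARNING_RATE = 5e-5  # from the card
BATCH_SIZE = 64       # from the card

# Hypothetical stand-in data: 256 small random "images" with binary labels
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=BATCH_SIZE, shuffle=True)

# Hypothetical stand-in model, kept in FP32 (PyTorch's default dtype)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2)).float()
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()  # assumed loss for two-class classification

steps = 0
model.train()
for epoch in range(EPOCHS):
    for batch_images, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
        steps += 1
```

With the stand-in data, each epoch runs 256 / 64 = 4 optimizer steps, so the loop performs 12 steps in total.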