Model Card for Model ID
Model Details
์ค ๋จ์๋ก ์์์ด ํฌํจ๋ ๊ธ์๋ฅผ ์ธ์ ๋ชจ๋ธ์
๋๋ค.
microsoft TrOCR-large ๋ชจ๋ธ์ ๊ธฐ๋ฐ์ผ๋ก ํ๊ตญ์ด + latex ๋ฐ์ดํฐ์
finetuning ํ์ต๋๋ค.
์ค ๋จ์๋ก ์ด๋ฏธ์ง๋ฅผ cropํ๋ ๋ณ๋์ detector๊ฐ ํ์ํฉ๋๋ค.
Uses
Direct Use
from PIL import Image
import glob
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import torch
import IPython.display as ipd
## ์ด๋ฏธ์ง ์ค๋น
img_path_list = sorted(glob.glob('images/mathematical_expression_2-*.png'))
img_list = [Image.open(img_path).convert("RGB") for img_path in img_path_list]
## ๋ชจ๋ธ ๋ฐ ํ๋ก์ธ์ ์ค๋น
model_path = 'TeamUNIVA/23MATHQ_TrOCR-large'
processor = TrOCRProcessor.from_pretrained(model_path)
model = VisionEncoderDecoderModel.from_pretrained(model_path)
model.eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
processor.feature_extractor.size = model.config.encoder.image_size
gc = model.generation_config
gc.max_length = 128
gc.early_stopping = True
gc.no_repeat_ngram_size = 3
gc.length_penalty = 2.0
gc.num_beams = 4
gc.eos_token_id = processor.tokenizer.sep_token_id
## TrOCR ์ถ๋ก
pixel_values = processor(img_list, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values.to(model.device), pad_token_id=processor.tokenizer.eos_token_id)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
for img,text in zip(img_list, generated_text):
ipd.display(img)
print(text)
Result example
BibTeX entry and citation info
@misc{li2021trocr,
title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
year={2021},
eprint={2109.10282},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
- Downloads last month
- 7
Inference API (serverless) does not yet support transformers models for this pipeline type.