Ocr Correcteur v1

The LoRA weights of this model were fine-tuned on a French OCR dataset, using a sample of 1,000. The base architecture is Flan-T5 Large. A stronger model is in preparation.

  • Install dependencies
!pip install -q transformers accelerate peft diffusers
!pip install -U bitsandbytes
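Before loading the model, it can help to confirm that the packages installed above are importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of these libraries.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Packages installed by the pip commands above
required = ["transformers", "accelerate", "peft", "diffusers", "bitsandbytes"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found")
```

If anything is reported as missing, rerun the install commands before continuing.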
  • Load and merge adapters in 8-bit (recommended)
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

# Load peft config for pre-trained checkpoint etc.
peft_model_id = "jeanflop/ocr_correcteur-v1"
config = PeftConfig.from_pretrained(peft_model_id)

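# Load the base model in 8-bit and its tokenizer.
# device_map={"": 1} places the whole model on GPU index 1; use {"": 0} on a single-GPU machine.
peft_model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map={"": 1},
)
peft_tokenizer = AutoTokenizer.from_pretrained('google/flan-t5-large')

# Load the LoRA adapter on top of the base model
peft_model = PeftModel.from_pretrained(peft_model, peft_model_id, device_map={"": 1})
peft_model.eval()  # switch to inference mode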

print("Peft model loaded")
  • Run inference (recommended)

Add your text

text = "Your OCR text here"  # replace with the OCR output to correct
inputs = f"""
Fix text : {text}"""

Run

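from textwrap import fill

peft_model.config.max_length = 512
peft_tokenizer.model_max_length = 512
# Tokenize and move the tensors to the same device as the model
inputs = peft_tokenizer(inputs, return_tensors="pt").to(peft_model.device)
outputs = peft_model.generate(**inputs, max_length=512)
# Drop special tokens such as <pad> and </s> from the decoded output
answer = peft_tokenizer.decode(outputs[0], skip_special_tokens=True)

print(fill(answer, width=80))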
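Because the steps above cap both input and output at 512 tokens, longer OCR documents need to be split before correction. The sketch below shows one possible way to do this: it splits on whitespace and wraps each piece in the same prompt format as above. The helper names `build_prompt` and `split_into_chunks`, and the 200-word default, are assumptions for illustration; word count is only a rough proxy for token count, so the limit may need tuning.

```python
def build_prompt(text):
    """Wrap OCR text in the prompt format used above."""
    return f"\nFix text : {text}"


def split_into_chunks(text, max_words=200):
    """Split text into consecutive pieces of at most max_words words.

    Word count only approximates token count; lower max_words if
    prompts still exceed the 512-token limit.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Example: build one prompt per chunk of a longer document
document = "Ceci est un texte issu d'un OCR " * 100
prompts = [build_prompt(chunk) for chunk in split_into_chunks(document)]
print(len(prompts), "prompts to run")
```

Each prompt can then be passed through the tokenizer and `generate` call shown above, and the corrected pieces joined back together with spaces.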