language: en
widget:
  - text: He had also stgruggled with addiction during his time in Congress .
  - text: >-
      The review thoroughla assessed all aspects of JLENS SuR and CPG esign
      maturit and confidence .
  - text: Letterma also apologized two his staff for the satyation .
  - text: >-
      Vincent Jay had earlier won France 's first gold in gthe 10km biathlon
      sprint .
  - text: >-
      It is left to the directors to figure out hpw to bring the stry across to
      tye audience .

Typo Detector

Dataset Information

For this task, I used the NeuSpell corpus as my raw data.

Evaluation

The following table summarizes the scores obtained by the model, overall and per class.

| # | precision | recall | f1-score | support |
|:---|---:|---:|---:|---:|
| TYPO | 0.992332 | 0.985997 | 0.989154 | 416054.0 |
| micro avg | 0.992332 | 0.985997 | 0.989154 | 416054.0 |
| macro avg | 0.992332 | 0.985997 | 0.989154 | 416054.0 |
| weighted avg | 0.992332 | 0.985997 | 0.989154 | 416054.0 |
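
As a quick sanity check on these numbers, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Because the model predicts a single class (TYPO), the micro, macro, and weighted averages all coincide with that class's row. The snippet below is a minimal sketch; the helper name `f1_from` is introduced here for illustration and is not part of the model or its repository.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the TYPO class, copied from the evaluation scores.
typo_f1 = f1_from(0.992332, 0.985997)
print(round(typo_f1, 6))
```

The printed value should agree with the reported f1-score up to rounding of the inputs.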

How to use

You can use this model with the Transformers pipeline for token classification (NER).

Installing requirements

pip install transformers torch

Prediction using pipeline

import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline


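# Load the configuration, tokenizer, and model from the Hugging Face Hub.
model_name_or_path = "m3hrdadfi/typo-detector-distilbert-en"
config = AutoConfig.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path, config=config)

# Build the pipeline; "average" pools sub-word scores per word before grouping them into spans.
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="average")

# Example sentences, each containing one or more typos.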
sentences = [
 "He had also stgruggled with addiction during his time in Congress .",
 "The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .",
 "Letterma also apologized two his staff for the satyation .",
 "Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .",
 "It is left to the directors to figure out hpw to bring the stry across to tye audience .",
]

for sentence in sentences:
    typos = [sentence[r["start"]: r["end"]] for r in nlp(sentence)]

    detected = sentence
    for typo in typos:
        detected = detected.replace(typo, f'<i>{typo}</i>')

    print("   [Input]: ", sentence)
    print("[Detected]: ", detected)
    print("-" * 130)

Output:

```text
   [Input]:  He had also stgruggled with addiction during his time in Congress .
[Detected]:  He had also stgruggled with addiction during his time in Congress .

   [Input]:  The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .
[Detected]:  The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .

   [Input]:  Letterma also apologized two his staff for the satyation .
[Detected]:  Letterma also apologized two his staff for the satyation .

   [Input]:  Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .
[Detected]:  Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .

   [Input]:  It is left to the directors to figure out hpw to bring the stry across to tye audience .
[Detected]:  It is left to the directors to figure out hpw to bring the stry across to tye audience .
```
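
The prediction loop marks typos with `str.replace`, which rewrites every occurrence of the flagged substring, including copies that the model did not flag. A possible alternative is to use the `start` and `end` character offsets that the pipeline returns. The sketch below assumes each result is a dict with `start`, `end`, and optionally `score`; the helper `highlight_typos`, its `min_score` parameter, and the mock results are hypothetical and written here only for illustration.

```python
from typing import Dict, List


def highlight_typos(sentence: str, results: List[Dict], min_score: float = 0.0) -> str:
    """Wrap each detected character span in <i>...</i> using its offsets."""
    spans = [(r["start"], r["end"]) for r in results if r.get("score", 1.0) >= min_score]
    # Work right to left so earlier offsets stay valid after each insertion.
    for start, end in sorted(spans, reverse=True):
        sentence = f"{sentence[:start]}<i>{sentence[start:end]}</i>{sentence[end:]}"
    return sentence


# Hand-written stand-in for pipeline output (hypothetical values, not real model predictions).
mock_results = [{"start": 12, "end": 15, "score": 0.99}]
print(highlight_typos("tye cat saw tye dog", mock_results))
# tye cat saw <i>tye</i> dog
```

With the real model, `highlight_typos(sentence, nlp(sentence))` would take the place of the `replace` loop; raising `min_score` keeps only the more confident detections.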


## Questions?
Post a GitHub issue on the [TypoDetector Issues](https://github.com/m3hrdadfi/typo-detector/issues) page.