language:

  • sl

license: cc-by-sa-4.0

T5-incorrect-word-spelling-corrector

This T5 model is designed to identify and correct words with incorrect spelling in the Slovenian language.

Model Output Example

Consider the following Slovenian text:

Model v besedlu popravi napaake v nepravilno črkovanih besedah.

The model might return the following text (note: predictions chosen for demonstration/explanation, not reproducibility!):

Model v besedilu popravi napake v nepravilno črkovanih besedah.

We observe that in the input sentence, the words besedlu and napaake are incorrectly spelled, so the model corrects them to besedilu and napake.
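The example above could be reproduced along the following lines. This is a minimal sketch, not the authors' documented usage: it assumes the checkpoint is published as cjvt/t5-slo-word-spelling-corrector, that it loads with the generic Hugging Face transformers seq2seq classes, and that it accepts raw text without a task prefix or prior annotation. The helper names load_corrector and correct_spelling are hypothetical.

```python
def load_corrector(model_id="cjvt/t5-slo-word-spelling-corrector"):
    """Load tokenizer and model (assumption: the generic Auto classes work for this checkpoint)."""
    # Imported lazily so that the helper below can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def correct_spelling(text, tokenizer, model, max_new_tokens=128):
    """Run one sentence through the model and return the decoded output.

    Assumption: the model takes the raw sentence as input. If the checkpoint
    expects a prefix or annotated input, this call needs to be adapted.
    """
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns a batch; a single sentence was passed, so take the first item.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Under these assumptions, calling tokenizer, model = load_corrector() and then correct_spelling("Model v besedlu popravi napaake v nepravilno črkovanih besedah.", tokenizer, model) would return the model's corrected sentence. The actual output may differ from the illustrative one shown above.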

More details

Testing the model on generated test sets gives the following results (combining detection and correction of words with incorrect spelling):

  • Precision: 0.986
  • Recall: 0.935
  • F1: 0.960
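For reference, F1 is the harmonic mean of precision and recall. The short sketch below shows how the F1 value for the generated test sets follows from the two rounded figures reported above; the function name f1_score is chosen here for illustration and is not taken from the authors' evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall reported for the generated test sets.
f1 = f1_score(0.986, 0.935)
print(f"{f1:.3f}")  # prints 0.960
```

Because the published precision and recall are themselves rounded to three decimals, recomputing F1 from them can differ from a reported F1 in the last digit.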

Testing the model, in combination with cjvt/SloBERTa-slo-word-spelling-annotator, with test sets constructed using the Šolar Eval dataset provides the following results (combining detection and correction of words with incorrect spelling):

  • Precision: 0.823
  • Recall: 0.796
  • F1: 0.810

Acknowledgement

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.

Authors

Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing these models.
