language:

license: cc-by-sa-4.0

T5-incorrect-word-spelling-corrector

This T5 model is designed to identify and correct words with incorrect spelling in the Slovenian language.
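A minimal sketch of how the model could be loaded and queried with the Hugging Face `transformers` library (with `torch` and `sentencepiece` installed). This card does not document the usage, so several things here are assumptions: that the repository name `cjvt/t5-slo-word-spelling-corrector` is the Hub identifier, that the model accepts plain text without a task prefix or extra markup, and that the helper name `correct_spelling` and its `max_new_tokens` default are suitable. The helper is hypothetical and not part of the model.

```python
# Assumed Hub identifier, taken from the repository name.
MODEL_ID = "cjvt/t5-slo-word-spelling-corrector"


def correct_spelling(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: run one text through the seq2seq model and return its output."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the model takes the raw sentence as input, with no prefix.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `correct_spelling("Model v besedlu popravi napaake v nepravilno črkovanih besedah.")` downloads the weights on first use. If the model expects a different input format, the output may not match the example in this card.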

Model Output Example

Consider the following Slovenian text:

Model v besedlu popravi napaake v nepravilno črkovanih besedah.

The model might return the following text (note: the prediction was chosen for demonstration and explanation, not for reproducibility):

Model v besedilu popravi napake v nepravilno črkovanih besedah.

In the input sentence, the words besedlu and napaake are misspelled, so the model corrects them to besedilu and napake.
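The comparison between the input and the output can be reproduced with a short sketch. It assumes the correction keeps the number of words unchanged, so words can be compared position by position. The helper name `changed_words` is hypothetical.

```python
def changed_words(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for words that differ at the same position."""
    src, dst = original.split(), corrected.split()
    if len(src) != len(dst):
        # Positional comparison only makes sense when no words were added or removed.
        raise ValueError("word counts differ; positional comparison does not apply")
    return [(a, b) for a, b in zip(src, dst) if a != b]


original = "Model v besedlu popravi napaake v nepravilno črkovanih besedah."
corrected = "Model v besedilu popravi napake v nepravilno črkovanih besedah."

pairs = changed_words(original, corrected)
print(pairs)  # [('besedlu', 'besedilu'), ('napaake', 'napake')]
```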

More details

Testing the model on generated test sets yields the following results (combining detection and correction of misspelled words):

Precision: 0.986
Recall: 0.935
F1: 0.960

Testing the model in combination with cjvt/SloBERTa-slo-word-spelling-annotator on test sets constructed from the Šolar Eval dataset yields the following results (combining detection and correction of misspelled words):

Precision: 0.823
Recall: 0.796
F1: 0.810
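As a quick check of how the reported scores relate, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values given above. This card does not say how the scores were computed or averaged, so this is only the standard formula, not a description of the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Generated test sets: rounded precision and recall from this card.
generated = f1_score(0.986, 0.935)
# Šolar Eval test sets, in combination with the annotator model.
solar = f1_score(0.823, 0.796)

print(f"{generated:.3f}")  # 0.960
print(f"{solar:.3f}")      # 0.809
```

The second value differs from the reported F1 in the last digit. This is consistent with the reported score having been computed from unrounded precision and recall.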

Acknowledgement

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.

Authors

Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing these models.
