Based on the Finnish pretrained T5 model, version small-nl24.
Training data:
Around 300k samples from the following datasets:
- [wikipedia](https://huggingface.co./datasets/wikipedia)
- [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501)
- [Yle Finnish News Archive 2019-2020](http://urn.fi/urn:nbn:fi:lb-2021050401)
- [Finnish News Agency Archive (STT)](http://urn.fi/urn:nbn:fi:lb-2018121001)
- [The Suomi24 Sentences Corpus](http://urn.fi/urn:nbn:fi:lb-2020021803)
Tested with 1,000 samples from the datasets listed above: median CER 1.1%, mean CER 4.2%.
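For readers unfamiliar with the metric, here is a minimal sketch of how a per-sample character error rate (CER) and its median and mean could be computed. It assumes the common definition of CER as the character-level Levenshtein distance divided by the reference length. The evaluation script behind the numbers above is not published here, so the function names (`edit_distance`, `cer`, `summarize`) are hypothetical and the exact method may differ.

```python
from statistics import mean, median


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Assumed definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def summarize(pairs):
    """Return (median CER, mean CER) over (reference, hypothesis) pairs."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return median(scores), mean(scores)
```

A mean well above the median, as reported above, is what one would expect when most samples have few errors and a small number have many.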
More detailed info coming later...