FERNET-C5

FERNET-C5 (Flexible Embedding Representation NETwork) is a monolingual Czech BERT-base model pre-trained on 93 GB of the Czech Colossal Clean Crawled Corpus (C5). See our paper for details.

Paper

The published version of our paper is available at https://link.springer.com/chapter/10.1007/978-3-030-89579-2_3.

The preprint of our paper is available at https://arxiv.org/abs/2107.10042.

Citation

If you find this model useful, please cite our paper:

@inproceedings{FERNETC5,
    title        = {Comparison of Czech Transformers on Text Classification Tasks},
    author       = {Lehe{\v{c}}ka, Jan and {\v{S}}vec, Jan},
    year         = 2021,
    booktitle    = {Statistical Language and Speech Processing},
    publisher    = {Springer International Publishing},
    address      = {Cham},
    pages        = {27--37},
    doi          = {10.1007/978-3-030-89579-2_3},
    isbn         = {978-3-030-89579-2},
    editor       = {Espinosa-Anke, Luis and Mart{\'i}n-Vide, Carlos and Spasi{\'{c}}, Irena}
}