---
license: cc-by-sa-4.0
pipeline_tag: fill-mask
---
# Model Card for Silesian HerBERT Base

Silesian HerBERT Base is a HerBERT Base model with a Silesian tokenizer, fine-tuned on Silesian Wikipedia.
## Usage
Example code:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ipipan/silesian-herbert-base")
model = AutoModel.from_pretrained("ipipan/silesian-herbert-base")

# Encode a Silesian sentence (about the Great Pyramid of Giza) and run it
# through the model to obtain contextual token embeddings.
output = model(
    **tokenizer.batch_encode_plus(
        [
            (
                "Wielgŏ Piyramida we Gizie, mianowanŏ tyż Piyramida ôd Cheopsa, to je nojsrogszŏ a nojbarzij znanŏ ze egipskich piyramid we Gizie.",
            )
        ],
        padding='longest',
        add_special_tokens=True,
        return_tensors='pt'
    )
)
```
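
Since the card is tagged `fill-mask`, masked-token prediction can also be tried directly. The following is a minimal sketch, assuming the checkpoint exposes a masked-LM head usable by the `fill-mask` pipeline; the masked sentence is adapted from the example above, and `tokenizer.mask_token` is read at runtime rather than hardcoded:

```python
from transformers import pipeline

# Sketch only: assumes "ipipan/silesian-herbert-base" provides a masked-LM
# head compatible with the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="ipipan/silesian-herbert-base")

# Use the tokenizer's own mask token instead of hardcoding a mask string.
mask = fill_mask.tokenizer.mask_token
sentence = f"Wielgŏ Piyramida we Gizie to je nojsrogszŏ ze egipskich {mask} we Gizie."

# Print the top predictions for the masked word.
for prediction in fill_mask(sentence):
    print(prediction["token_str"], round(prediction["score"], 4))
```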
## License
CC BY-SA 4.0
## Citation
If you use this model, please cite the following paper:
```bibtex
@misc{rybak2024transferring,
      title={Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching},
      author={Piotr Rybak},
      year={2024},
      eprint={2402.14408},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## Authors

The model was created by Piotr Rybak from the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences.

This work was supported by the European Regional Development Fund as part of the 2014–2020 Smart Growth Operational Programme: CLARIN, Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19.