language: pl
license: cc-by-sa-4.0
datasets:
  - Polish subset of Open Subtitles
  - Polish subset of ParaCrawl
  - Polish Parliamentary Corpus
  - Polish Wikipedia - Feb 2020
  - >-
    Expert-annotated Dataset for Automatic Cyberbullying Detection in Polish
    Language

Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection

This is a Polish version of the BERT language model, specifically Polbert, fine-tuned on a re-annotated and improved version of the Dataset for Automatic Cyberbullying Detection in Polish Language.
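
A minimal sketch of how a fine-tuned classifier like this one could be used for inference with the `transformers` library. The model identifier below is an assumption inferred from the repository name in the citation URL and should be verified before use. The helper names `load_classifier` and `classify` are hypothetical, and the label names returned depend on the model's configuration, which this card does not document.

```python
from typing import Callable, List, Tuple

# ASSUMPTION: Hub identifier inferred from the repository name in the
# citation URL; it is not stated in this card, so verify it before use.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first call)."""
    # Imported lazily so that classify() can be used without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) for each input text.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts, as a text-classification pipeline does.
    """
    predictions = classifier(texts)
    return [(text, pred["label"], pred["score"]) for text, pred in zip(texts, predictions)]
```

To try it, pass the result of `load_classifier()` as the second argument to `classify` together with a list of Polish sentences. How each label maps to "cyberbullying" or "not cyberbullying" should be checked against the model's `id2label` configuration rather than assumed.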

Fine-tuning dataset

The dataset used for fine-tuning this model is based on the original Dataset for Automatic Cyberbullying Detection in Polish Language, which was subsequently cleaned and re-annotated by experts from Samurai Labs. The improved dataset will be released separately at a later date.

Acknowledgements

  • We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing it.

Author

Michal Ptaszynski - contact me on:

Licences

The fine-tuned model, together with all attached files, is licensed under CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike 4.0 International License).


Citations

Please cite this model using the following citation.

@article{ptaszynski2022cyberbullyibng-bert-pl,
  title={Polish BERT trained for Automatic Cyberbullying Detection},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  year={2022},
  publisher={HuggingFace},
  url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
}

References