---
language: pl
license: cc-by-sa-4.0
datasets:
- Polish subset of Open Subtitles
- Polish subset of ParaCrawl
- Polish Parliamentary Corpus
- Polish Wikipedia - Feb 2020
- Expert-annotated Dataset for Automatic Cyberbullying Detection in Polish Language
---
# Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection
This is a Polish BERT language model, specifically [Polbert](https://huggingface.co./dkleczek/bert-base-polish-uncased-v1), fine-tuned on a re-annotated and improved version of the Dataset for Automatic Cyberbullying Detection in Polish Language.
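A minimal usage sketch with the `transformers` text-classification pipeline follows. The model ID `ptaszynski/bert-base-polish-cyberbullying` is an assumption inferred from the author's user name and the repository name cited below, and the label names the model returns are not documented in this card, so check the model's configuration before interpreting them. The helper names `load_classifier`, `best_label`, and `classify` are hypothetical.

```python
# Assumption: the fine-tuned model is published under this ID. It is inferred
# from the author's user name and repository name, not stated in this card.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so best_label() works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def best_label(predictions: list[dict]) -> str:
    """Return the label with the highest score from {'label', 'score'} dicts."""
    return max(predictions, key=lambda p: p["score"])["label"]


def classify(texts: list[str], classifier=None) -> list[str]:
    """Return the top-scoring label for each input text."""
    classifier = classifier or load_classifier()
    # top_k=None asks the pipeline for scores of every label, per input text.
    all_scores = classifier(texts, top_k=None)
    return [best_label(scores) for scores in all_scores]
```

Calling `classify(["Przykładowy tweet do sprawdzenia."])` should return one label per input text. Which label corresponds to cyberbullying depends on the model's `id2label` mapping.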
## Fine-tuning dataset
The dataset used for fine-tuning this model is based on the original [Dataset for Automatic Cyberbullying Detection in Polish Language](https://huggingface.co./datasets/poleval2019_cyberbullying), which was subsequently cleaned and re-annotated by experts from [Samurai Labs](https://www.samurailabs.ai/). The improved dataset will be released separately later.
## Acknowledgements
* We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing the dataset.
## Author
Michal Ptaszynski - contact me on:
- Twitter: [@mich_ptaszynski](https://twitter.com/mich_ptaszynski)
- GitHub: [ptaszynski](https://github.com/ptaszynski)
- LinkedIn: [michalptaszynski](https://jp.linkedin.com/in/michalptaszynski)
- HuggingFace: [ptaszynski](https://huggingface.co./ptaszynski)
## Licences
The fine-tuned model, with all attached files, is licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>
## Citations
Please cite this model using the following citation.
Model:
```
@article{ptaszynski2022cyberbullyibng-bert-pl,
title={Polish BERT trained for Automatic Cyberbullying Detection},
author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
year={2022},
publisher={HuggingFace},
url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
}
```
Original dataset:
```
@article{ptaszynski2019results,
title={Results of the poleval 2019 shared task 6: First dataset and open shared task for automatic cyberbullying detection in polish twitter},
author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dyba{\l}a, Pawe{\l}},
year={2019},
publisher={Warszawa: Institute of Computer Sciences. Polish Academy of Sciences}
}
```
Improved dataset:
```
TBA
```
## References
* https://github.com/google-research/bert
* https://github.com/ptaszynski/cyberbullying-Polish
* https://huggingface.co./datasets/poleval2019_cyberbullying