---
license: cc-by-4.0
datasets:
- ptaszynski/PolishCyberbullyingDataset
language:
- pl
tags:
- cyberbullying
- hate-speech
---

# Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection
This is a Polish BERT language model, specifically [Polbert](https://huggingface.co./dkleczek/bert-base-polish-uncased-v1), fine-tuned on a re-annotated and improved version of the Dataset for Automatic Cyberbullying Detection in Polish Language.
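
A minimal sketch of how a fine-tuned classifier like this one might be loaded and queried with the `transformers` library. The Hub model ID below is an assumption inferred from the repository name in the citation URL, so verify it on the Hub before use. The helper names `load_classifier` and `classify` are hypothetical and exist only for this illustration.

```python
# Requires: pip install transformers torch
# Assumption: the Hub model ID mirrors the repository name in the citation URL.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first call)."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier=None):
    """Return (text, label, score) triples, one per input text.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts; by default the pipeline is loaded.
    """
    texts = list(texts)
    clf = classifier if classifier is not None else load_classifier()
    results = clf(texts)
    return [(text, r["label"], float(r["score"])) for text, r in zip(texts, results)]
```

Calling `classify(["Przykładowy tweet"])` would return one triple per input. The label names depend on the model's configuration and are not documented here, so inspect them before mapping them to cyberbullying categories.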


## Fine-tuning dataset
The dataset used to fine-tune this model is based on the original [Dataset for Automatic Cyberbullying Detection in Polish Language](https://huggingface.co./datasets/poleval2019_cyberbullying), which was subsequently cleaned and re-annotated by experts from [Samurai Labs](https://www.samurailabs.ai/). The improved dataset has been released separately; see the Citations section below.


## Acknowledgements
* We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing it.

## Author
Michal Ptaszynski - contact me on:
- Twitter: [@mich_ptaszynski](https://twitter.com/mich_ptaszynski)
- GitHub: [ptaszynski](https://github.com/ptaszynski)
- LinkedIn: [michalptaszynski](https://jp.linkedin.com/in/michalptaszynski)
- HuggingFace: [ptaszynski](https://huggingface.co./ptaszynski)


## Licences
The fine-tuned model with all attached files is licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>



## Citations
Please cite this model using the following citation.

Model:
```
@article{ptaszynski2022cyberbullyibng-bert-pl,
  title={Polish BERT trained for Automatic Cyberbullying Detection},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  year={2022},
  publisher={HuggingFace},
  url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
}
```

Original dataset:
```
@article{ptaszynski2019results,
  title={Results of the poleval 2019 shared task 6: First dataset and open shared task for automatic cyberbullying detection in polish twitter},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dyba{\l}a, Pawe{\l}},
  year={2019},
  publisher={Warszawa: Institute of Computer Sciences. Polish Academy of Sciences}
}
```

Improved dataset:

The improved dataset used for training this model was released as
[Expert-annotated dataset to study cyberbullying in Polish language](https://huggingface.co./datasets/ptaszynski/PolishCyberbullyingDataset).

```
@article{ptaszynski2023expert,
  title={Expert-Annotated Dataset to Study Cyberbullying in Polish Language},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  journal={Data},
  volume={9},
  number={1},
  pages={1},
  year={2023},
  publisher={MDPI}
}
```

## References
* https://github.com/google-research/bert
* https://github.com/ptaszynski/cyberbullying-Polish
* https://huggingface.co./datasets/poleval2019_cyberbullying