Papers
arxiv:2208.10684

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Published on Aug 23, 2022
Authors:
,
,
,
,
,
,

Abstract

Online hate speech detection has become an important issue due to the growth of online content, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides a multi-label classification using 1 to 4 labels, and handles subjectivity and intersectionality. We evaluate strong baseline experiments on K-MHaS using Korean-BERT-based language models with six different metrics. KR-BERT with a sub-character tokenizer outperforms others, recognizing decomposed characters in each hate speech class.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2208.10684 in a model README.md to link it from this page.

Datasets citing this paper 2

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2208.10684 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.