---
license: apache-2.0
language:
- ar
- dza
pipeline_tag: text-classification
tags:
- hate-detection
- classification
library_name: PyTorch
---

# Dzarashield

Dzarashield is a fine-tuned model based on [DzaraBert](https://huggingface.co./Sifal/dzarabert). It specializes in hate speech detection for Algerian Arabic text (Darija).
It was trained on a dataset of 13.5k documents, built from manually labeled documents and various sources, and achieves an F1 score of 0.87 on a held-out test set of 2.5k samples.
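
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion counts. The counts are made-up illustrative numbers, not the model's evaluation results, and `f1_score` is a hypothetical helper name. This card does not state which averaging (binary, macro, or weighted) was used for the reported 0.87, so the binary form shown here is an assumption.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1: harmonic mean of precision and recall, from confusion counts."""
    if tp == 0:
        # Without true positives, precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (NOT taken from the Dzarashield evaluation).
example_f1 = f1_score(tp=80, fp=10, fn=20)
print(f"F1 = {example_f1:.3f}")
```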

## Limitations 

This model was fine-tuned solely on Arabic characters, and tokens from other languages have been pruned from its vocabulary.
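
Because of this limitation, it can help to check how much of an input is written in Arabic script before classifying it. The following is a minimal sketch, not part of the released code: `arabic_ratio` is a hypothetical helper, and treating the basic Arabic Unicode block (U+0600 to U+06FF) as "Arabic characters" is an assumption, since the exact set of kept characters is not listed here.

```python
import re

# Assumption: the basic Arabic Unicode block stands in for "Arabic characters".
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")


def arabic_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that fall in the Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if ARABIC_CHAR.match(ch))
    return arabic / len(letters)


# Arabic-script Darija versus the same sentence in Latin transliteration.
print(arabic_ratio("واش خويا راك غايا؟"))
print(arabic_ratio("wach khouya rak ghaya?"))
```

A caller could skip or flag inputs whose ratio falls below a chosen threshold; the right threshold would need to be picked empirically.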

# How to use
## Setup:
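```
# Notebook cell (Jupyter/Colab): "!" runs a shell command, "%cd" changes directory.
!git lfs install
!git clone https://huggingface.co./Sifal/dzarashield
%cd dzarashield

import torch
from model import BertClassifier  # imported from model.py in the cloned repository
from transformers import PreTrainedTokenizerFast

dzarashield = BertClassifier()
PATH = "./pytorch_model.bin"

# map_location="cpu" lets the weights load on machines without a GPU.
dzarashield.load_state_dict(torch.load(PATH, map_location="cpu"))
dzarashield.eval()  # switch off dropout for inference
tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
```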
## Example:

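```
# Requires torch, dzarashield and tokenizer from the Setup cell.
idx_to_label = {0: 'non-hate', 1: 'hate'}
sentences = ['يا وحد الشموتي، تكول دجاج آآآه', 'واش خويا راك غايا؟']

def predict_label(sentence):
    # Tokenize a single sentence into PyTorch tensors.
    tokenized = tokenizer(sentence, return_tensors='pt')
    # Inference only, so gradient tracking is not needed.
    with torch.no_grad():
        outputs = dzarashield(**tokenized)
    # The index of the largest logit is the predicted class.
    return idx_to_label[outputs.logits.argmax().item()]

for sentence in sentences:
    label = predict_label(sentence)
    print(f'sentence: {sentence} label: {label}')
```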
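
The example above returns only the top label. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below shows the idea in plain Python with made-up logit values; `softmax` is a hypothetical helper, and the two-element logit layout (index 0 for non-hate, index 1 for hate) is assumed from `idx_to_label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


idx_to_label = {0: 'non-hate', 1: 'hate'}

# Made-up logits for illustration; real values come from the model output.
logits = [-1.2, 2.3]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f'label: {idx_to_label[best]} probability: {probs[best]:.3f}')
```

With the real model, `torch.softmax(outputs.logits, dim=-1)` should give the same probabilities directly, assuming `outputs.logits` holds one row of two class scores.
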
## Acknowledgments

Dzarashield builds on [Dziribert](https://huggingface.co./alger-ia/dziribert), and I am grateful to its authors, whose work made this project possible.

## References

- [Dziribert](https://arxiv.org/pdf/2109.12346.pdf)