---
license: apache-2.0
language:
- ar
- dza
pipeline_tag: text-classification
tags:
- hate-detection
- classification
library_name: PyTorch
---
# Dzarashield
Dzarashield is a fine-tuned model based on [DzaraBert](https://huggingface.co./Sifal/dzarabert). It specializes in hate speech detection for Algerian Arabic text (Darija).
It was trained on a dataset of 13.5k documents, assembled from manually labeled documents and various other sources, and achieves an F1 score of 0.87 on a held-out test set of 2.5k samples.
## Limitations
Note that this model has been fine-tuned solely on Arabic characters: tokens from other languages have been pruned, so text written in other scripts is unlikely to be handled reliably.
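Because of this limitation, it can be useful to check how much of an input is actually written in Arabic script before classifying it. The following is a minimal sketch, not part of the released model: the function names `arabic_ratio` and `is_mostly_arabic` and the 0.8 threshold are hypothetical, and only the basic Arabic Unicode block (U+0600 to U+06FF) is considered.

```python
def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters in the basic Arabic block (U+0600..U+06FF)."""
    # Ignore digits, punctuation, and whitespace; only letters are counted.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = [ch for ch in letters if "\u0600" <= ch <= "\u06FF"]
    return len(arabic) / len(letters)


def is_mostly_arabic(text: str, threshold: float = 0.8) -> bool:
    """Hypothetical pre-check: True when at least `threshold` of the letters are Arabic."""
    return arabic_ratio(text) >= threshold
```

Inputs that fail such a check (for example Darija written in Latin characters) could be skipped or flagged rather than passed to the classifier.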
# How to use
## Setup:
```
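# Notebook commands: fetch the repository (weights are stored with Git LFS)
!git lfs install
!git clone https://huggingface.co./Sifal/dzarashield
%cd dzarashield

import torch
from model import BertClassifier
from transformers import PreTrainedTokenizerFast

# Build the classifier and load the fine-tuned weights
dzarashield = BertClassifier()
PATH = "./pytorch_model.bin"
dzarashield.load_state_dict(torch.load(PATH, map_location="cpu"))
dzarashield.eval()  # inference mode: disables dropout

tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")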
```
## Example:
```
idx_to_label = {0: 'non-hate', 1: 'hate'}
sentences = ['يا وحد الشموتي، تكول دجاج آآآه', 'واش خويا راك غايا؟']

def predict_label(sentence):
    # Tokenize a single sentence into PyTorch tensors
    tokenized = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        outputs = dzarashield(**tokenized)
    # Return the label of the class with the highest logit
    return idx_to_label[outputs.logits.argmax().item()]

for sentence in sentences:
    label = predict_label(sentence)
    print(f'sentence: {sentence} label: {label}')
```
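The example above returns only the arg-max label. If a confidence value is also wanted, the two logits can be passed through a softmax. The sketch below is an illustration only: it assumes the logits for one sentence are available as a plain list of two floats (for instance via something like `outputs.logits.squeeze().tolist()`, which is an assumption about the output shape), and the helper names `softmax` and `label_with_confidence` are hypothetical.

```python
import math

IDX_TO_LABEL = {0: "non-hate", 1: "hate"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return IDX_TO_LABEL[idx], probs[idx]
```

A caller could then apply its own threshold to the returned probability, for example sending low-confidence predictions to manual review.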
## Acknowledgments
Dzarashield is built upon the foundations of [Dziribert](https://huggingface.co./alger-ia/dziribert), and I am grateful to its authors for the work that made this project possible.
## References
- [Dziribert](https://arxiv.org/pdf/2109.12346.pdf)