---
license: mit
language:
- ar
metrics:
- f1
library_name: transformers
pipeline_tag: text-classification
tags:
- code
datasets:
- SinaLab/ArBanking77
---
## ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

https://www.jarrar.info/publications/JBKEG23.pdf

Online Demo
--------
You can try our model using the online demo at the link below.

[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)


ArBanking77 Corpus
--------
ArBanking77 consists of 31,404 queries in Modern Standard Arabic (MSA) and the Palestinian dialect (PAL). These queries were manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), such as card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on the ArBanking77 dataset, reaching an F1-score of 92% on MSA and 90% on PAL.
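The scores above are F1-scores over 77 intent classes. This card does not say how per-class scores are averaged. The following minimal sketch assumes macro averaging, which is the unweighted mean of per-class F1. The intent label strings are illustrative and are borrowed from the English Banking77 intent names.

```python
from collections import Counter
from typing import Sequence

def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong for this query
            fn[g] += 1  # the true intent g was missed
    classes = set(gold) | set(pred)
    if not classes:
        return 0.0
    # Per-class F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of
    # precision and recall; the denominator is positive for every class seen.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(per_class) / len(per_class)

# Illustrative labels (assumed Banking77-style intent names).
gold = ["card_arrival", "card_arrival", "exchange_rate", "automatic_top_up"]
pred = ["card_arrival", "exchange_rate", "exchange_rate", "automatic_top_up"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Here `card_arrival` and `exchange_rate` each score 2/3 and `automatic_top_up` scores 1, so the macro average is 7/9. A micro average, or a support-weighted average, would generally give a different number for the same predictions.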


Corpus Download
--------
A data sample is available in the `data` directory. The entire ArBanking77 corpus is available for download upon request, for both academic and commercial use. Requests to download ArBanking77 (the corpus and the model) can be made at:

[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)

Model Download
--------
Hugging Face: [https://huggingface.co/SinaLab/ArBanking77](https://huggingface.co/SinaLab/ArBanking77)
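The model is published on the Hugging Face Hub with the `text-classification` pipeline tag, so it can presumably be loaded through the `transformers` `pipeline` API. The following is a minimal sketch that assumes `transformers` and a PyTorch backend are installed. `load_classifier` and `top_intents` are hypothetical helper names. The label strings returned depend on the model's `id2label` configuration, which this card does not document.

```python
from typing import Callable, List, Sequence, Tuple

MODEL_ID = "SinaLab/ArBanking77"  # repository name from the Hugging Face link above

def load_classifier(model_id: str = MODEL_ID) -> Callable:
    """Build a transformers text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that top_intents() can be used and tested without transformers.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)

def top_intents(classifier: Callable, queries: Sequence[str], k: int = 3) -> List[List[Tuple[str, float]]]:
    """Return the k highest-scoring (label, score) pairs for each query."""
    # With a list input and top_k set, the pipeline returns one list of
    # {"label": ..., "score": ...} dicts per query.
    raw = classifier(list(queries), top_k=k)
    ranked = []
    for per_query in raw:
        # Sort defensively in case the classifier does not return results in score order.
        best = sorted(per_query, key=lambda d: d["score"], reverse=True)[:k]
        ranked.append([(d["label"], round(float(d["score"]), 4)) for d in best])
    return ranked
```

For example, `top_intents(load_classifier(), ["متى ستصل بطاقتي؟"])` ("When will my card arrive?") returns the three highest-scoring intents for that query. Because `top_intents` accepts any callable with the same calling convention, it can be exercised with a stub, which avoids downloading the model.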


Model Training
--------

```commandline
python run_glue_no_trainer.py \
    --model_name_or_path aubmindlab/bert-base-arabertv2 \
    --train_file ./data/Banking77_Arabized_Ver3_train_MSA_PAL_merged.json \
    --validation_file ./data/Banking77_Arabized_Ver3_val_MSA_PAL_merged.json \
    --seed 42 \
    --max_length 128 \
    --learning_rate 4e-5 \
    --num_train_epochs 20 \
    --per_device_train_batch_size 32 \
    --output_dir ./results
```

Training script source: [run_glue_no_trainer.py](https://github.com/huggingface/transformers/blob/e9ad51306fdcc3fb79d837d667e21c6d075a2451/examples/pytorch/text-classification/run_glue_no_trainer.py)
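`run_glue_no_trainer.py` loads `--train_file` and `--validation_file` as CSV or JSON through the `datasets` library. As we read the script at the pinned commit, it has specific expectations for non-GLUE data. It expects a column named `label`. If there is exactly one other column, that column is used as the input text. If there are two or more other columns, they are treated as a sentence pair, so only one text field should be kept for intent detection.

This card does not describe the schema of the ArBanking77 JSON files. The following sketch writes (query, intent) pairs as JSON Lines that fit that reading. It uses a hypothetical `sentence1` text field and hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Iterable, List, Tuple

def write_jsonl(pairs: Iterable[Tuple[str, str]], path: Path, text_key: str = "sentence1") -> int:
    """Write (query, intent) pairs as JSON Lines and return the number of records.

    Each line holds exactly two fields: the query text under `text_key`
    (a hypothetical field name) and the intent under "label".
    """
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for text, intent in pairs:
            record = {text_key: text, "label": intent}
            # ensure_ascii=False keeps Arabic text as UTF-8 instead of \uXXXX escapes.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count

def read_jsonl(path: Path) -> List[dict]:
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A file written this way has one `label` column and one text column, and its path can be passed to `--train_file` or `--validation_file`.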


Credits
-------
This research is partially funded by the Palestinian Higher Council for Innovation and Excellence.


Citation
-------
Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana Ghanem: [ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic](http://www.jarrar.info/publications/JBKEG23.pdf). In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), part of EMNLP 2023. ACL.