---
license: mit
language:
- ar
metrics:
- f1
library_name: transformers
pipeline_tag: text-classification
tags:
- code
datasets:
- SinaLab/ArBanking77
---
## ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
Paper: https://www.jarrar.info/publications/JBKEG23.pdf

Online Demo
--------
You can try our model using the demo link below:
[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)

ArBanking77 Corpus
--------
ArBanking77 consists of 31,404 queries in Modern Standard Arabic (MSA) and the Palestinian dialect (PAL), manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), such as card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on the ArBanking77 dataset, reaching an F1-score of 92% for MSA and 90% for PAL.
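The README reports F1-scores but does not say how per-intent scores were averaged over the 77 classes. As an illustration only, the sketch below computes macro-averaged F1 (the unweighted mean of per-intent F1), one common choice for multi-class intent detection; this is an assumption, not necessarily the averaging used in the paper. `macro_f1` is a hypothetical helper that takes parallel lists of gold and predicted intent labels, and the label strings in the usage lines are illustrative. In that example two intents get F1 = 2/3 and one gets 1.0, so the macro average is 7/9.

```python
from __future__ import annotations

from collections import Counter


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Classes are every label that occurs in either `gold` or `pred`.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp: Counter[str] = Counter()  # correct predictions per class
    fp: Counter[str] = Counter()  # class was predicted, but wrongly
    fn: Counter[str] = Counter()  # true class that the model missed
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    classes = set(gold) | set(pred)
    if not classes:
        return 0.0
    # Per-class F1 = 2*TP / (2*TP + FP + FN); the denominator is > 0 for any class that occurs.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(per_class) / len(per_class)


# Hypothetical labels, for illustration only.
gold = ["card_arrival", "card_arrival", "exchange_rate", "automatic_top_up"]
pred = ["card_arrival", "exchange_rate", "exchange_rate", "automatic_top_up"]
print(round(macro_f1(gold, pred), 3))  # (2/3 + 2/3 + 1) / 3 = 7/9, printed as 0.778
```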

Corpus Download
--------
Sample data is available in the `data` directory. The entire ArBanking77 corpus is available for download upon request, for both academic and commercial use. To request ArBanking77 (the corpus and the model), visit:
[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)

Model Download
--------
Hugging Face: [https://huggingface.co/SinaLab/ArBanking77](https://huggingface.co/SinaLab/ArBanking77)
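The checkpoint is tagged for the `transformers` library, so it can presumably be loaded with a sequence-classification class such as `AutoModelForSequenceClassification`, which returns one logit per intent for each query. The plain-Python sketch below shows one way to turn a single row of those logits into ranked intents with probabilities. It assumes the logits have already been converted to a list of floats and that an `id2label` mapping is at hand (for example from the model's config); `top_k_intents` is a hypothetical helper, and the four label names are illustrative rather than read from this checkpoint, which has 77 intents. Because softmax preserves order, the top intent is simply the argmax of the raw logits; the probabilities matter only when a confidence score or threshold is needed.

```python
from __future__ import annotations

import math


def top_k_intents(
    logits: list[float], id2label: dict[int, str], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k most probable (label, probability) pairs for one query."""
    # Numerically stable softmax: shift by the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first, and keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical 4-class example; the real model has 77 intents.
id2label = {0: "card_arrival", 1: "card_linking", 2: "exchange_rate", 3: "automatic_top_up"}
logits = [0.1, 2.0, -1.0, 0.5]
for label, p in top_k_intents(logits, id2label, k=2):
    print(f"{label}: {p:.3f}")
```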

Model Training
--------
```commandline
# Fine-tune AraBERT v2 on the merged MSA + PAL splits; "\" continues the single command.
python run_glue_no_trainer.py \
  --model_name_or_path aubmindlab/bert-base-arabertv2 \
  --train_file ./data/Banking77_Arabized_Ver3_train_MSA_PAL_merged.json \
  --validation_file ./data/Banking77_Arabized_Ver3_val_MSA_PAL_merged.json \
  --seed 42 \
  --max_length 128 \
  --learning_rate 4e-5 \
  --num_train_epochs 20 \
  --per_device_train_batch_size 32 \
  --output_dir ./results
```
Training script source: [run_glue_no_trainer.py](https://github.com/huggingface/transformers/blob/e9ad51306fdcc3fb79d837d667e21c6d075a2451/examples/pytorch/text-classification/run_glue_no_trainer.py), from the Hugging Face Transformers examples, pinned to a specific commit.
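When no GLUE task name is given, `run_glue_no_trainer.py` loads `--train_file` and `--validation_file` with the `datasets` JSON loader. By our reading of the script's defaults at that commit, it expects a `label` column plus one text column; if there are two or more non-label columns, it treats the first two as a sentence pair. The sketch below writes and reads such a file as JSON Lines (one object per line), which the JSON loader accepts even with the `.json` extension used in the command above. The `sentence` key, the helper names `write_jsonl`/`read_jsonl`, and the two Arabic queries are illustrative assumptions, not taken from the ArBanking77 files; `ensure_ascii=False` keeps the Arabic text human-readable in the file.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def write_jsonl(records: list[tuple[str, str]], path: Path) -> None:
    """Write (query, intent) pairs as JSON Lines with 'sentence' and 'label' keys."""
    with path.open("w", encoding="utf-8") as f:
        for text, label in records:
            # ensure_ascii=False writes Arabic characters as-is instead of \uXXXX escapes.
            f.write(json.dumps({"sentence": text, "label": label}, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict[str, str]]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative queries, not from the corpus:
# "When will my card arrive?" and "What is the dollar exchange rate today?"
examples = [
    ("متى ستصل بطاقتي؟", "card_arrival"),
    ("ما هو سعر صرف الدولار اليوم؟", "exchange_rate"),
]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "train.json"  # hypothetical file name
    write_jsonl(examples, out)
    raw = out.read_text(encoding="utf-8")
    rows = read_jsonl(out)
```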

Credits
-------
This research is partially funded by the Palestinian Higher Council for Innovation and Excellence.

Citation
-------
Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana Ghanem: [ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic](http://www.jarrar.info/publications/JBKEG23.pdf). In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), Part of the EMNLP 2023. ACL.