---
language:
- en
library_name: transformers
tags:
- bert
- multilabel
- classification
- finetune
- finance
- regulatory
- text
- risk
metrics:
- f1
pipeline_tag: text-classification
widget:
- text: >-
    Where an FI employs a technological solution provided by an external party
    to conduct screening of virtual asset transactions and the associated wallet
    addresses, the FI remains responsible for discharging its AML/CFT
    obligations. The FI should conduct due diligence on the solution before
    deploying it, taking into account relevant factors such as:
---
This model is a BERT-based multi-label classifier for the financial regulatory domain. It starts from the pre-trained ProsusAI/finbert model and is fine-tuned on a diverse dataset of financial regulatory texts, allowing it to assign multiple relevant categories to a passage simultaneously.
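
As a quick start, the sketch below shows one way to run multi-label inference. The repository id is a placeholder (substitute this model's actual Hub id), and the 0.5 decision threshold is an assumption; because the task is multi-label, scores come from a sigmoid over the logits rather than a softmax.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder id -- substitute the actual Hub id of this model.
model_id = "your-org/finbert-regulatory-multilabel"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = (
    "Where an FI employs a technological solution provided by an external "
    "party to conduct screening of virtual asset transactions, the FI "
    "remains responsible for discharging its AML/CFT obligations."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Independent per-label probabilities; 0.5 is an assumed threshold.
probs = torch.sigmoid(logits)[0]
labels = [model.config.id2label[i] for i, p in enumerate(probs) if p >= 0.5]
print(labels)
```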
## Model Architecture

- **Base Model**: BERT
- **Pre-trained Model**: ProsusAI/finbert
- **Task**: Multi-label classification
## Performance

Performance metrics on the validation set:

- F1 Score: 0.8637
- ROC AUC: 0.9044
- Accuracy: 0.6155
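
The card does not state the averaging scheme, so the sketch below computes these metrics with scikit-learn under a micro-averaging assumption; the toy arrays stand in for real validation labels and sigmoid scores. Note that scikit-learn's multi-label accuracy is subset accuracy (every label of a sample must match), which is typically much lower than F1 and may explain the gap between the two figures above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy stand-ins: binary indicator matrix and model sigmoid scores.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.4], [0.6, 0.7, 0.2]])
y_pred = (y_prob >= 0.5).astype(int)  # assumed 0.5 threshold

print("F1:", f1_score(y_true, y_pred, average="micro"))
print("ROC AUC:", roc_auc_score(y_true, y_prob, average="micro"))
# Subset accuracy: a sample counts only if ALL its labels match.
print("Accuracy:", accuracy_score(y_true, y_pred))
```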
## Limitations and Ethical Considerations

- Performance may vary with the specific nature of the input text and the distribution of labels it covers.
- The training data exhibits class imbalance, which can bias predictions toward the more frequent labels (a quick way to inspect this is sketched below).
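
Since the card flags class imbalance but does not quantify it, one simple check is to sum the multi-hot label matrix per label. The sketch below is purely illustrative; the label names and counts are hypothetical.

```python
import numpy as np

# Hypothetical multi-hot label matrix (samples x labels).
labels = ["aml", "risk", "reporting"]
y = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]])

# Positives per label, reported as a share of all samples.
counts = y.sum(axis=0)
for name, count in sorted(zip(labels, counts), key=lambda x: -x[1]):
    print(f"{name}: {count} positives ({count / len(y):.0%} of samples)")
```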
## Dataset Information

- **Training Dataset**: 6,562 samples
- **Validation Dataset**: 929 samples
- **Test Dataset**: 1,884 samples
## Training Details

- **Training Strategy**: Fine-tuning the ProsusAI/finbert base model with a randomly initialized classification head (see the sketch after this list).
- **Optimizer**: Adam
- **Learning Rate**: 1e-4
- **Batch Size**: 16
- **Number of Epochs**: 2
- **Evaluation Strategy**: Per epoch
- **Weight Decay**: 0.01
- **Metric for Best Model**: F1 score
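
These hyperparameters map naturally onto a transformers Trainer setup. The sketch below is a reconstruction under stated assumptions, not the exact training script: the number of labels, the dataset objects, and the compute_metrics helper are placeholders, and the Trainer's default AdamW is taken to be what "Adam" refers to.

```python
import numpy as np
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

NUM_LABELS = 8  # placeholder: set to the real number of labels

# problem_type selects a BCE-with-logits loss; the classification head
# is randomly initialized on top of the FinBERT encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "ProsusAI/finbert",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",
    ignore_mismatched_sizes=True,  # finbert ships a 3-label sentiment head
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = (1 / (1 + np.exp(-logits)) >= 0.5).astype(int)
    return {"f1": f1_score(labels, preds, average="micro")}

args = TrainingArguments(
    output_dir="finbert-multilabel",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",  # "evaluation_strategy" in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder: tokenized datasets
    eval_dataset=val_dataset,     # prepared elsewhere
    compute_metrics=compute_metrics,
)
trainer.train()
```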