---
language:
- en
library_name: transformers
tags:
- bert
- multilabel
- classification
- finetune
- finance
- regulatory
- text
- risk
metrics:
- f1
pipeline_tag: text-classification
widget:
- text: >-
    Where an FI employs a technological solution provided by an external party
    to conduct screening of virtual asset transactions and the associated
    wallet addresses, the FI remains responsible for discharging its AML/CFT
    obligations. The FI should conduct due diligence on the solution before
    deploying it, taking into account relevant factors such as:
---

This model is a fine-tuned version of BERT adapted for multi-label classification in the financial regulatory domain. It is built on the pre-trained ProsusAI/finbert model and further fine-tuned on a diverse dataset of financial regulatory texts, allowing it to assign multiple relevant categories to a passage simultaneously.

# Model Architecture

- **Base Model**: BERT
- **Pre-trained Model**: ProsusAI/finbert
- **Task**: Multi-label classification

## Performance

Performance metrics on the validation set:

- F1 Score: 0.8637
- ROC AUC: 0.9044
- Accuracy: 0.6155

## Limitations and Ethical Considerations

- The model's performance may vary depending on the nature of the input text and the label distribution.
- The training dataset is class-imbalanced, which may bias predictions toward the more frequent labels.

## Dataset Information

- **Training Dataset**: 6562 samples
- **Validation Dataset**: 929 samples
- **Test Dataset**: 1884 samples

## Training Details

- **Training Strategy**: Fine-tuning BERT with a randomly initialized classification head
- **Optimizer**: Adam
- **Learning Rate**: 1e-4
- **Batch Size**: 16
- **Number of Epochs**: 2
- **Evaluation Strategy**: Epoch
- **Weight Decay**: 0.01
- **Metric for Best Model**: F1 Score
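
## Fine-Tuning Sketch

The following is a minimal sketch of how the hyperparameters above map onto `transformers.Trainer`. It is not the exact training script for this model: the number of labels, the two-sample toy dataset, and the 0.5 decision threshold in `compute_metrics` are illustrative placeholders, and the F1 averaging mode is assumed to be micro.

```python
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

NUM_LABELS = 8  # placeholder; set to the actual number of categories

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained(
    "ProsusAI/finbert",
    num_labels=NUM_LABELS,
    # Randomly initialized classification head; BCE-with-logits loss
    # and float multi-hot labels for multi-label classification.
    problem_type="multi_label_classification",
)

def encode(batch):
    enc = tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=256
    )
    enc["labels"] = [[float(x) for x in labels] for labels in batch["labels"]]
    return enc

# Tiny illustrative corpus; replace with the real regulatory dataset.
raw = Dataset.from_dict({
    "text": ["placeholder regulatory passage one",
             "placeholder regulatory passage two"],
    "labels": [[1, 0, 0, 1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0, 0, 0, 0]],
})
splits = raw.map(encode, batched=True).train_test_split(test_size=0.5)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = (1 / (1 + np.exp(-logits))) >= 0.5  # sigmoid + assumed threshold
    return {"f1": f1_score(labels, preds, average="micro")}

args = TrainingArguments(
    output_dir="finbert-regulatory-multilabel",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",  # evaluation_strategy in transformers < 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
```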
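
## Example Usage

The sketch below shows one way to run inference with this model. The repo id `your-org/finbert-regulatory-multilabel` is a placeholder for the actual model id of this card, and the 0.5 decision threshold is an assumption to be tuned on validation data. Because this is multi-label classification, a sigmoid is applied per label and each label is thresholded independently, rather than taking a softmax over labels.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder repo id; substitute the actual model id for this card.
model_id = "your-org/finbert-regulatory-multilabel"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = (
    "Where an FI employs a technological solution provided by an external "
    "party to conduct screening of virtual asset transactions and the "
    "associated wallet addresses, the FI remains responsible for "
    "discharging its AML/CFT obligations."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Independent per-label probabilities for multi-label output.
probs = torch.sigmoid(logits)[0].tolist()
threshold = 0.5  # assumed; tune on validation data
predicted = [
    model.config.id2label[i] for i, p in enumerate(probs) if p >= threshold
]
print(predicted)
```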