GIZ/SECTOR-multilabel-bge_f

@@ -6,6 +6,8 @@ tags:
 model-index:
 - name: SECTOR-multilabel-bge
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -13,7 +15,9 @@ should probably proofread and complete it, then remove this comment. -->
 # SECTOR-multilabel-bge
-This model is a fine-tuned version of [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6114
 - Precision-micro: 0.6428
@@ -28,7 +32,9 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
 ## Intended uses & limitations
@@ -36,7 +42,49 @@ More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -64,10 +112,41 @@ The following hyperparameters were used during training:
 | 0.0892        | 6.0   | 3798 | 0.6073          | 0.6425          | 0.7499            | 0.6545             | 0.7844       | 0.8610         | 0.7844          | 0.7064   | 0.7634     | 0.7113      |
 | 0.0721        | 7.0   | 4431 | 0.6114          | 0.6428          | 0.7488            | 0.6519             | 0.7855       | 0.8627         | 0.7855          | 0.7071   | 0.7638     | 0.7109      |
 ### Framework versions
 - Transformers 4.38.1
 - Pytorch 2.1.0+cu121
 - Datasets 2.18.0
-- Tokenizers 0.15.2

 model-index:
 - name: SECTOR-multilabel-bge
   results: []
+datasets:
+- GIZ/policy_classification
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # SECTOR-multilabel-bge
+This model is a fine-tuned version of [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the [Policy-Classification](https://huggingface.co/datasets/GIZ/policy_classification) dataset.
+*The BCEWithLogitsLoss loss function is configured with pos_weight to favor recall, so the evaluation metrics, rather than the loss, are used to assess model performance during training.*
 It achieves the following results on the evaluation set:
 - Loss: 0.6114
 - Precision-micro: 0.6428
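To illustrate the `pos_weight` note above, here is a minimal pure-Python sketch of binary cross-entropy on a raw logit with a weighted positive term. It follows the formula documented for PyTorch's `BCEWithLogitsLoss`; the helper names and the weight value of 5.0 are illustrative assumptions, not values taken from this model's training run.

```python
import math


def _softplus(x: float) -> float:
    # log(1 + exp(x)), written so that exp() never overflows
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(logit: float, target: float, pos_weight: float = 1.0) -> float:
    """Binary cross-entropy on a raw logit; the positive term is scaled by pos_weight."""
    log_p = -_softplus(-logit)      # log(sigmoid(logit))
    log_not_p = -_softplus(logit)   # log(1 - sigmoid(logit))
    return -(pos_weight * target * log_p + (1.0 - target) * log_not_p)


if __name__ == "__main__":
    # A missed positive (target 1, negative logit) costs pos_weight times more,
    # while a false positive (target 0, positive logit) costs the same as before.
    # pos_weight=5.0 is a hypothetical value chosen only for this demonstration.
    print(bce_with_logits(-2.0, 1.0), bce_with_logits(-2.0, 1.0, pos_weight=5.0))
    print(bce_with_logits(2.0, 0.0), bce_with_logits(2.0, 0.0, pos_weight=5.0))
```

Because only the positive term is rescaled, a loss computed this way is not directly comparable to an unweighted loss, which is consistent with the card's choice to track precision, recall, and F1 instead.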
 ## Model description
+The purpose of this model is to predict multiple labels simultaneously for a given input text. Specifically, the model predicts the following Sector labels: Agriculture, Buildings,
+Coastal Zone, Cross-Cutting Area, Disaster Risk Management (DRM), Economy-wide, Education, Energy, Environment, Health, Industries, LULUCF/Forestry, Social Development, Tourism,
+Transport, Urban, Waste, Water.
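As a rough illustration of multi-label prediction, the sketch below turns one vector of 18 per-label logits into a list of Sector names by applying a sigmoid to each logit independently. The label order is copied from the description above and may not match the model's actual `id2label` mapping; the 0.5 threshold and the example logits are likewise assumptions.

```python
import math

# Label order as listed in the model description (assumed, not read from the model config).
SECTOR_LABELS = [
    "Agriculture", "Buildings", "Coastal Zone", "Cross-Cutting Area",
    "Disaster Risk Management (DRM)", "Economy-wide", "Education", "Energy",
    "Environment", "Health", "Industries", "LULUCF/Forestry",
    "Social Development", "Tourism", "Transport", "Urban", "Waste", "Water",
]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Return every label whose independent sigmoid probability reaches the threshold."""
    if len(logits) != len(SECTOR_LABELS):
        raise ValueError("expected one logit per Sector label")
    return [name for name, z in zip(SECTOR_LABELS, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    # Made-up logits: only Energy and Transport are positive.
    example = [-3.0] * len(SECTOR_LABELS)
    example[SECTOR_LABELS.index("Energy")] = 2.0
    example[SECTOR_LABELS.index("Transport")] = 0.4
    print(decode(example))
```

Unlike a softmax classifier, each label is decided on its own, so a single passage can receive several sectors or none.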
 ## Intended uses & limitations
 ## Training and evaluation data
+- Training Dataset: 10123
+| Class | Positive Count of Class |
+|:-------------|:--------|
+| Agriculture | 2235 |
+| Buildings | 169 |
+| Coastal Zone | 698 |
+| Cross-Cutting Area | 1853 |
+| Disaster Risk Management (DRM) | 814 |
+| Economy-wide | 873 |
+| Education | 180 |
+| Energy | 2847 |
+| Environment | 905 |
+| Health | 662 |
+| Industries | 419 |
+| LULUCF/Forestry | 1861 |
+| Social Development | 507 |
+| Tourism | 192 |
+| Transport | 1173 |
+| Urban | 558 |
+| Waste | 714 |
+| Water | 1207 |
+- Validation Dataset: 936
+| Class | Positive Count of Class |
+|:-------------|:--------|
+| Agriculture | 200 |
+| Buildings | 18 |
+| Coastal Zone | 71 |
+| Cross-Cutting Area | 180 |
+| Disaster Risk Management (DRM) | 85 |
+| Economy-wide | 85 |
+| Education | 23 |
+| Energy | 254 |
+| Environment | 91 |
+| Health | 68 |
+| Industries | 41 |
+| LULUCF/Forestry | 193 |
+| Social Development | 56 |
+| Tourism | 28 |
+| Transport | 107 |
+| Urban | 51 |
+| Waste | 59 |
+| Water | 106 |
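The class counts above are strongly imbalanced, which is the usual motivation for a `pos_weight`. The sketch below derives one candidate weight per class with the common negatives-to-positives ratio. This heuristic is an assumption for illustration; the card does not state how the actual `pos_weight` values were chosen. It also assumes that 10123 is the number of training examples.

```python
TRAIN_SIZE = 10123  # assumed to be the number of training examples

# Positive counts copied from the training table above.
TRAIN_POSITIVES = {
    "Agriculture": 2235, "Buildings": 169, "Coastal Zone": 698,
    "Cross-Cutting Area": 1853, "Disaster Risk Management (DRM)": 814,
    "Economy-wide": 873, "Education": 180, "Energy": 2847,
    "Environment": 905, "Health": 662, "Industries": 419,
    "LULUCF/Forestry": 1861, "Social Development": 507, "Tourism": 192,
    "Transport": 1173, "Urban": 558, "Waste": 714, "Water": 1207,
}


def candidate_pos_weights(positives: dict[str, int], total: int) -> dict[str, float]:
    """Negatives-to-positives ratio per class (a common heuristic, not the card's stated method)."""
    return {label: (total - count) / count for label, count in positives.items()}


if __name__ == "__main__":
    weights = candidate_pos_weights(TRAIN_POSITIVES, TRAIN_SIZE)
    # Print classes from rarest (largest weight) to most frequent.
    for label, weight in sorted(weights.items(), key=lambda item: -item[1]):
        print(f"{label:32s} {weight:6.2f}")
```

Under this heuristic the rarest classes (Buildings, Education, Tourism) receive the largest weights, and in practice such ratios are often capped or smoothed to avoid over-predicting rare labels.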
 ## Training procedure
 | 0.0892        | 6.0   | 3798 | 0.6073          | 0.6425          | 0.7499            | 0.6545             | 0.7844       | 0.8610         | 0.7844          | 0.7064   | 0.7634     | 0.7113      |
 | 0.0721        | 7.0   | 4431 | 0.6114          | 0.6428          | 0.7488            | 0.6519             | 0.7855       | 0.8627         | 0.7855          | 0.7071   | 0.7638     | 0.7109      |
+| label | precision | recall | f1-score | support |
+|:-------------:|:---------:|:-----:|:------:|:------:|
+| Agriculture | 0.720 | 0.850 | 0.780 | 200 |
+| Buildings | 0.636 | 0.777 | 0.700 | 18 |
+| Coastal Zone | 0.562 | 0.760 | 0.646 | 71 |
+| Cross-Cutting Area | 0.569 | 0.777 | 0.657 | 180 |
+| Disaster Risk Management (DRM) | 0.567 | 0.694 | 0.624 | 85 |
+| Economy-wide | 0.461 | 0.635 | 0.534 | 85 |
+| Education | 0.608 | 0.608 | 0.608 | 23 |
+| Energy | 0.816 | 0.838 | 0.827 | 254 |
+| Environment | 0.561 | 0.703 | 0.624 | 91 |
+| Health | 0.708 | 0.750 | 0.728 | 68 |
+| Industries | 0.660 | 0.902 | 0.762 | 41 |
+| LULUCF/Forestry | 0.676 | 0.844 | 0.751 | 193 |
+| Social Development | 0.593 | 0.678 | 0.633 | 56 |
+| Tourism | 0.551 | 0.571 | 0.561 | 28 |
+| Transport | 0.700 | 0.766 | 0.732 | 107 |
+| Urban | 0.414 | 0.568 | 0.479 | 51 |
+| Waste | 0.658 | 0.881 | 0.753 | 59 |
+| Water | 0.602 | 0.773 | 0.677 | 106 |
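The f1-score column in the per-label table is the harmonic mean of precision and recall. The short sketch below recomputes it for a few rows as a consistency check; the row subset and the rounding tolerance are choices made here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1-score) copied from a few rows of the table above.
ROWS = {
    "Agriculture": (0.720, 0.850, 0.780),
    "Energy": (0.816, 0.838, 0.827),
    "Education": (0.608, 0.608, 0.608),
    "Urban": (0.414, 0.568, 0.479),
}

if __name__ == "__main__":
    for label, (precision, recall, reported) in ROWS.items():
        recomputed = f1_score(precision, recall)
        # Inputs are rounded to three decimals, so allow a small tolerance.
        status = "ok" if abs(recomputed - reported) < 0.002 else "mismatch"
        print(f"{label:12s} recomputed={recomputed:.3f} reported={reported:.3f} {status}")
```

In most rows recall exceeds precision, which matches the card's statement that the loss was weighted to favor recall.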
+### Environmental Impact
+Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+- **Carbon Emitted**: 0.02867 kg of CO2
+- **Hours Used**: 0.706 hours
+### Training Hardware
+- **On Cloud**: yes
+- **GPU Model**: 1 x Tesla T4
+- **CPU Model**: Intel(R) Xeon(R) CPU @ 2.00GHz
+- **RAM Size**: 12.67 GB
 ### Framework versions
 - Transformers 4.38.1
 - Pytorch 2.1.0+cu121
 - Datasets 2.18.0
+- Tokenizers 0.15.2