
ppsingh committed on
Commit
a02111d
1 Parent(s): 65249b8

Update README.md

  1. README.md +83 -4
README.md CHANGED
@@ -6,6 +6,8 @@ tags:
6
  model-index:
7
  - name: SECTOR-multilabel-bge
8
  results: []
 
 
9
  ---
10
 
11
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -13,7 +15,9 @@ should probably proofread and complete it, then remove this comment. -->
13
 
14
  # SECTOR-multilabel-bge
15
 
16
- This model is a fine-tuned version of [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the None dataset.
 
 
17
  It achieves the following results on the evaluation set:
18
  - Loss: 0.6114
19
  - Precision-micro: 0.6428
@@ -28,7 +32,9 @@ It achieves the following results on the evaluation set:
28
 
29
  ## Model description
30
 
31
- More information needed
 
 
32
 
33
  ## Intended uses & limitations
34
 
@@ -36,7 +42,49 @@ More information needed
36
 
37
  ## Training and evaluation data
38
 
39
- More information needed
40
 
41
  ## Training procedure
42
 
@@ -64,10 +112,41 @@ The following hyperparameters were used during training:
64
  | 0.0892 | 6.0 | 3798 | 0.6073 | 0.6425 | 0.7499 | 0.6545 | 0.7844 | 0.8610 | 0.7844 | 0.7064 | 0.7634 | 0.7113 |
65
  | 0.0721 | 7.0 | 4431 | 0.6114 | 0.6428 | 0.7488 | 0.6519 | 0.7855 | 0.8627 | 0.7855 | 0.7071 | 0.7638 | 0.7109 |
66
 
67
 
68
  ### Framework versions
69
 
70
  - Transformers 4.38.1
71
  - Pytorch 2.1.0+cu121
72
  - Datasets 2.18.0
73
- - Tokenizers 0.15.2
 
6
  model-index:
7
  - name: SECTOR-multilabel-bge
8
  results: []
9
+ datasets:
10
+ - GIZ/policy_classification
11
  ---
12
 
13
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
15
 
16
  # SECTOR-multilabel-bge
17
 
18
+ This model is a fine-tuned version of [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the [Policy-Classification](https://huggingface.co/datasets/GIZ/policy_classification) dataset.
19
+
20
+ *The loss function `BCEWithLogitsLoss` is configured with `pos_weight` to favor recall, so the evaluation metrics, rather than the loss, are used to assess model performance during training.*
21
  It achieves the following results on the evaluation set:
22
  - Loss: 0.6114
23
  - Precision-micro: 0.6428
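
The note on `pos_weight` above can be made concrete with a small sketch. The card does not state how the weights were chosen, so the negatives-to-positives ratio used here is an assumption, and the helper names (`pos_weight_from_counts`, `weighted_bce_with_logits`) are hypothetical. The loss follows the per-element formula that PyTorch's `BCEWithLogitsLoss` applies when `pos_weight` is set; it is written in plain Python so it runs without dependencies.

```python
import math

# Training-set size and a few positive counts copied from the tables in this card.
N_TRAIN = 10123
POSITIVE_COUNTS = {"Agriculture": 2235, "Buildings": 169, "Energy": 2847}


def pos_weight_from_counts(n_total, positives):
    """Assumed heuristic (not confirmed by the card): negatives / positives per class."""
    return {label: (n_total - count) / count for label, count in positives.items()}


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logit, target, pos_weight):
    """Binary cross-entropy on a raw logit; the positive term is scaled by pos_weight."""
    log_sigmoid = -_softplus(-logit)            # log(sigmoid(logit))
    log_one_minus_sigmoid = -_softplus(logit)   # log(1 - sigmoid(logit))
    return -(pos_weight * target * log_sigmoid
             + (1 - target) * log_one_minus_sigmoid)


weights = pos_weight_from_counts(N_TRAIN, POSITIVE_COUNTS)
```

Under this assumption a rare class such as Buildings receives a much larger weight than a frequent one such as Energy, so a missed positive for a rare class costs more. That pushes the model toward higher recall and makes the raw loss value harder to interpret on its own.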
 
32
 
33
  ## Model description
34
 
35
+ The purpose of this model is to predict multiple labels simultaneously for a given input text. Specifically, the model predicts the following Sector labels: Agriculture, Buildings,
36
+ Coastal Zone, Cross-Cutting Area, Disaster Risk Management (DRM), Economy-wide, Education, Energy, Environment, Health, Industries, LULUCF/Forestry, Social Development, Tourism,
37
+ Transport, Urban, Waste, Water.
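
As a rough illustration of what predicting multiple labels simultaneously means in practice, the sketch below turns one logit per Sector label into a set of predicted labels. The 0.5 threshold, the label order, and the function names are assumptions made for this example; the card does not specify them.

```python
import math

# Label names as listed in the model description; the ordering here is assumed.
SECTOR_LABELS = [
    "Agriculture", "Buildings", "Coastal Zone", "Cross-Cutting Area",
    "Disaster Risk Management (DRM)", "Economy-wide", "Education", "Energy",
    "Environment", "Health", "Industries", "LULUCF/Forestry",
    "Social Development", "Tourism", "Transport", "Urban", "Waste", "Water",
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_multilabel(logits, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold."""
    if len(logits) != len(SECTOR_LABELS):
        raise ValueError("expected one logit per Sector label")
    return [label for label, z in zip(SECTOR_LABELS, logits)
            if sigmoid(z) >= threshold]


# Hypothetical logits: only Agriculture and Energy are above the threshold.
example_logits = [-5.0] * len(SECTOR_LABELS)
example_logits[0] = 2.0
example_logits[7] = 0.3
predicted = decode_multilabel(example_logits)
```

Each label is scored independently, so a single passage can receive zero, one, or several Sector labels, unlike a softmax classifier that picks exactly one.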
38
 
39
  ## Intended uses & limitations
40
 
 
42
 
43
  ## Training and evaluation data
44
 
45
+ - Training Dataset: 10123
46
+ | Class | Positive Count of Class|
47
+ |:-------------|:--------|
48
+ | Agriculture | 2235 |
49
+ | Buildings | 169 |
50
+ | Coastal Zone | 698|
51
+ | Cross-Cutting Area | 1853 |
52
+ | Disaster Risk Management (DRM) | 814 |
53
+ | Economy-wide | 873 |
54
+ | Education | 180|
55
+ | Energy | 2847 |
56
+ | Environment | 905 |
57
+ | Health | 662|
58
+ | Industries | 419 |
59
+ | LULUCF/Forestry | 1861|
60
+ | Social Development | 507 |
61
+ | Tourism | 192 |
62
+ | Transport | 1173|
63
+ | Urban | 558 |
64
+ | Waste | 714|
65
+ | Water | 1207 |
66
+
67
+ - Validation Dataset: 936
68
+ | Class | Positive Count of Class|
69
+ |:-------------|:--------|
70
+ | Agriculture | 200 |
71
+ | Buildings | 18 |
72
+ | Coastal Zone | 71|
73
+ | Cross-Cutting Area | 180 |
74
+ | Disaster Risk Management (DRM) | 85 |
75
+ | Economy-wide | 85 |
76
+ | Education | 23|
77
+ | Energy | 254 |
78
+ | Environment | 91 |
79
+ | Health | 68|
80
+ | Industries | 41 |
81
+ | LULUCF/Forestry | 193|
82
+ | Social Development | 56 |
83
+ | Tourism | 28 |
84
+ | Transport | 107|
85
+ | Urban | 51 |
86
+ | Waste | 59|
87
+ | Water | 106 |
88
 
89
  ## Training procedure
90
 
 
112
  | 0.0892 | 6.0 | 3798 | 0.6073 | 0.6425 | 0.7499 | 0.6545 | 0.7844 | 0.8610 | 0.7844 | 0.7064 | 0.7634 | 0.7113 |
113
  | 0.0721 | 7.0 | 4431 | 0.6114 | 0.6428 | 0.7488 | 0.6519 | 0.7855 | 0.8627 | 0.7855 | 0.7071 | 0.7638 | 0.7109 |
114
 
115
+ | Label | Precision | Recall | F1-score | Support |
116
+ |:-------------:|:---------:|:-----:|:------:|:------:|
117
+ | Agriculture | 0.720 | 0.850|0.780|200|
118
+ | Buildings | 0.636 |0.777|0.700|18|
119
+ | Coastal Zone | 0.562|0.760|0.646|71|
120
+ | Cross-Cutting Area | 0.569 |0.777|0.657|180|
121
+ | Disaster Risk Management (DRM) | 0.567 |0.694|0.624|85|
122
+ | Economy-wide | 0.461 |0.635| 0.534|85|
123
+ | Education | 0.608|0.608|0.608|23|
124
+ | Energy | 0.816 |0.838|0.827|254|
125
+ | Environment | 0.561 |0.703|0.624|91|
126
+ | Health | 0.708|0.750|0.728|68|
127
+ | Industries | 0.660 |0.902|0.762|41|
128
+ | LULUCF/Forestry | 0.676|0.844|0.751|193|
129
+ | Social Development | 0.593 | 0.678|0.633|56|
130
+ | Tourism | 0.551 |0.571|0.561|28|
131
+ | Transport | 0.700|0.766|0.732|107|
132
+ | Urban | 0.414 |0.568|0.479|51|
133
+ | Waste | 0.658|0.881|0.753|59|
134
+ | Water | 0.602 |0.773|0.677|106|
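
The F1 column in the table above is the harmonic mean of precision and recall. A minimal check, using three rows copied from the table (the function name `f1_score` and the `ROWS` mapping are illustrative, not part of the training code):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the per-label table.
ROWS = {
    "Agriculture": (0.720, 0.850),
    "Energy": (0.816, 0.838),
    "Water": (0.602, 0.773),
}

# Recompute F1 per label, rounded to three decimals like the table.
recomputed = {label: round(f1_score(p, r), 3) for label, (p, r) in ROWS.items()}
```

For these rows the recomputed values match the reported F1 scores (0.780, 0.827, 0.677). Because F1 penalizes an imbalance between the two inputs, labels whose recall is much higher than their precision, which is the expected effect of the recall-oriented loss, still end up with moderate F1.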
135
+
136
+ ### Environmental Impact
137
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
138
+ - **Carbon Emitted**: 0.02867 kg of CO2
139
+ - **Hours Used**: 0.706 hours
140
+
141
+ ### Training Hardware
142
+ - **On Cloud**: yes
143
+ - **GPU Model**: 1 x Tesla T4
144
+ - **CPU Model**: Intel(R) Xeon(R) CPU @ 2.00GHz
145
+ - **RAM Size**: 12.67 GB
146
 
147
  ### Framework versions
148
 
149
  - Transformers 4.38.1
150
  - Pytorch 2.1.0+cu121
151
  - Datasets 2.18.0
152
+ - Tokenizers 0.15.2