---
tags:
model-index:
- name: CONDITIONAL-multilabel-bge
  results: []
datasets:
- GIZ/policy_classification
library_name: transformers
pipeline_tag: text-classification

co2_eq_emissions:
  emissions: 28.4522411264774
  source: codecarbon
  training_type: fine-tuning
  on_cloud: true
  cpu_model: Intel(R) Xeon(R) CPU @ 2.00GHz
  ram_total_size: 12.6747894287109
  hours_used: 0.702
  hardware_used: 1 x Tesla T4
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# CONDITIONAL-multilabel-bge

This model is a fine-tuned version of [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the [Policy-Classification](https://huggingface.co/datasets/GIZ/policy_classification) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5295
- Precision-micro: 0.5138
- Precision-samples: 0.1866
- Precision-weighted: 0.5169
- Recall-micro: 0.7378
- Recall-samples: 0.1874
- Recall-weighted: 0.7378
- F1-micro: 0.6058
- F1-samples: 0.1852
- F1-weighted: 0.6065

## Model description

The purpose of this model is to predict multiple labels simultaneously for a given input text. Specifically, the model predicts two labels relevant to climate policy analysis, ConditionalLabel and UnconditionalLabel:

- **Conditional**: in the context of climate policy documents, whether a Target/Action/Plan/Policy commitment is being made conditionally.
- **Unconditional**: in the context of climate policy documents, whether a Target/Action/Plan/Policy commitment is being made unconditionally.
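
A minimal inference sketch is shown below; the checkpoint id is a placeholder for this repository's actual path, and the 0.5 decision threshold is an assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder id: replace with this repository's actual model path.
model_id = "GIZ/CONDITIONAL-multilabel-bge"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "We will reduce emissions by 30% by 2030, subject to international support."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multilabel head: score each label with an independent sigmoid, not a softmax.
probs = torch.sigmoid(logits).squeeze(0)
predictions = {
    model.config.id2label[i]: round(float(p), 3)
    for i, p in enumerate(probs)
    if p > 0.5
}
print(predictions)
```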

## Intended uses & limitations

The paragraphs in the dataset were copied from source documents under sub-headings that indicate the Conditional/Unconditional category, but the dataset itself often omits those headings. This makes the assessment of conditionality very difficult: annotators who were given only a paragraph, without the full surrounding context, had difficulty judging the conditionality of the commitments made in it.

## Training and evaluation data

- Training dataset: 5901 examples

| Class | Positive count |
|:-------------------|:---------------|
| ConditionalLabel | 1986 |
| UnconditionalLabel | 1312 |

- Validation dataset: 1190 examples

| Class | Positive count |
|:-------------------|:---------------|
| ConditionalLabel | 192 |
| UnconditionalLabel | 136 |
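
As a rough sketch, multi-hot label vectors of the kind used for such training can be built from the dataset as follows; the per-label 0/1 column schema is an assumption, so inspect the actual column names first:

```python
from datasets import load_dataset

# Label order must match the model's id2label mapping.
LABELS = ["ConditionalLabel", "UnconditionalLabel"]

ds = load_dataset("GIZ/policy_classification")

def to_multi_hot(example):
    # Assumed schema: one 0/1 column per label name.
    # BCE-style multilabel training expects float multi-hot targets.
    example["labels"] = [float(example[name]) for name in LABELS]
    return example

ds = ds.map(to_multi_hot)
```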

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

...

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision-micro | Precision-samples | Precision-weighted | Recall-micro | Recall-samples | Recall-weighted | F1-micro | F1-samples | F1-weighted |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:-----------------:|:------------------:|:------------:|:--------------:|:---------------:|:--------:|:----------:|:-----------:|
| 0.0298 | 5.0 | 1845 | 0.4971 | 0.5161 | 0.1840 | 0.5184 | 0.7317 | 0.1857 | 0.7317 | 0.6053 | 0.1829 | 0.6058 |
| 0.0152 | 6.0 | 2214 | 0.5295 | 0.5138 | 0.1866 | 0.5169 | 0.7378 | 0.1874 | 0.7378 | 0.6058 | 0.1852 | 0.6065 |

Per-label results on the validation set:

| Label | Precision | Recall | F1-score | Support |
|:-------------------|:---------:|:------:|:--------:|:-------:|
| ConditionalLabel | 0.490 | 0.760 | 0.595 | 192 |
| UnconditionalLabel | 0.555 | 0.706 | 0.621 | 136 |
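
For reference, per-label and averaged multilabel metrics of this kind are what scikit-learn produces from multi-hot arrays; this is a toy sketch with synthetic data, not the actual evaluation code:

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

# Toy multi-hot arrays of shape (n_samples, n_labels); in practice these come
# from thresholding the model's sigmoid outputs at 0.5.
y_true = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y_pred = np.array([[1, 0], [1, 0], [1, 1], [0, 0]])

print(classification_report(
    y_true, y_pred,
    target_names=["ConditionalLabel", "UnconditionalLabel"],
    zero_division=0,
))
print("micro F1:   ", f1_score(y_true, y_pred, average="micro"))
print("samples F1: ", f1_score(y_true, y_pred, average="samples", zero_division=0))
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))
```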

### Environmental Impact

Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon); a minimal measurement sketch follows the list below.

- **Carbon Emitted**: 0.02845 kg of CO2
- **Hours Used**: 0.702 hours
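
This is how a CodeCarbon measurement is typically wired around a run; the train() function here is a stand-in for the actual fine-tuning script:

```python
from codecarbon import EmissionsTracker

def train():
    # Stand-in for the actual fine-tuning run.
    pass

tracker = EmissionsTracker()
tracker.start()
try:
    train()
finally:
    # stop() returns the estimated emissions in kg of CO2eq.
    emissions_kg = tracker.stop()
print(f"Estimated emissions: {emissions_kg:.5f} kg CO2eq")
```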

### Training Hardware

- **On Cloud**: yes
- **GPU Model**: 1 x Tesla T4
- **CPU Model**: Intel(R) Xeon(R) CPU @ 2.00GHz
- **RAM Size**: 12.67 GB

### Framework versions

- Transformers 4.38.1
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2