Oriaz committed
Commit 936ae04 · verified · 1 Parent(s): 9b63e69

Update README.md

Files changed (1)
  1. README.md +48 -17
README.md CHANGED
@@ -9,12 +9,7 @@ pinned: false
 
  # Benchmark using different techniques
 
- ## ML model for Climate Disinformation Classification
-
- ### Model Description
-
-
-
 
  #### Intended Use
 
@@ -39,6 +34,26 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
  6. Proponents are biased
  7. Fossil fuels are needed
 
  ### Performance
 
  #### Metrics (I used NVIDIA T4 small GPU)
@@ -48,6 +63,7 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
  - Energy consumption tracked in Wh (~1.8 Wh)
 
  #### Model Architecture
 
  ML models prefer numeric inputs, so the quotes must first be embedded. I used the *MTEB Leaderboard* on Hugging Face to find the model with the best trade-off between performance and number of parameters.
 
  I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th best performance score with only 400M parameters.
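Note on the hunk above: the two context paragraphs describe an embed-then-classify pipeline. A minimal sketch of that flow follows, with the embedder passed in as a plain function. Everything named here is an assumption for illustration: `toy_embed` is a hypothetical hashed bag-of-words stand-in so the snippet runs without downloading "dunzhang/stella_en_400M_v5", the nearest-centroid classifier is a placeholder because the visible hunks do not say which classifier was used, and the label strings are invented.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Embedder = Callable[[str], Vector]


def toy_embed(text: str, dim: int = 256) -> Vector:
    """Hypothetical stand-in for a real embedder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def fit_centroids(
    texts: Sequence[str], labels: Sequence[str], embed: Embedder
) -> Dict[str, Vector]:
    """Average the embeddings of each class (nearest-centroid 'training')."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for text, label in zip(texts, labels):
        vec = embed(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [x / counts[label] for x in total] for label, total in sums.items()}


def predict(text: str, centroids: Dict[str, Vector], embed: Embedder) -> str:
    """Return the label whose centroid has the highest dot product with the quote."""
    vec = embed(text)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(vec, centroids[label])),
    )
```

Swapping `toy_embed` for a function that wraps the real embedding model would leave `fit_centroids` and `predict` unchanged, which keeps the embedding cost (the slow step, per the Limitations section) separate from the classifier.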
@@ -60,21 +76,36 @@ Then here is the Confusion Matrix :
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66169e1ce557753f30eab31b/tfAcfFu3Cnc9XJ00ixrWB.png)
 
- ### Environmental Impact
-
- Environmental impact is tracked using CodeCarbon, measuring:
- - Carbon emissions during inference
- - Energy consumption during inference
-
- This tracking helps establish a baseline for the environmental impact of model deployment and inference.
-
  ### Limitations
  - The embedding phase takes ~30 seconds for 1800 quotes. It could be optimised, and it has a real influence on carbon emissions.
  - It is hard to exceed 70% accuracy with "simple" ML.
  - Textual data carries subtleties of interpretation that small models cannot capture.
 
- ### Ethical Considerations
 
- - Dataset contains sensitive topics related to climate disinformation
- - Environmental impact is tracked to promote awareness of AI's carbon footprint
  ```
 
 
  # Benchmark using different techniques
 
+ ## General Information
 
  #### Intended Use
 
 
  6. Proponents are biased
  7. Fossil fuels are needed
 
+ ### Environmental Impact
+
+ Environmental impact is tracked using CodeCarbon, measuring:
+ - Carbon emissions during inference
+ - Energy consumption during inference
+
+ This tracking helps establish a baseline for the environmental impact of model deployment and inference.
+
+ ### Ethical Considerations
+
+ - Dataset contains sensitive topics related to climate disinformation
+ - Environmental impact is tracked to promote awareness of AI's carbon footprint
+
+
+ ## ML model for Climate Disinformation Classification
+
+ ### Model Description
+
+ The goal is to find the best ML model for detecting climate change disinformation from vectorized quotes.
+
  ### Performance
 
  #### Metrics (I used NVIDIA T4 small GPU)
 
  - Energy consumption tracked in Wh (~1.8 Wh)
 
  #### Model Architecture
+
  ML models prefer numeric inputs, so the quotes must first be embedded. I used the *MTEB Leaderboard* on Hugging Face to find the model with the best trade-off between performance and number of parameters.
 
  I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th best performance score with only 400M parameters.
 
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66169e1ce557753f30eab31b/tfAcfFu3Cnc9XJ00ixrWB.png)
 
  ### Limitations
  - The embedding phase takes ~30 seconds for 1800 quotes. It could be optimised, and it has a real influence on carbon emissions.
  - It is hard to exceed 70% accuracy with "simple" ML.
  - Textual data carries subtleties of interpretation that small models cannot capture.
 
 
+
+ ## BERT model for Climate Disinformation Classification
+
+ ### Model Description
+
+ Fine-tune a pretrained BERT model for the classification task.
+
+ ### Performance
+
+ #### Metrics (I used NVIDIA T4 small GPU)
+ - **Accuracy**: ~90%
+ - **Environmental Impact**:
+ - Emissions tracked in gCO2eq (~0.25 g)
+ - Energy consumption tracked in Wh (~0.7 Wh)
+
+ #### Model Architecture
+
+ Fine-tuning of the "bert-uncased" model with a 70% train, 15% eval, 15% test split.
+
+ ### Limitations
+ - Not optimized yet. I still need to try running it on CPU.
+ - Small models have limitations: accuracy is regularly between 70% and 80%, and it is hard to go higher just by changing parameters.
+
+ # Contact
+ *LinkedIn*: Mattéo GIRARDEAU
+ *email*: [email protected]
  ```
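
Note on the added BERT section: the "70% train, 15% eval, 15% test" split can be sketched as below. This is only an illustration under assumptions. The README does not say whether the split was shuffled, seeded, or stratified by label, so the shuffle and the `seed` parameter are choices made here, and `split_70_15_15` is a hypothetical helper name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_15_15(
    items: Sequence[T], seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into train / eval / test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    # Integer arithmetic avoids float rounding surprises on the boundaries.
    n_train = len(shuffled) * 70 // 100
    n_eval = len(shuffled) * 15 // 100

    train = shuffled[:n_train]
    eval_set = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]  # the remainder goes to the test set
    return train, eval_set, test
```

With imbalanced classes, a stratified split (same label proportions in each part) may be preferable. That variant is not shown here.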