Update README.md
README.md CHANGED
@@ -9,12 +9,7 @@ pinned: false
 
 # Benchmark using different techniques
 
-##
-
-### Model Description
-
-
-
+## Global Information
 
 #### Intended Use
 
@@ -39,6 +34,26 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
 6. Proponents are biased
 7. Fossil fuels are needed
 
+### Environmental Impact
+
+Environmental impact is tracked using CodeCarbon, measuring:
+- Carbon emissions during inference
+- Energy consumption during inference
+
+This tracking helps establish a baseline for the environmental impact of model deployment and inference.
+
+### Ethical Considerations
+
+- The dataset contains sensitive topics related to climate disinformation
+- Environmental impact is tracked to promote awareness of AI's carbon footprint
+
+
+## ML model for Climate Disinformation Classification
+
+### Model Description
+
+Find the best ML model to process the vectorized quotes and detect climate change disinformation.
+
 ### Performance
 
 #### Metrics (I used an NVIDIA T4 small GPU)
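The Environmental Impact tracking added above relies on CodeCarbon. A minimal sketch of wrapping an inference call with an `EmissionsTracker` is shown below; the project name and the dummy workload are illustrative placeholders rather than this Space's actual code.

```python
# Sketch: wrap a workload with CodeCarbon to log emissions and energy.
from codecarbon import EmissionsTracker

def run_inference():
    # Placeholder workload; in the Space this would be the classifier's predict() call.
    return sum(i * i for i in range(1_000_000))

tracker = EmissionsTracker(project_name="frugal-ai-text-inference")  # illustrative name
tracker.start()
try:
    run_inference()
finally:
    emissions_kg = tracker.stop()  # emissions in kg CO2eq

print(f"Emissions: {emissions_kg * 1000:.3f} gCO2eq")
# Energy consumed (kWh) is written, along with the other metrics, to the
# emissions.csv file that CodeCarbon produces in the working directory.
```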
@@ -48,6 +63,7 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
 - Energy consumption tracked in Wh (~1.8 Wh)
 
 #### Model Architecture
+
 ML models prefer numeric values, so we need to embed our quotes. I used the *MTEB Leaderboard* on HuggingFace to find the model with the best trade-off between performance and the number of parameters.
 
 I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th-best performance score with only 400M parameters.
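A minimal sketch of the embedding-plus-classifier pipeline described under Model Architecture is shown below. It assumes the dataset exposes `quote` and `label` columns and a `train` split, that `dunzhang/stella_en_400M_v5` loads through `sentence-transformers` with `trust_remote_code=True`, and it uses logistic regression as a stand-in for whichever classifier the benchmark actually selected.

```python
# Embed the quotes with stella_en_400M_v5, then benchmark a simple classifier.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumed column names ("quote", "label") and split name ("train").
dataset = load_dataset("QuotaClimat/frugalaichallenge-text-train", split="train")
texts, labels = dataset["quote"], dataset["label"]

# Embedding step: roughly 30 s for ~1,800 quotes on a T4, per the Limitations below.
embedder = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)
embeddings = embedder.encode(texts, batch_size=32, show_progress_bar=True)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=42, stratify=labels
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = clf.predict(X_test)
print("Accuracy:", round(accuracy_score(y_test, preds), 3))
print(confusion_matrix(y_test, preds))
```

Caching the embeddings once and swapping different scikit-learn classifiers into the final step keeps the benchmark cheap to rerun.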
@@ -60,21 +76,36 @@ Then here is the Confusion Matrix :
 
 ![Confusion Matrix]()
 
-### Environmental Impact
-
-Environmental impact is tracked using CodeCarbon, measuring:
-- Carbon emissions during inference
-- Energy consumption during inference
-
-This tracking helps establish a baseline for the environmental impact of model deployment and inference.
-
 ### Limitations
 - The embedding phase takes ~30 seconds for 1,800 quotes. It could be optimised, which would have a real influence on carbon emissions.
 - Hard to go above 70% accuracy with "simple" ML.
 - Textual data carry subtleties of interpretation that small models cannot pick up.
 
-### Ethical Considerations
 
-
-
+
+## BERT model for Climate Disinformation Classification
+
+### Model Description
+
+Fine-tuned model for the classification task.
+
+### Performance
+
+#### Metrics (I used an NVIDIA T4 small GPU)
+- **Accuracy**: ~90%
+- **Environmental Impact**:
+  - Emissions tracked in gCO2eq (~0.25 g)
+  - Energy consumption tracked in Wh (~0.7 Wh)
+
+#### Model Architecture
+
+Fine-tuning of the "bert-uncased" model with 70% train, 15% eval, 15% test splits.
+
+### Limitations
+- Not optimized; I still need to try running it on CPU
+- Small models have limits: accuracy regularly lands between 70% and 80%, and it is hard to go beyond that just by changing hyperparameters
+
+# Contacts
+*LinkedIn*: Mattéo GIRARDEAU
+*email*: [email protected]
 ```
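As a rough sketch of the fine-tuning setup described in the BERT Model Architecture subsection (70% train, 15% eval, 15% test), the snippet below uses the `transformers` Trainer. The `bert-base-uncased` checkpoint, the `quote`/`label` column names, the label-encoding step and every hyperparameter are illustrative assumptions, not this Space's actual configuration.

```python
# Fine-tune a BERT checkpoint on the text dataset with a 70/15/15 split.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

raw = load_dataset("QuotaClimat/frugalaichallenge-text-train", split="train")
raw = raw.class_encode_column("label")           # map text labels to integer class ids
num_labels = raw.features["label"].num_classes   # expected: 8 (7 claim types + "not relevant")

# 70% train, then split the remaining 30% evenly into eval and test.
first = raw.train_test_split(test_size=0.3, seed=42)
held_out = first["test"].train_test_split(test_size=0.5, seed=42)
train_ds, eval_ds, test_ds = first["train"], held_out["train"], held_out["test"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["quote"], truncation=True, padding="max_length", max_length=128)

train_ds = train_ds.map(tokenize, batched=True)
eval_ds = eval_ds.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels
)
args = TrainingArguments(
    output_dir="bert-climate-disinfo",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
print(trainer.evaluate())  # loss on the 15% eval split; test_ds stays held out
```

Evaluating the resulting checkpoint on `test_ds`, with the same CodeCarbon tracking as above, is the natural way to reproduce the accuracy and emissions figures quoted in the Metrics subsection.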