text-Oriaz

Oriaz committed on Jan 8

Commit

936ae04

1 Parent(s): 9b63e69

Update README.md

README.md +48 -17

@@ -9,12 +9,7 @@ pinned: false
 # Benchmark using different techniques
-## ML model for Climate Disinformation Classification
-### Model Description
 #### Intended Use
@@ -39,6 +34,26 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
 6. Proponents are biased
 7. Fossil fuels are needed
 ### Performance
 #### Metrics (measured on an NVIDIA T4 small GPU)
@@ -48,6 +63,7 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
   - Energy consumption tracked in Wh (~1.8 Wh)
 #### Model Architecture
 ML models work on numeric values, so the quotes must first be embedded. I used the *MTEB Leaderboard* on HuggingFace to find the model with the best trade-off between performance and number of parameters.
 I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th best performance score with only 400M parameters.
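The embed-then-classify idea described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code behind the reported numbers: `embed_quotes` assumes the `sentence-transformers` package and a setup able to load the cited model (loading options may differ by environment), and the nearest-centroid classifier is a stand-in chosen for brevity, since the README does not say which ML model was finally selected. The toy vectors are made up.

```python
import math
from collections import defaultdict


def embed_quotes(quotes, model_name="dunzhang/stella_en_400M_v5"):
    """Turn quotes into vectors with a sentence-transformers model.

    Assumption: sentence-transformers is installed and the model can be
    loaded in the current environment (the README used a T4 GPU).
    """
    from sentence_transformers import SentenceTransformer  # lazy: heavy import

    model = SentenceTransformer(model_name, trust_remote_code=True)
    return model.encode(list(quotes)).tolist()


def fit_centroids(vectors, labels):
    """Average the vectors of each label (illustrative nearest-centroid baseline)."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Toy 2-D vectors standing in for real embeddings (hypothetical data).
train_vectors = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
train_labels = [0, 0, 1, 1]
centroids = fit_centroids(train_vectors, train_labels)
prediction = predict(centroids, [0.9, 0.8])
```

With real data, `train_vectors` would come from `embed_quotes(...)` and the labels from the dataset's categories; any classifier that accepts fixed-size vectors could replace the centroid step.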
@@ -60,21 +76,36 @@ Then here is the Confusion Matrix :
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66169e1ce557753f30eab31b/tfAcfFu3Cnc9XJ00ixrWB.png)
-### Environmental Impact
-Environmental impact is tracked using CodeCarbon, measuring:
-- Carbon emissions during inference
-- Energy consumption during inference
-This tracking helps establish a baseline for the environmental impact of model deployment and inference.
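As a rough illustration of the CodeCarbon tracking mentioned in this section, the sketch below wraps an inference call between `start()` and `stop()`. It assumes the `codecarbon` package is installed; the helper name `measure_emissions` and its `tracker` parameter are hypothetical and only exist to keep the example small and testable. `EmissionsTracker.stop()` reports emissions in kg CO2eq, so the value is converted to grams to match the units used in the metrics.

```python
def measure_emissions(run_inference, tracker=None):
    """Run `run_inference()` under an emissions tracker.

    Returns a tuple (result, emissions in gCO2eq).
    """
    if tracker is None:
        # Assumption: codecarbon is installed in the environment.
        from codecarbon import EmissionsTracker

        tracker = EmissionsTracker()

    tracker.start()
    try:
        result = run_inference()
    finally:
        # stop() returns kg CO2eq; it can be None if tracking failed.
        emissions_kg = tracker.stop()

    return result, (emissions_kg or 0.0) * 1000.0
```

A call would look like `predictions, grams = measure_emissions(lambda: model.predict(X))`, where `model` and `X` stand for whatever classifier and features are being evaluated.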
 ### Limitations
 - The embedding phase takes ~30 seconds for 1800 quotes. It could be optimized, and it has a real influence on carbon emissions.
 - It is hard to exceed 70% accuracy with "simple" ML.
 - Text carries nuances of interpretation that small models fail to capture.
-### Ethical Considerations
-- Dataset contains sensitive topics related to climate disinformation
-- Environmental impact is tracked to promote awareness of AI's carbon footprint
 ```

 # Benchmark using different techniques
+## General Information
 #### Intended Use
 6. Proponents are biased
 7. Fossil fuels are needed
+### Environmental Impact
+Environmental impact is tracked using CodeCarbon, measuring:
+- Carbon emissions during inference
+- Energy consumption during inference
+This tracking helps establish a baseline for the environmental impact of model deployment and inference.
+### Ethical Considerations
+- Dataset contains sensitive topics related to climate disinformation
+- Environmental impact is tracked to promote awareness of AI's carbon footprint
+## ML model for Climate Disinformation Classification
+### Model Description
+The goal is to find the best ML model for detecting climate change disinformation from vectorized quotes.
 ### Performance
 #### Metrics (measured on an NVIDIA T4 small GPU)
   - Energy consumption tracked in Wh (~1.8 Wh)
 #### Model Architecture
 ML models work on numeric values, so the quotes must first be embedded. I used the *MTEB Leaderboard* on HuggingFace to find the model with the best trade-off between performance and number of parameters.
 I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th best performance score with only 400M parameters.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66169e1ce557753f30eab31b/tfAcfFu3Cnc9XJ00ixrWB.png)
 ### Limitations
 - The embedding phase takes ~30 seconds for 1800 quotes. It could be optimized, and it has a real influence on carbon emissions.
 - It is hard to exceed 70% accuracy with "simple" ML.
 - Text carries nuances of interpretation that small models fail to capture.
+## BERT model for Climate Disinformation Classification
+### Model Description
+A fine-tuned BERT model for quote classification.
+### Performance
+#### Metrics (measured on an NVIDIA T4 small GPU)
+- **Accuracy**: ~90%
+- **Environmental Impact**:
+  - Emissions tracked in gCO2eq (~0.25 g)
+  - Energy consumption tracked in Wh (~0.7 Wh)
+#### Model Architecture
+Fine-tuning of the "bert-uncased" model with a 70% train / 15% eval / 15% test split.
+### Limitations
+- Not optimized yet; running it on CPU still needs to be tried.
+- Small models have limitations: accuracy is regularly between 70% and 80%, and it is hard to go higher just by changing parameters.
+# Contact
+*LinkedIn* : Mattéo GIRARDEAU
+*email* : [email protected]
 ```
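The 70% train / 15% eval / 15% test split mentioned for the BERT fine-tuning could be produced along these lines. This is a minimal sketch using only the standard library; the function name `split_dataset`, the fixed seed, and the use of integer arithmetic for the cut points are assumptions for illustration, not the author's actual preprocessing.

```python
import random


def split_dataset(items, seed=42):
    """Shuffle items deterministically and split them 70/15/15."""
    items = list(items)
    random.Random(seed).shuffle(items)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(items) * 70 // 100
    n_eval = len(items) * 15 // 100

    train = items[:n_train]
    eval_ = items[n_train:n_train + n_eval]
    test = items[n_train + n_eval:]  # remainder goes to the test set
    return train, eval_, test


# Example with 100 placeholder items standing in for labelled quotes.
train_set, eval_set, test_set = split_dataset(range(100))
```

Keeping the seed fixed makes the split reproducible, so accuracy figures from different runs are compared on the same test examples.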