# Identifying and Analysing political quotes from the Danish Parliament related to climate change using NLP
**KlimaBERT** is a sequence classifier fine-tuned to predict whether quotes are climate-related. For the positive class, 1 ("climate-related"), the model achieves an F1-score of 0.97, a precision of 0.97, and a recall of 0.97. The negative class, 0, is defined as "non-climate-related".
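
As a quick illustration of the intended use, the snippet below runs inference with the Hugging Face `transformers` pipeline. This is a minimal sketch: the model ID `jonahank/KlimaBERT` is an assumed placeholder, and the exact label names depend on the model's config, so substitute the published Hub ID or a local checkpoint path.

```python
# Minimal inference sketch. The model ID is an assumed placeholder; replace it
# with the published Hub ID or a local checkpoint path.
from transformers import pipeline

classifier = pipeline("text-classification", model="jonahank/KlimaBERT")

# A hypothetical Danish parliamentary quote about emission targets.
quote = "Vi skal reducere Danmarks CO2-udledning med 70 procent inden 2030."
print(classifier(quote))
# e.g. [{'label': 'LABEL_1', 'score': 0.98}] -- label names depend on the model config
```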

KlimaBERT was fine-tuned from the pre-trained DaBERT-uncased model on a training set of 1,000 manually labelled data points. The training set contains both political quotes and summaries of bills from the [Danish Parliament](https://www.ft.dk/).

The model was created to identify political quotes related to climate change and performs best on official texts from the Danish Parliament.

### Fine-tuning
To fine-tune a model similar to KlimaBERT, follow the [fine-tuning notebooks](https://github.com/jonahank/Vote-Prediction-Model/tree/main/climate_classifier).
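
For orientation, here is a condensed sketch of what such a fine-tuning run can look like with the `transformers` Trainer; the linked notebooks remain the authoritative reference. The base checkpoint `Maltehb/danish-bert-botxo` (a publicly hosted DaBERT-style model) and the toy dataset are assumptions, not the exact thesis setup.

```python
# Condensed fine-tuning sketch, not the exact thesis setup.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "Maltehb/danish-bert-botxo"  # assumed DaBERT-style base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Toy stand-in for the ~1,000 manually labelled quotes and bill summaries.
data = Dataset.from_dict({
    "text": ["En udtalelse om klimapolitik ...", "En udtalelse om skattepolitik ..."],
    "label": [1, 0],  # 1 = climate-related, 0 = non-climate-related
})
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128,
                                     padding="max_length"))

args = TrainingArguments(output_dir="klimabert", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=data).train()
```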

### References
BERT: Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805

DaBERT: Certainly (2021). Certainly has trained the most advanced Danish BERT model to date. https://www.certainly.io/blog/danish-bert-model/

### Acknowledgements
These resources were created as part of my Master's thesis, so I would like to thank my supervisors [Leon Derczynski](https://www.derczynski.com/itu/) and [Vedran Sekara](https://vedransekara.github.io/) for their great support throughout the project! And a HUGE thanks to [Gustav Gyrst](https://github.com/Gyrst) for great sparring and co-development of the tools you find in this repo.