---
license: cc-by-sa-4.0
language:
- da
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- partypress
- political science
- parties
- press releases
widget:
- text: 'Forud for NATO-topmødet i dag opfordrer Enhedslisten til, at Danmark siger nej tak til et styrket, bredere NATO. Efter snart halvandet år med en hærgende global pandemi og en tiltagende galoperende klimakrise er det sidste, vi har brug for, mere oprustning og styrkelse af NATO på den globale scene. Derfor skal Danmark ikke bakke op om, at NATOs råderum skal udvides til at adressere relationen mellem USA og Kina, ligesom Danmark ikke skal bakke op om, at NATO får større indflydelse i Arktis. Øget militarisering af de store globale udfordringer, vi står overfor, er den helt forkerte vej at gå. Udenrigsordfører Eva Flyvholm siger om de bebudede planer: Danmark bør sætte alt ind for at sikre, at Arktis bliver et lavspændingsområde og derfor holde NATO mest muligt ude af samarbejdet omkring Arktis. Vi skal ikke militarisere et så vitalt et område og gøre Arktis til en arena for global stormagtsrivalisering bl.a. ved at isolere Rusland og se stort på de lokale befolkninger.'
---
# PARTYPRESS monolingual Denmark
Fine-tuned model, based on [Maltehb/danish-bert-botxo](https://huggingface.co./Maltehb/danish-bert-botxo). Used in [Erfort et al. (2023)](https://doi.org/10.1177/20531680231183512), building on the PARTYPRESS database. On the downstream task of classifying press releases from political parties into 23 unique policy areas, the model achieves a performance comparable to expert human coders.
## Model description
The PARTYPRESS monolingual model builds on [Maltehb/danish-bert-botxo](https://huggingface.co./Maltehb/danish-bert-botxo) but adds a supervised component: it was fine-tuned on texts labeled by humans. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP):
| Code | Issue |
|--|-------|
| 1 | Macroeconomics |
| 2 | Civil Rights |
| 3 | Health |
| 4 | Agriculture |
| 5 | Labor |
| 6 | Education |
| 7 | Environment |
| 8 | Energy |
| 9 | Immigration |
| 10 | Transportation |
| 12 | Law and Crime |
| 13 | Social Welfare |
| 14 | Housing |
| 15 | Domestic Commerce |
| 16 | Defense |
| 17 | Technology |
| 18 | Foreign Trade |
| 19.1 | International Affairs |
| 19.2 | European Union |
| 20 | Government Operations |
| 23 | Culture |
| 98 | Non-thematic |
| 99 | Other |
## Model variations
There are several monolingual models for different countries, and a multilingual model. The multilingual model can be easily extended to other languages, country contexts, or time periods by fine-tuning it with minimal additional labeled texts.
## Intended uses & limitations
The main use of the model is for text classification of press releases from political parties. It may also be useful for other political texts.
The classification can then be used to measure which issues parties are discussing in their communication.
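As a minimal sketch of such a measurement, the snippet below turns a list of predicted labels into issue shares per party. The function name `issue_shares`, the party names, and the example labels are hypothetical and only illustrate the idea; they are not taken from the PARTYPRESS data or from the paper's own analysis.

```python
from collections import Counter, defaultdict


def issue_shares(labeled_releases):
    """Return {party: {label: share}} from an iterable of (party, label) pairs.

    Each pair stands for one press release and its predicted issue label.
    Shares within a party sum to 1.
    """
    counts = defaultdict(Counter)
    for party, label in labeled_releases:
        counts[party][label] += 1

    shares = {}
    for party, counter in counts.items():
        total = sum(counter.values())
        shares[party] = {label: n / total for label, n in counter.items()}
    return shares


# Hypothetical predictions, for illustration only (not real model output).
releases = [
    ("Party A", "Defense"),
    ("Party A", "Defense"),
    ("Party A", "Health"),
    ("Party B", "Environment"),
]
shares = issue_shares(releases)
for party, party_shares in shares.items():
    print(party, party_shares)
```

Shares rather than raw counts make parties with different numbers of press releases comparable.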
### How to use
This model can be used directly with a pipeline for text classification:
```python
>>> from transformers import pipeline
>>> tokenizer_kwargs = {'padding':True,'truncation':True,'max_length':512}
>>> partypress = pipeline("text-classification", model = "cornelius/partypress-monolingual-denmark", tokenizer = "cornelius/partypress-monolingual-denmark", **tokenizer_kwargs)
>>> partypress("Your text here.")
```
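The pipeline returns a list of dictionaries with a `label` and a `score` key. The sketch below attaches a readable issue name to each prediction, using the category table from this model card. It assumes that `label` holds the CAP code as a string (for example `"19.2"`), which is not confirmed here; the actual label format can be checked in the model's `config.id2label`. The helper name `add_issue_names` and the example predictions are hypothetical.

```python
# Lookup table copied from the issue-category table in this model card.
CAP_ISSUES = {
    "1": "Macroeconomics", "2": "Civil Rights", "3": "Health",
    "4": "Agriculture", "5": "Labor", "6": "Education",
    "7": "Environment", "8": "Energy", "9": "Immigration",
    "10": "Transportation", "12": "Law and Crime", "13": "Social Welfare",
    "14": "Housing", "15": "Domestic Commerce", "16": "Defense",
    "17": "Technology", "18": "Foreign Trade",
    "19.1": "International Affairs", "19.2": "European Union",
    "20": "Government Operations", "23": "Culture",
    "98": "Non-thematic", "99": "Other",
}


def add_issue_names(predictions):
    """Return copies of pipeline-style predictions with an added "issue" key.

    Assumption: each prediction's "label" is a CAP code. Labels that are not
    in the table are kept and marked "Unknown" instead of raising an error.
    """
    return [
        {**p, "issue": CAP_ISSUES.get(str(p["label"]), "Unknown")}
        for p in predictions
    ]


# Hypothetical predictions, for illustration only (not real model output).
example = [{"label": "16", "score": 0.91}, {"label": "19.2", "score": 0.64}]
for row in add_issue_names(example):
    print(row["label"], row["issue"], row["score"])
```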
### Limitations and bias
The model was trained with data from parties in Denmark. For use in other countries, the model may be further fine-tuned. Without further fine-tuning, the performance of the model may be lower.
The model may have biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database. For example, the performance is highest for press releases from Ireland (75%) and lowest for Poland (55%).
## Training data
The PARTYPRESS monolingual model was fine-tuned with about 3,000 press releases from parties in Denmark. The press releases were labeled by two expert human coders.
For the training data of the underlying model, please refer to [Maltehb/danish-bert-botxo](https://huggingface.co./Maltehb/danish-bert-botxo).
## Training procedure
### Preprocessing
For the preprocessing, please refer to [Maltehb/danish-bert-botxo](https://huggingface.co./Maltehb/danish-bert-botxo).
### Pretraining
For the pretraining, please refer to [Maltehb/danish-bert-botxo](https://huggingface.co./Maltehb/danish-bert-botxo).
### Fine-tuning
We fine-tuned the model using about 3,000 labeled press releases from political parties in Denmark.
#### Training Hyperparameters
The batch size was 12 for training and 2 for testing, and the model was trained for four epochs. All other hyperparameters were left at the defaults of the transformers library.
#### Framework versions
- Transformers 4.28.0
- TensorFlow 2.12.0
- Datasets 2.12.0
- Tokenizers 0.13.3
## Evaluation results
Fine-tuned on our downstream task, this model achieves results in a five-fold cross-validation that are comparable to the performance of our expert human coders. For the detailed results, please refer to Erfort et al. (2023).
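For readers unfamiliar with five-fold cross-validation, the sketch below shows the general idea: the labeled texts are split into five folds, and each fold serves once as the test set while the other four are used for training. This is a generic illustration, not the authors' evaluation code; the helper names, the shuffling seed, and the plain (non-stratified) split are assumptions.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds.
    Every item appears in exactly one test fold.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def accuracy(gold, predicted):
    """Share of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Ten items split into five folds of two test items each.
splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "train /", len(test_idx), "test")
```

In an actual evaluation, a model would be fine-tuned on each training split and `accuracy` computed on the matching test split, with the five scores averaged.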
### BibTeX entry and citation info
```bibtex
@article{erfort_partypress_2023,
author = {Cornelius Erfort and
Lukas F. Stoetzer and
Heike Klüver},
title = {The PARTYPRESS Database: A new comparative database of parties’ press releases},
journal = {Research and Politics},
volume = {10},
number = {3},
year = {2023},
doi = {10.1177/20531680231183512},
URL = {https://doi.org/10.1177/20531680231183512}
}
```
Erfort, C., Stoetzer, L. F., & Klüver, H. (2023). The PARTYPRESS Database: A new comparative database of parties’ press releases. Research & Politics, 10(3). [https://doi.org/10.1177/20531680231183512](https://doi.org/10.1177/20531680231183512)
### Further resources
GitHub: [cornelius-erfort/partypress](https://github.com/cornelius-erfort/partypress)
Research and Politics Dataverse: [Replication Data for: The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FOINX7Q)
## Acknowledgements
Research for this contribution is part of the Cluster of Excellence "Contestations of the Liberal Script" (EXC 2055, Project-ID: 390715649), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy. Cornelius Erfort is moreover grateful for generous funding provided by the DFG through the Research Training Group DYNAMICS (GRK 2458/1).
## Contact
Cornelius Erfort
Humboldt-Universität zu Berlin
[corneliuserfort.de](corneliuserfort.de)