---
license: apache-2.0
language:
- de
base_model:
- dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
---

## Social Media Style Classifier for Climate Change Text (German)


This model is a fine-tuned version of `dbmdz/bert-base-german-uncased` for binary classification: it predicts whether a German text about climate change is written in a social media style.

Social media texts were gathered from [GerCCT](https://github.com/RobinSchaefer/GerCCT) and [r/Klimawandel](https://www.reddit.com/r/Klimawandel/).

Non-social-media texts were gathered by splitting 15 Wikipedia articles into sentences:
1. [Klimawandel](https://de.wikipedia.org/wiki/Klimawandel)
2. [Globale Erwärmung](https://de.wikipedia.org/wiki/Globale_Erw%C3%A4rmung)
3. [Forschungsgeschichte des Klimawandels](https://de.wikipedia.org/wiki/Forschungsgeschichte_des_Klimawandels)
4. [Klimahysterie](https://de.wikipedia.org/wiki/Klimahysterie)
5. [Klimawandelleugnung](https://de.wikipedia.org/wiki/Klimawandelleugnung)
6. [Folgen der globalen Erwärmung in der Arktis](https://de.wikipedia.org/wiki/Folgen_der_globalen_Erw%C3%A4rmung_in_der_Arktis)
7. [Folgen der globalen Erwärmung](https://de.wikipedia.org/wiki/Folgen_der_globalen_Erw%C3%A4rmung)
8. [Klimamodell](https://de.wikipedia.org/wiki/Klimamodell)
9. [Anpassung an die globale Erwärmung](https://de.wikipedia.org/wiki/Anpassung_an_die_globale_Erw%C3%A4rmung)
10. [Kontroverse um die globale Erwärmung](https://de.wikipedia.org/wiki/Kontroverse_um_die_globale_Erw%C3%A4rmung)
11. [UN-Klimakonferenz in Dubai 2023](https://de.wikipedia.org/wiki/UN-Klimakonferenz_in_Dubai_2023)
12. [Umweltbewegung](https://de.wikipedia.org/wiki/Umweltbewegung#Klimaschutz)
13. [Treibhausgas](https://de.wikipedia.org/wiki/Treibhausgas)
14. [Treibhauseffekt](https://de.wikipedia.org/wiki/Treibhauseffekt)
15. [Klimaschutz](https://de.wikipedia.org/wiki/Klimaschutz)
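
The card does not say which sentence tokenizer was used on the articles. As a rough illustration only, the sketch below splits plain article text with a naive regular expression and attaches label 0 (non-social media) to each sentence. The function name `split_sentences`, the regex, and the sample text are assumptions made for this example, not the original preprocessing.

```python
import re

# Naive boundary rule (an assumption for illustration): split on whitespace that
# follows ., ! or ? and precedes an uppercase letter, including German umlauts.
# Abbreviations such as "z. B." would be split incorrectly by this rule.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences found in `text`."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


# Made-up sample text standing in for the body of a Wikipedia article.
article_text = (
    "Der Treibhauseffekt erwärmt die Erdoberfläche. "
    "Er wird durch Treibhausgase verursacht. "
    "Ohne ihn wäre es deutlich kälter!"
)

sentences = split_sentences(article_text)
# Label 0 marks the non-social-media class (label 1 is the social media class).
labelled = [(sentence, 0) for sentence in sentences]
```

A dedicated sentence tokenizer would handle abbreviations, dates, and citations more reliably than this regex.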

The dataset contained about 8K instances, split 50/50 between the two classes. It was shuffled with a random seed of 42 and divided 80/20 into training and test sets.
Training ran for three epochs with a batch size of 8 on a V100-16GB GPU. All other hyperparameters were the HuggingFace Trainer defaults.
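
To make the split and the training budget concrete, here is a minimal sketch in plain Python. It assumes exactly 8,000 placeholder examples, a single GPU, and no gradient accumulation. The helper `shuffle_and_split` is hypothetical, and Python's `random` module will not reproduce the exact shuffle order of whatever library produced the original split.

```python
import random


def shuffle_and_split(examples, train_fraction=0.8, seed=42):
    """Shuffle a copy of `examples` with a fixed seed, then split it in two."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: 8,000 (text, label) pairs, half per class.
examples = [(f"text {i}", i % 2) for i in range(8000)]
train, test = shuffle_and_split(examples)

# Optimizer steps implied by a batch size of 8 over three epochs.
batch_size, epochs = 8, 3
steps_per_epoch = -(-len(train) // batch_size)  # ceiling division
total_steps = steps_per_epoch * epochs
```

Under these assumptions the training set has 6,400 examples, which gives 800 steps per epoch and 2,400 steps in total.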

The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.

### How to use

```python
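from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/cc-tweets-classifier-de"

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs longer than 512 tokens, BERT's maximum sequence length.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"

# The pipeline returns a list with one dict per input, holding "label" and "score".
result = classifier(text)
print(result)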
```
Label 1 indicates that the text is predicted to be a tweet. 
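
The pipeline reports the class as a string rather than an integer. The sketch below turns that string into a boolean, assuming the model keeps the default Transformers label names `LABEL_0` and `LABEL_1`. This has not been checked against the model's config, so adjust the check if `id2label` was customized. The helper `is_social_media` and the sample output are hypothetical.

```python
def is_social_media(prediction: dict) -> bool:
    """Return True when the predicted class index is 1 (social media style)."""
    # Assumes default label names of the form "LABEL_<index>".
    return prediction["label"].rsplit("_", 1)[-1] == "1"


# Hypothetical pipeline output, not a real prediction from the model.
example_result = [{"label": "LABEL_1", "score": 0.98}]
flags = [is_social_media(prediction) for prediction in example_result]
```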

### Evaluation 

Evaluation results on the test set: 

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.96494 |
| Precision | 0.97552 |
| Recall    | 0.95564 |
| F1        | 0.96547 |
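
For reference, the four metrics relate to confusion-matrix counts as sketched below, with the social media class (label 1) treated as positive. The function names and the example counts are illustrative assumptions. The card does not report the actual confusion matrix.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_score(precision, recall)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    denominator = precision + recall
    return 2 * precision * recall / denominator if denominator else 0.0


# Illustrative counts only; these are not the model's real results.
example = binary_metrics(tp=90, fp=10, fn=5, tn=95)

# Consistency check on the table: F1 recomputed from the reported
# precision and recall.
recomputed_f1 = f1_score(0.97552, 0.95564)
```

Recomputing F1 from the reported precision and recall gives about 0.9655, which agrees with the table up to rounding.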