Update README.md
README.md
---
license: apache-2.0
language:
- de
base_model:
- dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
---

## Social Media Style Classifier for Climate Change Text (German)

This model is `dbmdz/bert-base-german-uncased` fine-tuned on a binary classification task: deciding whether a German text about climate change is written in a social media style.

Social media texts were gathered from [GerCCT](https://github.com/RobinSchaefer/GerCCT) and [r/Klimawandel](https://www.reddit.com/r/Klimawandel/).

Non-social media texts were gathered by splitting 15 Wikipedia articles into sentences:
1. [Klimawandel](https://de.wikipedia.org/wiki/Klimawandel)
2. [Globale Erwärmung](https://de.wikipedia.org/wiki/Globale_Erw%C3%A4rmung)
3. [Forschungsgeschichte des Klimawandels](https://de.wikipedia.org/wiki/Forschungsgeschichte_des_Klimawandels)
4. [Klimahysterie](https://de.wikipedia.org/wiki/Klimahysterie)
5. [Klimawandelleugnung](https://de.wikipedia.org/wiki/Klimawandelleugnung)
6. [Folgen der globalen Erwärmung in der Arktis](https://de.wikipedia.org/wiki/Folgen_der_globalen_Erw%C3%A4rmung_in_der_Arktis)
7. [Folgen der globalen Erwärmung](https://de.wikipedia.org/wiki/Folgen_der_globalen_Erw%C3%A4rmung)
8. [Klimamodell](https://de.wikipedia.org/wiki/Klimamodell)
9. [Anpassung an die globale Erwärmung](https://de.wikipedia.org/wiki/Anpassung_an_die_globale_Erw%C3%A4rmung)
10. [Kontroverse um die globale Erwärmung](https://de.wikipedia.org/wiki/Kontroverse_um_die_globale_Erw%C3%A4rmung)
11. [UN-Klimakonferenz in Dubai 2023](https://de.wikipedia.org/wiki/UN-Klimakonferenz_in_Dubai_2023)
12. [Umweltbewegung](https://de.wikipedia.org/wiki/Umweltbewegung#Klimaschutz)
13. [Treibhausgas](https://de.wikipedia.org/wiki/Treibhausgas)
14. [Treibhauseffekt](https://de.wikipedia.org/wiki/Treibhauseffekt)
15. [Klimaschutz](https://de.wikipedia.org/wiki/Klimaschutz)
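
The card does not say which sentence tokenizer was applied to the Wikipedia text. As a rough illustration only, the sketch below splits German text into sentences with a naive regular expression. `split_sentences` is a hypothetical helper, not the author's preprocessing code, and a dedicated tokenizer would cope better with abbreviations such as "z. B.".

```python
import re

# Hypothetical helper, not the author's preprocessing code.
# Naive rule: split after ., ! or ? when followed by whitespace
# and an uppercase letter (including German umlauts).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")


def split_sentences(text: str) -> list[str]:
    """Split a paragraph into sentences using a simple punctuation heuristic."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    return [part for part in parts if part]


# Made-up sample text standing in for one article paragraph.
article_text = (
    "Der Klimawandel verändert das Wetter. Die Temperaturen steigen weltweit! "
    "Was bedeutet das für die Arktis?"
)
sentences = split_sentences(article_text)
for sentence in sentences:
    print(sentence)
```

Each resulting sentence would then become one non-social-media instance.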

The dataset contained about 8K instances, split 50/50 between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets.
Training ran for three epochs with a batch size of 8 on a V100-16GB GPU. All other hyperparameters were the Hugging Face Trainer defaults.
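
The shuffle-and-split step can be sketched in plain Python. This is an illustration under assumptions, not the author's script: `train_test_split_80_20` and the toy `examples` list are hypothetical, and the original may have used a different library, in which case a seed of 42 would produce a different ordering.

```python
import random


def train_test_split_80_20(examples, seed=42):
    """Shuffle a copy of the examples with a fixed seed, then split 80/20."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the roughly 8K (text, label) pairs;
# label 1 = social media style, label 0 = Wikipedia sentence.
examples = [(f"text {i}", i % 2) for i in range(10)]
train_set, test_set = train_test_split_80_20(examples)
print(len(train_set), len(test_set))
```

Fixing the seed makes the split reproducible, so the test set stays the same across runs.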

The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.

### How to use
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/cc-tweets-classifier-de"

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs to the 512-token limit.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"

result = classifier(text)
print(result)  # a list with one dict holding the predicted label and its score
```
Label 1 indicates that the text is predicted to be written in social media style (a tweet).
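
One way to turn the pipeline output into a yes/no decision, and to score a batch of style-transferred texts, is sketched below. It assumes the pipeline returns the default `LABEL_0`/`LABEL_1` label strings; if the model config defines custom label names, the check needs adjusting. `is_social_media` and `social_media_rate` are hypothetical helpers, and the `predictions` list is made-up sample data in the shape the pipeline returns, not real model output.

```python
def is_social_media(prediction: dict) -> bool:
    """Return True if a single pipeline prediction carries label 1."""
    # Assumption: labels look like "LABEL_1"; custom id2label names would differ.
    return prediction["label"].endswith("1")


def social_media_rate(predictions: list[dict]) -> float:
    """Fraction of texts classified as social media style."""
    if not predictions:
        return 0.0
    hits = sum(is_social_media(p) for p in predictions)
    return hits / len(predictions)


# Made-up predictions for illustration only, not real model output.
predictions = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_1", "score": 0.75},
    {"label": "LABEL_1", "score": 0.66},
]
rate = social_media_rate(predictions)
print(rate)
```

With the `classifier` from the snippet above, passing a list of generated texts and feeding the result to `social_media_rate` would give the share of outputs the model considers tweet-like, which is one plausible way to use it for style transfer evaluation.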

### Evaluation

Evaluation results on the test set:

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.96494 |
| Precision | 0.97552 |
| Recall    | 0.95564 |
| F1        | 0.96547 |
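
As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two table entries. The sketch below does that; the variable names are illustrative, and the result should agree with the reported F1 up to rounding of the inputs.

```python
precision = 0.97552
recall = 0.95564

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 5))
```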