---
language: en
license: cc-by-4.0
datasets:
- multi_nli
- webimmunization/COVID-19-conspiracy-theories-tweets
library_name: transformers
pipeline_tag: text-classification
---
# Model Card for COVID-19-CT-tweets-classification
### Model Description
This is a DeBERTa-v3-base-tasksource-nli model with an adapter trained on [More Information Needed], which contains X pairs of a tweet and a conspiracy theory, each labeled as support, deny, or neutral. The model was fine-tuned for text classification to predict whether a tweet supports a given conspiracy theory. It was trained on tweets related to six common COVID-19 conspiracy theories:
1. **CT6: Vaccines are unsafe.** The coronavirus vaccine is either unsafe or part of a larger plot to control people or reduce the population.
2. **CT4: Governments and politicians spread misinformation.** Politicians or government agencies are intentionally spreading false information, or they have some other motive for the way they are responding to the coronavirus.
3. **CT5: The Chinese intentionally spread the virus.** The Chinese government intentionally created or spread the coronavirus to harm other countries.
4. **CT1: Deliberate strategy to create economic instability or benefit large corporations.** The coronavirus or the government's response to it is a deliberate strategy to create economic instability or to benefit large corporations over small businesses.
5. **CT2: Public was intentionally misled about the true nature of the virus and prevention.** The public is being intentionally misled about the true nature of the coronavirus, its risks, or the efficacy of certain treatments or prevention methods.
6. **CT3: Human-made and bioweapon.** The coronavirus was created intentionally, made by humans, or developed as a bioweapon.
This model is suitable for English only.
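Each input to the model is a (tweet, conspiracy theory) pair, and training examples carry one of three stance labels. The sketch below shows one way to represent that structure: it pairs a single tweet with each of the six theory descriptions listed above. The names `CONSPIRACY_THEORIES`, `Stance`, and `make_pairs` are hypothetical, and the exact hypothesis wording used during training is an assumption; the strings here simply reuse the descriptions from this card.

```python
from enum import Enum

# Hypothetical lookup of the six theories described in this card. Whether these
# exact strings were used as hypotheses during training is an assumption.
CONSPIRACY_THEORIES = {
    "CT1": "The coronavirus or the government's response to it is a deliberate strategy "
           "to create economic instability or to benefit large corporations over small businesses.",
    "CT2": "The public is being intentionally misled about the true nature of the coronavirus, "
           "its risks, or the efficacy of certain treatments or prevention methods.",
    "CT3": "The coronavirus was created intentionally, made by humans, or developed as a bioweapon.",
    "CT4": "Politicians or government agencies are intentionally spreading false information, "
           "or they have some other motive for the way they are responding to the coronavirus.",
    "CT5": "The Chinese government intentionally created or spread the coronavirus to harm other countries.",
    "CT6": "The coronavirus vaccine is either unsafe or part of a larger plot to control people "
           "or reduce the population.",
}


class Stance(Enum):
    """The three class labels attached to training pairs."""
    SUPPORT = "support"
    DENY = "deny"
    NEUTRAL = "neutral"


def make_pairs(tweet: str) -> list[tuple[str, str, str]]:
    """Return one (theory_id, tweet, theory_text) triple per conspiracy theory."""
    return [(ct_id, tweet, theory) for ct_id, theory in CONSPIRACY_THEORIES.items()]
```

Scoring a tweet against all six theories then means running the classifier once per triple.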
- **Developed by:** Webimmunization Team
- **Shared by:** @ikrysinska
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** EN
- **License:** CC BY 4.0
- **Fine-tuned from model:** [sileod/deberta-v3-base-tasksource-nli](https://huggingface.co/sileod/deberta-v3-base-tasksource-nli)
### Model Sources
- **Paper:** [More Information Needed]
## Uses
The model classifies a pair of short texts: a tweet and a conspiracy theory. It returns a float that represents the likelihood that the tweet supports the given conspiracy theory.
### Out-of-Scope Use
**Spreading/Generating Tweets that support conspiracy theories:**
This model is explicitly designed to classify and understand tweets related to COVID-19 conspiracy theories, in particular to determine whether a tweet supports or denies a specific conspiracy theory. It is not intended, and should not be used, to generate or propagate tweets that endorse any conspiracy theory. Any such use is considered unethical and contrary to the intended use case.
**Amplifying echo chambers of social subnetworks susceptible to conspiracy theories:**
While the model can help identify tweets related to conspiracy theories, it should not be used to target or amplify echo chambers or social subnetworks that are susceptible to believing in conspiracy theories. Ethical use of this model involves promoting responsible and unbiased information dissemination and avoiding actions that may contribute to the spread of misinformation or polarization. Users should be cautious about using this model in ways that may further divide communities or promote harmful narratives.
## Bias, Risks, and Limitations
**Results may be distorted for conspiracy theories outside the training dataset:**
This model has been specifically fine-tuned to classify tweets related to a predefined set of six COVID-19 conspiracy theories. As a result, its performance may be less reliable on conspiracy theories or topics that were not included in the training data. Users should exercise caution and consider the potential for distorted results when applying this model to subjects beyond its training scope.
**Unintentional stifling of legitimate public discourse:**
The model's primary purpose is to identify tweets related to COVID-19 conspiracy theories, and it is not intended to stifle legitimate public discourse or eliminate discussions that merely resemble conspiracy theories. There is a risk that using this model inappropriately may lead to the suppression of valid conversations and the removal of content that is not explicitly conspiratorial but might be flagged due to similarities in language or topic. Users should be aware of this limitation and use the model judiciously, ensuring that it does not impede the free exchange of ideas and discussions.
**Bias in decision making:**
Like many machine learning models, this model may exhibit bias in its decision-making process. Factors such as writing style, which may reflect an author's socio-economic status, may inadvertently affect the model's classifications. Because the model's outputs may not be entirely free from bias, users should treat its predictions as supplementary information rather than definitive judgments.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
## Training Details
### Training Data
The model was fine-tuned on the [webimmunization/COVID-19-CT-tweets-classification](https://huggingface.co/webimmunization/COVID-19-CT-tweets-classification) and [MultiNLI](https://huggingface.co/datasets/multi_nli) datasets.
### Training Procedure
The adapter was trained for 5 epochs with a batch size of 16.
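The two stated hyperparameters determine how many optimizer updates the adapter received, given the size of the training set. Because this card does not give the number of training pairs, the minimal sketch below takes it as a parameter. The class name is hypothetical, and it assumes no gradient accumulation and that the final partial batch of each epoch is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterTrainingConfig:
    """Hypothetical container for the hyperparameters stated in this card."""
    num_epochs: int = 5
    batch_size: int = 16

    def optimizer_steps(self, num_examples: int) -> int:
        """Total optimizer steps, assuming no gradient accumulation and that
        the last partial batch of each epoch is kept (drop_last=False)."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.num_epochs
```

For example, 100 training pairs would give 7 batches per epoch and 35 steps in total under these assumptions.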
#### Preprocessing
The training data was cleaned before training: all URLs, Twitter user mentions, and non-ASCII characters were removed.
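The cleaning step can be sketched with regular expressions. The patterns below are assumptions, not the authors' preprocessing code: URLs are taken to start with `http://`, `https://`, or `www.`, mentions are `@` followed by word characters, and the final whitespace collapse is an extra step added for tidiness. Note that the mention pattern would also clip the local part of an e-mail address, which a production cleaner might want to handle separately.

```python
import re

# Assumed patterns; the exact rules used for the training data are not published here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs, @-mentions, and non-ASCII characters, then tidy whitespace."""
    text = URL_RE.sub(" ", text)      # strip links first so their contents are not re-matched
    text = MENTION_RE.sub(" ", text)  # strip Twitter user mentions
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII characters
    return WHITESPACE_RE.sub(" ", text).strip()
```

Applied to `"@user Check this https://t.co/abc 😷 now"`, this sketch yields `"Check this now"`.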
## Evaluation
The model was evaluated on a sample of tweets collected during the COVID-19 pandemic. Five annotators rated every tweet against each of the six theories, using a sliding scale from 0% to 100% to indicate how likely the tweet was to endorse the respective conspiracy theory. Agreement among raters was substantial for every conspiracy theory, and the model's scores correlated substantially with the human ratings. The model with the fine-tuned adapter outperforms the pre-trained model without the adapter for all six theories (see table below).
| Conspiracy theory | Inter-rater correlation | Human ratings vs. model without adapter | Human ratings vs. model with fine-tuned adapter |
|---|---|---|---|
| **Vaccines are unsafe.** | 0.78 | 0.29 | 0.57 |
| **Governments and politicians spread misinformation.** | 0.58 | 0.32 | 0.72 |
| **The Chinese intentionally spread the virus.** | 0.62 | 0.53 | 0.64 |
| **Deliberate strategy to create economic instability or benefit large corporations.** | 0.56 | 0.33 | 0.54 |
| **Public was intentionally misled about the true nature of the virus and prevention.** | 0.66 | 0.37 | 0.68 |
| **Human-made and bioweapon.** | 0.67 | 0.15 | 0.78 |
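The card does not state which correlation coefficient was used or how the per-rater scores were aggregated. As a minimal sketch, assuming Pearson correlation, averaging over all annotator pairs for the inter-rater column, and correlating the per-tweet mean human rating with the model's scores for the other two columns, the computation could look like this (all function names are hypothetical):

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def mean_inter_rater_correlation(ratings: list[list[float]]) -> float:
    """Average pairwise correlation. `ratings` holds one list per annotator,
    aligned by tweet (slider values from 0 to 100)."""
    pairs = list(combinations(ratings, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def human_model_correlation(ratings: list[list[float]], scores: list[float]) -> float:
    """Correlate the per-tweet mean human rating with the model's scores.
    Pearson is scale-invariant, so 0-100 ratings and 0-1 scores can be mixed."""
    mean_human = [sum(col) / len(col) for col in zip(*ratings)]
    return pearson(mean_human, scores)
```

Under this scheme, each row of the table would be computed separately on the ratings for one theory.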
## Environmental Impact
Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** GPU Tesla V100
- **Hours used:** 40
- **Cloud Provider:** Google Cloud Platform
- **Compute Region:** us-east1
- **Carbon Emitted:** 4.44 kg CO2 eq ([equivalent to 17.9 km driven by an average ICE car, 2.22 kg of coal burned, or 0.07 tree seedlings sequestering carbon for 10 years](https://www.epa.gov/energy/greenhouse-gases-equivalencies-calculator-calculations-and-references))
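Estimates of this kind are, roughly, hardware power times hours of use times the carbon intensity of the regional grid. The back-of-envelope sketch below works backwards from the reported figure. It assumes a single V100 drawing its 300 W TDP (the SXM2 variant) for all 40 hours; that assumption, and the function names, are not taken from the calculator, and the derived intensity is only what would reproduce 4.44 kg under that assumption, not a published regional value.

```python
def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy drawn by hardware running at a constant power for a given time."""
    return power_kw * hours


def emissions_kg(power_kw: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Estimated emissions: energy (kWh) times grid carbon intensity (kg CO2 eq/kWh)."""
    return energy_kwh(power_kw, hours) * kg_co2eq_per_kwh


def implied_intensity(reported_kg: float, power_kw: float, hours: float) -> float:
    """Grid carbon intensity that would reproduce a reported emissions figure."""
    return reported_kg / energy_kwh(power_kw, hours)


# Assumption: one Tesla V100 at its 300 W TDP for the full 40 hours.
V100_POWER_KW = 0.3
HOURS_USED = 40
REPORTED_KG = 4.44

intensity = implied_intensity(REPORTED_KG, V100_POWER_KW, HOURS_USED)
print(f"energy: {energy_kwh(V100_POWER_KW, HOURS_USED):.1f} kWh")
print(f"implied intensity: {intensity:.2f} kg CO2 eq/kWh")
```

Under these assumptions the run drew about 12 kWh, so the reported total corresponds to roughly 0.37 kg CO2 eq per kWh.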
## Citation
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary
[More Information Needed]
## Model Card Authors
@ikrysinska, @wtomi
## Model Card Contact
[email protected]
[email protected]
[email protected]