---
language: en
license: cc-by-4.0
datasets:
- multi_nli
library_name: transformers
pipeline_tag: text-classification
---

# Model Card for COVID-19-CT-tweets-classification

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is a [DeBERTa-v3-base-tasksource-nli](https://huggingface.co/sileod/deberta-v3-base-tasksource-nli) model with an adapter trained on [More Information Needed], a dataset that contains X pairs of a tweet and a conspiracy theory, each labeled as support, deny, or neutral. The model was fine-tuned for text classification to predict whether a tweet supports, denies, or is neutral toward a given conspiracy theory. The training tweets relate to six common COVID-19 conspiracy theories:

1. **CT6: Vaccines are unsafe.** The coronavirus vaccine is either unsafe or part of a larger plot to control people or reduce the population.

2. **CT4: Governments and politicians spread misinformation.** Politicians or government agencies are intentionally spreading false information, or they have some other motive for the way they are responding to the coronavirus.

3. **CT5: The Chinese intentionally spread the virus.** The Chinese government intentionally created or spread the coronavirus to harm other countries.

4. **CT1: Deliberate strategy to create economic instability or benefit large corporations.** The coronavirus or the government's response to it is a deliberate strategy to create economic instability or to benefit large corporations over small businesses.

5. **CT2: The public was intentionally misled about the true nature of the virus and its prevention.** The public is being intentionally misled about the true nature of the coronavirus, its risks, or the efficacy of certain treatments or prevention methods.

6. **CT3: Human-made virus or bioweapon.** The coronavirus was made by humans, created intentionally, or developed as a bioweapon.

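Because the base model is an NLI classifier, one natural way to form the tweet/theory pairs described above is to use the tweet as the premise and a theory statement as the hypothesis. The sketch below assumes that framing. The `THEORIES` statements are condensed from the list above and may differ from the wording used in the training data, and `make_pairs` is a hypothetical helper, not part of any released code.

```python
from typing import Dict, List, Tuple

# Hypothesis statements condensed from the six theory descriptions in this card.
# Assumption: the training data may phrase the theories differently.
THEORIES: Dict[str, str] = {
    "CT1": "The coronavirus or the government's response to it is a deliberate "
           "strategy to create economic instability or to benefit large corporations.",
    "CT2": "The public is being intentionally misled about the true nature of the "
           "coronavirus, its risks, or the efficacy of treatments or prevention methods.",
    "CT3": "The coronavirus was made by humans, created intentionally, "
           "or developed as a bioweapon.",
    "CT4": "Politicians or government agencies are intentionally spreading "
           "false information about the coronavirus.",
    "CT5": "The Chinese government intentionally created or spread the coronavirus "
           "to harm other countries.",
    "CT6": "The coronavirus vaccine is unsafe or part of a larger plot to control "
           "people or reduce the population.",
}


def make_pairs(tweet: str) -> List[Tuple[str, str, str]]:
    """Return one (theory_id, premise, hypothesis) triple per theory.

    Hypothetical helper: the tweet is the NLI premise and each theory
    statement is the hypothesis, ordered CT1..CT6.
    """
    return [(ct_id, tweet, statement) for ct_id, statement in sorted(THEORIES.items())]


if __name__ == "__main__":
    for ct_id, premise, hypothesis in make_pairs("Example tweet text"):
        print(f"{ct_id}: premise={premise!r} hypothesis={hypothesis!r}")
```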
This model is suitable for English only.

- **Developed by:** Webimmunication Team
- **Shared by:** @ikrysinska
- **Model type:** Text classification (DeBERTa-v3-base NLI model with a fine-tuned adapter)
- **Language(s) (NLP):** EN
- **License:** CC BY 4.0
- **Fine-tuned from model:** [sileod/deberta-v3-base-tasksource-nli](https://huggingface.co/sileod/deberta-v3-base-tasksource-nli)

### Model Sources

- **Paper:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

[More Information Needed]

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- Generating or spreading tweets that support a given conspiracy theory
- Amplifying echo chambers in social subnetworks that are susceptible to believing in conspiracy theories

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

- Results are distorted for conspiracy theories that are not covered by the training dataset.
- Unintentional stifling of legitimate public discourse, e.g., eliminating discussions that merely resemble conspiracy theories from social subnetworks.
- Bias related to text style, economic status, and possibly other factors.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

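A minimal sketch of scoring one tweet/theory pair with the Hugging Face `transformers` text-classification pipeline. It loads only the base model [sileod/deberta-v3-base-tasksource-nli](https://huggingface.co/sileod/deberta-v3-base-tasksource-nli), i.e. the "without adapter" baseline from the evaluation table, because this card does not say which adapter library or repository hosts the fine-tuned adapter. The mapping from NLI labels (`entailment`, `neutral`, `contradiction`) to the stance labels support, neutral, and deny is an assumption, as are the helper names `load_classifier` and `stance_scores`.

```python
from typing import Any, Callable, Dict, List

# Base model named in this card; the fine-tuned adapter is NOT loaded here.
BASE_MODEL_ID = "sileod/deberta-v3-base-tasksource-nli"

# Assumed correspondence between the NLI labels and this card's stance labels.
NLI_TO_STANCE: Dict[str, str] = {
    "entailment": "support",
    "neutral": "neutral",
    "contradiction": "deny",
}

# A classifier maps {"text": premise, "text_pair": hypothesis} to a list of
# {"label": ..., "score": ...} dicts, one per class.
Classifier = Callable[[Dict[str, str]], List[Dict[str, Any]]]


def load_classifier(model_id: str = BASE_MODEL_ID) -> Classifier:
    """Build a transformers text-classification pipeline returning all label scores."""
    # Imported lazily so the rest of this module works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id, top_k=None)


def stance_scores(tweet: str, theory: str, classify: Classifier) -> Dict[str, float]:
    """Score one tweet (premise) against one theory statement (hypothesis)."""
    outputs = classify({"text": tweet, "text_pair": theory})
    scores: Dict[str, float] = {}
    for item in outputs:
        label = str(item["label"]).lower()
        # Labels outside the assumed mapping are kept under their lower-cased name.
        scores[NLI_TO_STANCE.get(label, label)] = float(item["score"])
    return scores
```

For example, `stance_scores(tweet, theory, load_classifier())` would download the base model and return support/neutral/deny scores for a single pair. Passing the classifier in as a callable keeps the label-mapping logic testable without downloading any model.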
## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

The adapter was trained for 5 epochs with a batch size of 16.

#### Preprocessing

Before training, the data was cleaned by removing all URLs, Twitter user mentions, and non-ASCII characters.

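A minimal sketch of this cleaning step. The card does not publish its exact patterns, so the regular expressions are assumptions: URLs are matched as `http://`, `https://`, or `www.` followed by non-whitespace characters, and mentions as `@` followed by word characters. Collapsing the leftover whitespace is an extra step the card does not mention, and `clean_tweet` is a hypothetical name.

```python
import re

# Assumed patterns: the card lists what was removed, not how.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"@\w+")  # "@" plus word characters (covers Twitter handles)
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs, @mentions, and non-ASCII characters from a tweet."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ASCII_RE.sub("", text)
    # Extra step (not stated in the card): collapse whitespace left by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("@user1 Vaccines \U0001F489 are unsafe! https://t.co/abc123"))
```

Deleting non-ASCII characters also strips emoji and accented letters (e.g., "café" becomes "caf"), which can remove stance-relevant signal from some tweets.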
## Evaluation

The model was evaluated on a sample of tweets collected during the COVID-19 pandemic. Five annotators rated every tweet against each of the six theories, using sliding scales to estimate, from 0% to 100%, how likely the tweet was to endorse the respective conspiracy theory. Agreement among raters was substantial for every conspiracy theory, and the model's predictions also correlated substantially with the human ratings. The model with the fine-tuned adapter outperforms the pre-trained model without the adapter on all six theories, in most cases by a wide margin (see table below).

| Conspiracy theory | Correlation between human raters | Correlation between human ratings and model without adapter | Correlation between human ratings and model with fine-tuned adapter |
|---|---|---|---|
| **CT6: Vaccines are unsafe.** | 0.78 | 0.29 | 0.57 |
| **CT4: Governments and politicians spread misinformation.** | 0.58 | 0.32 | 0.72 |
| **CT5: The Chinese intentionally spread the virus.** | 0.62 | 0.53 | 0.64 |
| **CT1: Deliberate strategy to create economic instability or benefit large corporations.** | 0.56 | 0.33 | 0.54 |
| **CT2: The public was intentionally misled about the true nature of the virus and its prevention.** | 0.66 | 0.37 | 0.68 |
| **CT3: Human-made virus or bioweapon.** | 0.67 | 0.15 | 0.78 |

## Environmental Impact

Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Tesla V100 GPU
- **Hours used:** 40
- **Cloud Provider:** Google Cloud Platform
- **Compute Region:** us-east1
- **Carbon Emitted:** 4.44 kg CO2 eq ([equivalent to: 17.9 km driven by an average ICE car, 2.22 kg of coal burned, or 0.07 tree seedlings sequestering carbon for 10 years](https://www.epa.gov/energy/greenhouse-gases-equivalencies-calculator-calculations-and-references))

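The calculator's estimate is roughly power draw × hours of use × carbon intensity of the regional grid. The sketch below reproduces that arithmetic and back-solves the grid intensity implied by the reported 4.44 kg. The 300 W power draw for the V100 is an assumption (the card does not state the value the calculator used), so the implied intensity is illustrative only; the function names are hypothetical.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy consumed by hardware drawing `power_watts` for `hours`."""
    return power_watts / 1000.0 * hours


def emissions_kg(power_watts: float, hours: float, kg_per_kwh: float) -> float:
    """Estimated emissions: energy (kWh) times grid carbon intensity (kg CO2 eq/kWh)."""
    return energy_kwh(power_watts, hours) * kg_per_kwh


def implied_intensity(reported_kg: float, power_watts: float, hours: float) -> float:
    """Grid carbon intensity (kg CO2 eq/kWh) implied by a reported estimate."""
    return reported_kg / energy_kwh(power_watts, hours)


if __name__ == "__main__":
    ASSUMED_V100_WATTS = 300.0  # assumption, not stated in the card
    HOURS = 40.0                # from the card
    REPORTED_KG = 4.44          # from the card
    print(energy_kwh(ASSUMED_V100_WATTS, HOURS))
    print(round(implied_intensity(REPORTED_KG, ASSUMED_V100_WATTS, HOURS), 3))
```

Under the 300 W assumption, 40 hours corresponds to 12 kWh, which implies about 0.37 kg CO2 eq per kWh.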
## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## Model Card Authors

@ikrysinska, @wtomi

## Model Card Contact

[email protected]

[email protected]

[email protected]
|