Report for cardiffnlp/twitter-roberta-base-sentiment-latest on tweet_eval (sentiment, test set)
Discussion #4, opened by giskard-bot
Ethical issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation | Description |
---|---|---|---|---|---|---|
Ethical | medium | — | Fail rate = 0.058 | Switch Religion | 25/433 tested samples (5.77%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 5.77% of the cases. We expected the predictions not to be affected by this transformation. |
Ethical | medium | — | Fail rate = 0.051 | Switch countries from high- to low-income and vice versa | 51/1000 tested samples (5.1%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.1% of the cases. We expected the predictions not to be affected by this transformation. |
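The fail rate in these tables is the number of samples whose predicted label changed after perturbation, divided by the number of samples tested (25/433 ≈ 0.058 for "Switch Religion"). Below is a minimal sketch of such an invariance test in plain Python. It does not use Giskard's actual API. `invariance_fail_rate` and `toy_predict` are hypothetical names, and `toy_predict` is a toy keyword rule standing in for the sentiment model. The sketch also assumes that samples the transformation leaves unchanged are skipped. That assumption would explain why "Switch Religion" tested only 433 samples: presumably only those tweets mention a religion.

```python
from typing import Callable, Iterable, Tuple


def invariance_fail_rate(
    texts: Iterable[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, fail_rate) for a perturbation invariance test."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        # Assumption: samples the transformation does not modify are not counted as tested.
        if perturbed == text:
            continue
        tested += 1
        # A failure is any change in the predicted label after perturbation.
        if predict(perturbed) != predict(text):
            changed += 1
    fail_rate = changed / tested if tested else 0.0
    return changed, tested, fail_rate


def toy_predict(text: str) -> str:
    """Hypothetical stand-in for the sentiment model: a case-sensitive keyword rule."""
    return "positive" if "good" in text else "negative"


if __name__ == "__main__":
    texts = ["good day", "bad day", "123"]
    # "123" is unchanged by str.upper, so only two samples are tested.
    print(invariance_fail_rate(texts, toy_predict, str.upper))  # (1, 2, 0.5)
    # The reported "Switch Religion" metric is the same ratio.
    print(f"{25 / 433:.3f}")  # 0.058
```

With a real model, `predict` would wrap the classifier's label output. The rest of the computation stays the same.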
Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation | Description |
---|---|---|---|---|---|---|
Robustness | major | — | Fail rate = 0.213 | Transform to uppercase | 213/1000 tested samples (21.3%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 21.3% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | major | — | Fail rate = 0.150 | Add typos | 150/1000 tested samples (15.0%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.0% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | major | — | Fail rate = 0.122 | Transform to title case | 122/1000 tested samples (12.2%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 12.2% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | medium | — | Fail rate = 0.095 | Punctuation Removal | 95/1000 tested samples (9.5%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.5% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | medium | — | Fail rate = 0.073 | Transform to lowercase | 73/1000 tested samples (7.3%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 7.3% of the cases. We expected the predictions not to be affected by this transformation. |
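To reproduce these robustness checks locally, one could start from simple versions of the five perturbations. The functions below are hypothetical, simplified stand-ins written for illustration, not Giskard's implementations. In particular, `add_typos` only swaps adjacent letters at a given `rate` with a seeded random generator. Giskard's typo transformation may differ.

```python
import random
import string


def to_uppercase(text: str) -> str:
    return text.upper()


def to_lowercase(text: str) -> str:
    return text.lower()


def to_title_case(text: str) -> str:
    return text.title()


def remove_punctuation(text: str) -> str:
    # Delete every ASCII punctuation character (string.punctuation).
    return text.translate(str.maketrans("", "", string.punctuation))


def add_typos(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Hypothetical typo model: swap adjacent letter pairs with probability `rate`."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


# Names mirror the "Transformation" column in the table above.
PERTURBATIONS = {
    "Transform to uppercase": to_uppercase,
    "Add typos": add_typos,
    "Transform to title case": to_title_case,
    "Punctuation Removal": remove_punctuation,
    "Transform to lowercase": to_lowercase,
}

if __name__ == "__main__":
    tweet = "Loving the new update, great job!"
    for name, perturb in PERTURBATIONS.items():
        print(f"{name}: {perturb(tweet)}")
```

Each function has the signature `str -> str`, so any of them can be passed as the `transform` argument of an invariance test that compares predictions before and after perturbation.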