Report for cardiffnlp/twitter-roberta-base-sentiment-latest on tweet_eval (sentiment, test set)

Discussion #4, opened by giskard-bot (Giskard org)
Ethical issues (2)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation | Description |
|---|---|---|---|---|---|---|
| Ethical | medium | | Fail rate = 0.058 | Switch Religion | 25/433 tested samples (5.77%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 5.77% of the cases. We expected the predictions not to be affected by this transformation. |
| Ethical | medium | | Fail rate = 0.051 | Switch countries from high- to low-income and vice versa | 51/1000 tested samples (5.1%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.1% of the cases. We expected the predictions not to be affected by this transformation. |
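
For readers unfamiliar with this kind of test, the following is a minimal sketch of a term-swapping perturbation in the spirit of “Switch Religion”. The `SWAPS` table, `switch_terms`, and `changed_samples` are hypothetical names invented for illustration; the word lists Giskard actually uses are not shown in this report. The sketch also illustrates one plausible reason a test might cover 433 samples rather than 1000: only texts that contain a swappable term are altered by the transformation. That reading is an assumption, not something the report states.

```python
import re

# Hypothetical, deliberately tiny swap table (an assumption for illustration only).
SWAPS = {"christian": "muslim", "muslim": "hindu", "hindu": "christian"}

# Match any listed term as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b", re.IGNORECASE
)


def switch_terms(text: str) -> str:
    """Replace each listed term with its counterpart, keeping initial capitalization."""

    def repl(match: re.Match) -> str:
        word = match.group(0)
        new = SWAPS[word.lower()]
        return new.capitalize() if word[0].isupper() else new

    return _PATTERN.sub(repl, text)


def changed_samples(texts):
    """Return (original, perturbed) pairs for texts the transformation altered."""
    pairs = [(t, switch_terms(t)) for t in texts]
    return [(orig, pert) for orig, pert in pairs if orig != pert]
```

A text without any listed term passes through unchanged and would be left out of the tested set under this sketch.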
Robustness issues (5)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation | Description |
|---|---|---|---|---|---|---|
| Robustness | major | | Fail rate = 0.213 | Transform to uppercase | 213/1000 tested samples (21.3%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 21.3% of the cases. We expected the predictions not to be affected by this transformation. |
| Robustness | major | | Fail rate = 0.150 | Add typos | 150/1000 tested samples (15.0%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.0% of the cases. We expected the predictions not to be affected by this transformation. |
| Robustness | major | | Fail rate = 0.122 | Transform to title case | 122/1000 tested samples (12.2%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 12.2% of the cases. We expected the predictions not to be affected by this transformation. |
| Robustness | medium | | Fail rate = 0.095 | Punctuation Removal | 95/1000 tested samples (9.5%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.5% of the cases. We expected the predictions not to be affected by this transformation. |
| Robustness | medium | | Fail rate = 0.073 | Transform to lowercase | 73/1000 tested samples (7.3%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 7.3% of the cases. We expected the predictions not to be affected by this transformation. |
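
The fail rate reported for each transformation is the number of changed predictions divided by the number of tested samples (for example, 213/1000 = 0.213). The sketch below shows one way such a figure could be computed. It is not Giskard's implementation: `fail_rate`, `remove_punctuation`, and `toy_predict` are hypothetical helpers, and the toy predictor is a case-sensitive keyword rule standing in for the real model so the example runs without downloading anything. Skipping texts that the transformation leaves unchanged is likewise an assumption.

```python
import string
from typing import Callable, List, Sequence, Tuple


def fail_rate(
    predict: Callable[[List[str]], List[str]],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one transformation.

    Only texts that the transformation actually alters are counted as tested
    (an assumption of this sketch).
    """
    pairs = [(t, transform(t)) for t in texts]
    tested = [(orig, pert) for orig, pert in pairs if orig != pert]
    if not tested:
        return 0, 0, 0.0
    before = predict([orig for orig, _ in tested])
    after = predict([pert for _, pert in tested])
    changed = sum(a != b for a, b in zip(before, after))
    return changed, len(tested), changed / len(tested)


def remove_punctuation(text: str) -> str:
    """Strip ASCII punctuation characters from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def toy_predict(texts: List[str]) -> List[str]:
    """Stand-in classifier: case-sensitive keyword match, hence fragile to casing."""
    return ["positive" if "good" in t else "neutral" for t in texts]


texts = ["good game", "bad call", "so good!", "ok"]
upper_result = fail_rate(toy_predict, texts, str.upper)
punct_result = fail_rate(toy_predict, texts, remove_punctuation)
```

To try this against the model in the report, `toy_predict` could be replaced by a function that wraps a Hugging Face `transformers` text-classification pipeline loaded with `cardiffnlp/twitter-roberta-base-sentiment-latest` and returns the predicted label for each text.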
