Report for cardiffnlp/twitter-roberta-base-sentiment-latest on tweet_eval (sentiment, test set)

#4

by giskard-bot - opened Dec 6, 2023

Giskard org Dec 6, 2023

Ethical issues (2)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Ethical	medium	—	Fail rate = 0.058	Switch Religion	25/433 tested samples (5.77%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 5.77% of the cases. We expected the predictions not to be affected by this transformation.
Ethical	medium	—	Fail rate = 0.051	Switch countries from high- to low-income and vice versa	51/1000 tested samples (5.1%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.1% of the cases. We expected the predictions not to be affected by this transformation.
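
The Metric column appears to be the Deviation counts expressed as a ratio (changed predictions divided by tested samples). As a quick sanity check, the sketch below recomputes the two ethical fail rates from the counts in the table. The helper name `fail_rate` is hypothetical and is not part of the Giskard API.

```python
def fail_rate(changed: int, tested: int) -> float:
    """Fraction of tested samples whose prediction changed after perturbation."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return changed / tested


# (changed, tested) counts copied from the Deviation column above.
ethical_counts = {
    "Switch Religion": (25, 433),
    "Switch countries from high- to low-income and vice versa": (51, 1000),
}

for name, (changed, tested) in ethical_counts.items():
    print(f"{name}: fail rate = {fail_rate(changed, tested):.3f}")
```

Both values round to the figures reported in the Metric column (0.058 and 0.051). Note that the religion test ran on 433 samples rather than 1000, presumably because the transformation only applies to texts that contain a term it can switch.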

Robustness issues (5)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Robustness	major	—	Fail rate = 0.213	Transform to uppercase	213/1000 tested samples (21.3%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 21.3% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	major	—	Fail rate = 0.150	Add typos	150/1000 tested samples (15.0%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.0% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	major	—	Fail rate = 0.122	Transform to title case	122/1000 tested samples (12.2%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 12.2% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	medium	—	Fail rate = 0.095	Punctuation Removal	95/1000 tested samples (9.5%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.5% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	medium	—	Fail rate = 0.073	Transform to lowercase	73/1000 tested samples (7.3%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 7.3% of the cases. We expected the predictions not to be affected by this transformation.
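
To illustrate what these rows measure, here is a minimal sketch of a perturbation test: predict on the original text, predict again on the transformed text, and count how often the label differs. It is not Giskard's implementation. `perturbation_fail_rate` and `toy_predict` are hypothetical names, and the toy classifier is a deliberately case-sensitive stand-in for the real model.

```python
from typing import Callable, Sequence


def perturbation_fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Sequence[str],
) -> float:
    """Share of texts whose predicted label changes after `transform` is applied."""
    if not texts:
        raise ValueError("texts must not be empty")
    changed = sum(1 for text in texts if predict(text) != predict(transform(text)))
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical stand-in model: matches lowercase keywords only."""
    if "good" in text:
        return "positive"
    if "bad" in text:
        return "negative"
    return "neutral"


samples = ["good day", "bad service", "GOOD grief", "just a tweet"]

# "Transform to uppercase" is modeled here with str.upper.
rate = perturbation_fail_rate(toy_predict, str.upper, samples)
print(f"Fail rate = {rate:.3f}")
```

With these four samples, the first two predictions flip once the keywords are uppercased, so the sketch reports a fail rate of 0.500. To test the actual model, `toy_predict` would be replaced by a function that wraps the model's inference call and returns its predicted label.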