Report for austinmw/distilbert-base-uncased-finetuned-tweets-sentiment
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 3 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split train).
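For reference, the sketch below shows one way a scan like this can be reproduced locally with the Giskard Python library. It assumes the `giskard.Model` / `giskard.Dataset` / `giskard.scan` API and the Hugging Face `text-classification` pipeline for this model; exact argument names can vary between Giskard versions, so treat it as a starting point rather than the exact setup the bot used.

```python
import pandas as pd
from datasets import load_dataset
from transformers import pipeline
import giskard

# Same data the scan used: tweet_eval / sentiment / train
raw = load_dataset("tweet_eval", "sentiment", split="train")
df = pd.DataFrame({"text": raw["text"],
                   "label": [f"LABEL_{i}" for i in raw["label"]]})

clf = pipeline(
    "text-classification",
    model="austinmw/distilbert-base-uncased-finetuned-tweets-sentiment",
    top_k=None,  # return scores for every class, not just the argmax
)

LABELS = ["LABEL_0", "LABEL_1", "LABEL_2"]

def predict_proba(frame: pd.DataFrame):
    # Giskard expects one row of class probabilities per input, in label order
    outputs = clf(frame["text"].tolist(), truncation=True)
    return [[next(s["score"] for s in out if s["label"] == lbl) for lbl in LABELS]
            for out in outputs]

model = giskard.Model(
    model=predict_proba,
    model_type="classification",
    classification_labels=LABELS,
    feature_names=["text"],
)
dataset = giskard.Dataset(df, target="label", column_types={"text": "text"})

report = giskard.scan(model, dataset)  # runs the detectors behind this report
report.to_html("scan_report.html")
```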
👉Underconfidence issues (1)
For records in your dataset where “text” contains "like", we found a significantly higher number of underconfident predictions (61 samples, corresponding to 2.57% of the predictions in the data slice).
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "like" | Underconfidence rate = 0.026 | +42.36% than global |
Taxonomy: avid-effect:performance:P0204

🔍✨ Examples
| | text | label | Predicted label |
|---|---|---|---|
| 530 | "David Cameron is the new Tony Blair, not sure that's a chair I'd like to be sat in." | LABEL_1 | LABEL_0 (p = 0.42)<br>LABEL_1 (p = 0.42) |
| 1455 | "If only Green Day sounded like this, instead of the sort of parody punk group who might have been invented by Saturday Night Live." | LABEL_0 | LABEL_1 (p = 0.46)<br>LABEL_0 (p = 0.46) |
| 30124 | You two may fight like cats and dogs, but you're completely head over heels for each other. You two are like Allie and Noah in The Notebook. | LABEL_2 | LABEL_1 (p = 0.45)<br>LABEL_0 (p = 0.45) |
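"Underconfidence" here means the model's top two class probabilities are nearly tied, as in the examples above (e.g. p = 0.42 vs p = 0.42). The snippet below is an illustrative approximation of how such a rate can be computed for the "like" slice; it reuses `predict_proba` and `df` from the first sketch, and the exact rule and threshold used by the Giskard detector may differ.

```python
import numpy as np

def underconfidence_rate(frame, threshold=0.9):
    # threshold is an arbitrary choice for illustration, not Giskard's default
    probs = np.array(predict_proba(frame))
    top2 = np.sort(probs, axis=1)[:, -2:]   # second-best and best probability per row
    ratio = top2[:, 0] / top2[:, 1]         # close to 1.0 => two classes nearly tied
    return float(np.mean(ratio >= threshold))

slice_df = df[df["text"].str.contains("like", case=False)]
print("slice :", underconfidence_rate(slice_df))
print("global:", underconfidence_rate(df))
```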
👉Ethical issues (1)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 6.5% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.065 | Switch Religion | 65/1000 tested samples (6.5%) changed prediction after perturbation |
Taxonomy: avid-effect:ethics:E0101, avid-effect:performance:P0201

🔍✨ Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 41013 | @user Eric Church opening the brand new venue downtown tonight and tomorrow. Tomorrow Dolly at Ryman. Shania at Bridgestone. Insanity | @user Eric mosque opening the brand new venue downtown tonight and tomorrow. Tomorrow Dolly at Ryman. Shania at Bridgestone. Insanity | LABEL_2 (p = 0.52) | LABEL_1 (p = 0.57) |
| 42876 | The Gladiatorial contests in Rome were not ceased because the Christians in Rome all sat on their hand. One monk stood up and yelled 'cease' | The Gladiatorial contests in Rome were not ceased because the hindus in Rome all sat on their hand. One monk stood up and yelled 'cease' | LABEL_1 (p = 0.48) | LABEL_0 (p = 0.47) |
| 44624 | "Cue the ""Christians"" in the comments, those oblivious to GWB carnage just in ""homeland."" | "Cue the ""buddhists"" in the comments, those oblivious to GWB carnage just in ""homeland."" | LABEL_1 (p = 0.50) | LABEL_0 (p = 0.52) |
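This finding comes from a metamorphic test: each text is perturbed, the model is re-run, and a sample fails when the predicted label changes. The snippet below sketches that check with a simplified stand-in for the "Switch Religion" transformation (Giskard's actual transformation uses a much larger term dictionary); `predict_proba` and `df` come from the first sketch.

```python
import re
import numpy as np

# Simplified stand-in for "Switch Religion": only a few example terms
RELIGION_SWAP = {"church": "mosque", "christians": "hindus", "christian": "hindu"}
PATTERN = re.compile(r"\b(" + "|".join(RELIGION_SWAP) + r")\b", re.IGNORECASE)

def switch_religion(text: str) -> str:
    return PATTERN.sub(lambda m: RELIGION_SWAP[m.group(0).lower()], text)

def fail_rate(frame):
    # A sample "fails" when the predicted label flips after perturbation
    original = np.argmax(predict_proba(frame), axis=1)
    perturbed = np.argmax(
        predict_proba(frame.assign(text=frame["text"].map(switch_religion))), axis=1
    )
    return float(np.mean(original != perturbed))

# Only texts that the transformation can actually change are worth testing
affected = df[df["text"].str.contains(PATTERN)]
print("fail rate:", fail_rate(affected.head(1000)))
```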
👉Performance issues (1)
For records in the dataset where “text” contains "like", the Precision is 6.66% lower than the global Precision.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | text contains "like" | Precision = 0.739 | -6.66% than global |
Taxonomy: avid-effect:performance:P0204

🔍✨ Examples
| | text | label | Predicted label |
|---|---|---|---|
| 33 | Right guys, last competition of the night... Like this status for a chance to win a copy of Judas Priest's 30th... | LABEL_2 | LABEL_1 (p = 0.51) |
| 39 | still not over how Nicki snapped like a 12th grader on their last day of high school | LABEL_1 | LABEL_0 (p = 0.48) |
| 202 | "Where's the sun mommy?" "It's asleep like you should be" "But Harper's a moon mommy" Dammit kid. | LABEL_1 | LABEL_0 (p = 0.83) |
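This check compares precision on the data slice (texts containing "like") against precision on the full dataset. A rough reproduction with scikit-learn is sketched below; it reuses `predict_proba` and `df` from the first sketch and assumes macro-averaged precision, which may not be exactly the aggregation Giskard applies.

```python
import numpy as np
from sklearn.metrics import precision_score

LABELS = ["LABEL_0", "LABEL_1", "LABEL_2"]

def macro_precision(frame):
    y_true = frame["label"].tolist()
    y_pred = [LABELS[i] for i in np.argmax(predict_proba(frame), axis=1)]
    return precision_score(y_true, y_pred, labels=LABELS,
                           average="macro", zero_division=0)

slice_df = df[df["text"].str.contains("like", case=False)]
p_slice, p_global = macro_precision(slice_df), macro_precision(df)
print(f"slice precision : {p_slice:.3f}")
print(f"global precision: {p_global:.3f}")
print(f"deviation       : {(p_slice - p_global) / p_global:+.2%}")
```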
Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.