inoki-giskard/scan-report-temp · Report for AdamCodd/distilbert-base-uncased-finetuned-sentiment-amazon

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 12 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉Overconfidence issues (2)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Overconfidence	major 🔴	`avg_word_length(text)` >= 4.481	Overconfidence rate = 0.804	—	+28.70% vs. global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.481, we found a significantly higher number of overconfident wrong predictions (37 samples, corresponding to 80.43% of the wrong predictions in the data slice).

	text	avg_word_length(text)	label	Predicted `label`
95	this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms .	4.61538	negative	positive (p = 1.00)
				negative (p = 0.00)
643	the jabs it employs are short , carefully placed and dead-center .	4.58333	positive	negative (p = 1.00)
				positive (p = 0.00)
218	all that 's missing is the spontaneity , originality and delight .	4.58333	negative	positive (p = 0.99)
				negative (p = 0.01)
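
As a rough illustration of how an overconfidence rate like the one above can be computed, here is a minimal sketch. The report only states that the rate is the share of wrong predictions that are overconfident; the gap definition used below (probability of the predicted label minus probability of the true label) and the 0.5 threshold are assumptions for illustration and may differ from what the scan actually uses.

```python
def overconfidence_rate(records, threshold=0.5):
    """Share of wrong predictions whose confidence gap meets the threshold.

    Each record is (true_label, predicted_label, p_predicted, p_true).
    The gap definition and the default threshold are assumptions.
    """
    wrong = [r for r in records if r[0] != r[1]]
    if not wrong:
        return 0.0
    overconfident = [r for r in wrong if (r[2] - r[3]) >= threshold]
    return len(overconfident) / len(wrong)


# The three example rows shown above:
# (true label, predicted label, p_predicted, p_true)
examples = [
    ("negative", "positive", 1.00, 0.00),
    ("positive", "negative", 1.00, 0.00),
    ("negative", "positive", 0.99, 0.01),
]
rate_on_examples = overconfidence_rate(examples)

# Arithmetic check on the reported figure: 37 overconfident samples at
# about 80.43% implies 46 wrong predictions in the slice (37 / 46).
implied_wrong = round(37 / 0.8043)
```

Under these assumptions all three example rows count as overconfident, and the reported percentage is consistent with 37 of 46 wrong predictions in the slice.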

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Overconfidence	major 🔴	`avg_whitespace(text)` < 0.182	Overconfidence rate = 0.804	—	+28.70% vs. global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.182, we found a significantly higher number of overconfident wrong predictions (37 samples, corresponding to 80.43% of the wrong predictions in the data slice).

	text	avg_whitespace(text)	label	Predicted `label`
95	this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms .	0.178082	negative	positive (p = 1.00)
				negative (p = 0.00)
643	the jabs it employs are short , carefully placed and dead-center .	0.179104	positive	negative (p = 1.00)
				positive (p = 0.00)
218	all that 's missing is the spontaneity , originality and delight .	0.179104	negative	positive (p = 0.99)
				negative (p = 0.01)
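
The two overconfidence findings report identical counts, which is likely because the two features are nearly the same signal. The sketch below reproduces the feature values shown in the tables, assuming `avg_word_length` is the mean length of whitespace-separated tokens, `avg_whitespace` is the share of whitespace characters in the text, and each sentence carries one trailing space (the trailing space is an assumption that makes the reported numbers match; the actual feature implementation is not shown in this report).

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    tokens = text.split()
    return sum(len(t) for t in tokens) / len(tokens)


def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (assumed definition)."""
    return sum(ch.isspace() for ch in text) / len(text)


# Row 643 from the tables above, with an assumed trailing space.
sample = "the jabs it employs are short , carefully placed and dead-center . "

word_len = avg_word_length(sample)    # table value: 4.58333
whitespace = avg_whitespace(sample)   # table value: 0.179104

# When every token is followed by exactly one space, the two features are
# tied together: avg_whitespace = 1 / (avg_word_length + 1).
predicted_whitespace = 1.0 / (word_len + 1.0)

# The same relation maps the first slice threshold onto the second one.
threshold_mapped = 1.0 / (4.481 + 1.0)  # close to the 0.182 threshold
```

If this reading is right, `avg_word_length(text)` >= 4.481 and `avg_whitespace(text)` < 0.182 select almost the same records, so the two rows are better treated as one finding than as two independent ones.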

👉Ethical issues (1)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Ethical	medium 🟡	—	Fail rate = 0.057	Switch countries from high- to low-income and vice versa	2/35 tested samples (5.71%) changed prediction after perturbation

🔍✨Examples

When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.71% of the cases. We expected the predictions not to be affected by this transformation.

	text	Switch countries from high- to low-income and vice versa(text)	Original prediction	Prediction after perturbation
149	the volatile dynamics of female friendship is the subject of this unhurried , low-key film that is so off-hollywood that it seems positively french in its rhythms and resonance .	the volatile dynamics of female friendship is the subject of this unhurried , low-key film that is so off-hollywood that it seems positively South Sudanese in its rhythms and resonance .	positive (p = 0.85)	negative (p = 0.55)
236	not since japanese filmmaker akira kurosawa 's ran have the savagery of combat and the specter of death been visualized with such operatic grandeur .	not since Malagasy filmmaker akira kurosawa 's ran have the savagery of combat and the specter of death been visualized with such operatic grandeur .	negative (p = 0.51)	positive (p = 0.79)
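
The fail rate in this finding is the share of tested samples whose predicted label changes after the perturbation. A minimal sketch of that calculation follows; the function name and the placeholder labels for the 33 unchanged samples are hypothetical and only serve to reproduce the 2/35 figure.

```python
def fail_rate(original_labels, perturbed_labels):
    """Fraction of samples whose predicted label changed after perturbation."""
    if len(original_labels) != len(perturbed_labels):
        raise ValueError("label lists must have the same length")
    if not original_labels:
        return 0.0
    changed = sum(o != p for o, p in zip(original_labels, perturbed_labels))
    return changed / len(original_labels)


# Rows 149 and 236 flipped; the other 33 entries are placeholders standing
# in for tested samples whose prediction did not change (not real data).
original = ["positive", "negative"] + ["positive"] * 33
perturbed = ["negative", "positive"] + ["positive"] * 33

rate = fail_rate(original, perturbed)  # 2 / 35
```

Note that both flipped examples start from fairly low-confidence predictions (p = 0.85 and p = 0.51), so a small input change is enough to cross the decision boundary.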

👉Robustness issues (1)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Robustness	medium 🟡	—	Fail rate = 0.099	Add typos	80/811 tested samples (9.86%) changed prediction after perturbation

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 9.86% of the cases. We expected the predictions not to be affected by this transformation.

	text	Add typos(text)	Original prediction	Prediction after perturbation
1	unflinchingly bleak and desperate	unflinchingly bleak nd desperate	positive (p = 0.86)	negative (p = 0.70)
20	pumpkin takes an admirable look at the hypocrisy of political correctness , but it does so with such an uneven tone that you never know when humor ends and tragedy begins .	pumpkin takes an admirable lokok at the hypocriwy of politicql correcness , but it does so with such an uneven tone that you never know when humor ends and tragedy begins .	negative (p = 0.62)	positive (p = 0.58)
21	the iditarod lasts for days - this just felt like it did .	the iditarod ladts or days - this jus felt like it did .	negative (p = 0.50)	positive (p = 0.83)
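
To try a similar robustness check locally, a typo perturbation can be sketched as below. This is not the scan's own "Add typos" transformation, whose implementation is not shown in this report; the function, its `rate` and `seed` parameters, and the choice of delete, swap, and substitute operations are all assumptions for illustration.

```python
import random
import string


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly delete, swap, or substitute letters (illustrative only)."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "swap", "substitute"))
            if op == "delete":
                i += 1  # drop this letter
                continue
            if op == "swap" and i + 1 < len(chars) and chars[i + 1].isalpha():
                out.extend((chars[i + 1], ch))  # transpose two letters
                i += 2
                continue
            # "substitute", or a swap that had no letter to swap with
            out.append(rng.choice(string.ascii_lowercase))
            i += 1
            continue
        out.append(ch)  # spaces and punctuation are never modified
        i += 1
    return "".join(out)


sentence = "the iditarod lasts for days - this just felt like it did ."
noisy = add_typos(sentence, rate=0.1, seed=42)
```

Running the model on both `sentence` and `noisy` and counting label changes across the dataset gives a fail rate comparable in spirit to the reported 80/811 (9.86%), though the exact number will depend on the perturbation details.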

👉Performance issues (8)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`idx` >= 63.500 AND `idx` < 115.500	Accuracy = 0.750	—	-14.84% vs. global

🔍✨Examples

For records in the dataset where `idx` >= 63.500 AND `idx` < 115.500, the Accuracy is 14.84% lower than the global Accuracy.

	idx	label	Predicted `label`
64	64	negative	positive (p = 0.99)
70	70	negative	positive (p = 0.83)
78	78	positive	negative (p = 0.54)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`text_length(text)` < 37.500	Recall = 0.800	—	-12.08% vs. global

🔍✨Examples

For records in the dataset where `text_length(text)` < 37.500, the Recall is 12.08% lower than the global Recall.

	text	text_length(text)	label	Predicted `label`
1	unflinchingly bleak and desperate	34	negative	positive (p = 0.86)
112	hilariously inept and ridiculous .	35	positive	negative (p = 0.99)
113	this movie is maddening .	26	negative	positive (p = 0.96)
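
The `text_length` values in the table are one character longer than the visible text, which suggests each sentence in the dataset ends with a trailing space. The sketch below reproduces the three values under that assumption and applies the slice condition; the `rows` dictionary is built from the examples above, with the trailing space added as an assumption.

```python
def text_length(text: str) -> int:
    """Number of characters in the text, whitespace included."""
    return len(text)


# Example rows from the table above, each with an assumed trailing space.
rows = {
    1: "unflinchingly bleak and desperate ",
    112: "hilariously inept and ridiculous . ",
    113: "this movie is maddening . ",
}

lengths = {idx: text_length(text) for idx, text in rows.items()}

# Slice condition from the finding: text_length(text) < 37.500
in_slice = [idx for idx, n in lengths.items() if n < 37.5]
```

All three rows fall inside the slice, which covers only very short reviews where a single strongly polarized word can dominate the prediction.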

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`text_length(text)` < 65.500 AND `text_length(text)` >= 56.500	Precision = 0.769	—	-10.89% vs. global

🔍✨Examples

For records in the dataset where `text_length(text)` < 65.500 AND `text_length(text)` >= 56.500, the Precision is 10.89% lower than the global Precision.

	text	text_length(text)	label	Predicted `label`
92	you wo n't like roger , but you will quickly recognize him .	61	negative	positive (p = 0.75)
183	the lower your expectations , the more you 'll enjoy it .	58	negative	positive (p = 0.97)
312	i 'll bet the video game is a lot more fun than the film .	59	negative	positive (p = 0.60)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 740.500 AND `idx` < 793.500	Recall = 0.828	—	-9.05% vs. global

🔍✨Examples

For records in the dataset where `idx` >= 740.500 AND `idx` < 793.500, the Recall is 9.05% lower than the global Recall.

	idx	label	Predicted `label`
741	741	positive	negative (p = 0.81)
742	742	positive	negative (p = 0.89)
749	749	positive	negative (p = 0.92)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` >= 4.635 AND `avg_word_length(text)` < 4.743	Recall = 0.828	—	-9.05% vs. global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.635 AND `avg_word_length(text)` < 4.743, the Recall is 9.05% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
64	the script kicks in , and mr. hartley 's distended pace and foot-dragging rhythms follow .	4.6875	negative	positive (p = 0.99)
223	corny , schmaltzy and predictable , but still manages to be kind of heartwarming , nonetheless .	4.70588	positive	negative (p = 0.99)
248	a full world has been presented onscreen , not some series of carefully structured plot points building to a pat resolution .	4.72727	positive	negative (p = 0.54)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` < 0.177 AND `avg_whitespace(text)` >= 0.174	Recall = 0.828	—	-9.05% vs. global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.177 AND `avg_whitespace(text)` >= 0.174, the Recall is 9.05% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
64	the script kicks in , and mr. hartley 's distended pace and foot-dragging rhythms follow .	0.175824	negative	positive (p = 0.99)
223	corny , schmaltzy and predictable , but still manages to be kind of heartwarming , nonetheless .	0.175258	positive	negative (p = 0.99)
248	a full world has been presented onscreen , not some series of carefully structured plot points building to a pat resolution .	0.174603	positive	negative (p = 0.54)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 217.500 AND `idx` < 262.500	Recall = 0.840	—	-7.68% vs. global

🔍✨Examples

For records in the dataset where `idx` >= 217.500 AND `idx` < 262.500, the Recall is 7.68% lower than the global Recall.

	idx	label	Predicted `label`
218	218	negative	positive (p = 0.99)
223	223	positive	negative (p = 0.99)
230	230	positive	negative (p = 0.97)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 338.500 AND `idx` < 382.500	Precision = 0.800	—	-7.33% vs. global

🔍✨Examples

For records in the dataset where `idx` >= 338.500 AND `idx` < 382.500, the Precision is 7.33% lower than the global Precision.

	idx	label	Predicted `label`
339	339	positive	negative (p = 0.64)
346	346	negative	positive (p = 0.99)
356	356	negative	positive (p = 0.64)
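
The deviations in the performance findings appear to be relative rather than absolute percentage points. The report does not state the global metrics, but dividing each slice metric by (1 + deviation) gives nearly the same value for every finding that shares a metric, which supports that reading. The helper names below are hypothetical, and the implied global values are inferred from the reported numbers, not taken from the scan output.

```python
def relative_deviation(slice_value: float, global_value: float) -> float:
    """Relative difference of a slice metric from the global metric."""
    return (slice_value - global_value) / global_value


def implied_global(slice_value: float, deviation: float) -> float:
    """Invert relative_deviation to recover the global metric."""
    return slice_value / (1.0 + deviation)


# (slice metric, deviation) pairs as reported in the findings above.
reported = {
    "accuracy": [(0.750, -0.1484)],
    "recall": [(0.800, -0.1208), (0.828, -0.0905), (0.840, -0.0768)],
    "precision": [(0.769, -0.1089), (0.800, -0.0733)],
}

implied = {
    metric: [implied_global(value, dev) for value, dev in pairs]
    for metric, pairs in reported.items()
}
# Each metric's implied global values agree to about three decimals:
# accuracy near 0.881, recall near 0.910, precision near 0.863.
```

Because the slices are small, a relative drop of 7% to 15% can correspond to only a handful of misclassified samples, which is worth keeping in mind for the `idx`-based slices in particular, since `idx` is a row index rather than a property of the text.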

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

Check out the Giskard Space and improve your model.
The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!