Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student

Discussion #190, opened by giskard-bot (Giskard org)

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split test).

👉Performance issues (1)

For records in the dataset where text contains "trump", the Precision is 9.08% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | text contains "trump" | Precision = 0.507 | -9.08% than global |
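
The deviation column can be read as a relative difference between the slice metric and the global metric. The report does not spell out the formula, so the sketch below assumes that reading; the function names `relative_deviation` and `implied_global` are illustrative and not part of Giskard. Under that assumption, the global Precision can be backed out from the two reported numbers.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative difference of a slice metric versus the global metric (assumed formula)."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert the assumed formula: slice = global * (1 + deviation)."""
    return slice_metric / (1.0 + deviation)


# Values reported for the slice where text contains "trump".
slice_precision = 0.507
deviation = -0.0908

global_precision = implied_global(slice_precision, deviation)
print(f"implied global Precision: {global_precision:.3f}")
```

If the deviation were instead an absolute difference in percentage points, the implied global value would differ, so treat the result as an estimate.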

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

| | text | label | Predicted label |
|---|---|---|---|
| 63 | Donald Trump does not have a clue about global warming. Maybe the Rockefeller's can clue them in about fossil fuels. | negative | neutral (p = 0.59) |
| 109 | @user where did you get the fact that there is infighting in the Trump transition team over SofS? @user | neutral | negative (p = 0.67) |
| 127 | Quote of the year:"Hello" - Melania Trump | neutral | positive (p = 0.57) |

👉Ethical issues (1)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 15.62% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.156 | Switch Religion | 5/32 tested samples (15.62%) changed prediction after perturbation |
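
The fail rate in these tables is the share of tested samples whose predicted label changed after the perturbation. A minimal sketch of that arithmetic follows; the `fail_rate` helper and the label lists are hypothetical, arranged only to reproduce the reported 5 changes out of 32 samples.

```python
from typing import Sequence


def fail_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Fraction of samples whose predicted label differs after perturbation."""
    if len(original) != len(perturbed):
        raise ValueError("prediction lists must have the same length")
    if not original:
        raise ValueError("at least one tested sample is required")
    changed = sum(o != p for o, p in zip(original, perturbed))
    return changed / len(original)


# Hypothetical labels: 32 tested samples, 5 of which flip after perturbation.
original = ["positive"] * 32
perturbed = ["negative"] * 5 + ["positive"] * 27

rate = fail_rate(original, perturbed)
print(f"fail rate = {rate:.3f}")
```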

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201

🔍✨Examples

| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 198 | Not sure I can take anymore. Brexit, Trump and now no more Casey and Jessica has left Eric. God is life worth living ? Tesla model S,o YES. | Not sure I can take anymore. Brexit, Trump and now no more Casey and Jessica has left Eric. allah is life worth living ? Tesla model S,o YES. | positive (p = 0.44) | negative (p = 0.39) |
| 314 | If @user made an appearance as Adam again I'd have to call him a God because he has so much material on #ThisIsUs #yr #Dreams | If @user made an appearance as Adam again I'd have to call him a allah because he has so much material on #ThisIsUs #yr #Dreams | positive (p = 0.68) | negative (p = 0.53) |
| 368 | whew god damn lea michele is so sexy #LeaMichele #ScreamQueens #Hester #Booty | whew allah damn lea michele is so sexy #LeaMichele #ScreamQueens #Hester #Booty | positive (p = 0.52) | negative (p = 0.44) |

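
In the examples above, the perturbation swaps a religion-related term for a counterpart while leaving the rest of the text untouched. The sketch below imitates that behaviour with a whole-word, case-insensitive replacement. The one-entry `SWAPS` mapping is taken from the examples only and is an assumption; the term list Giskard actually uses is not shown in this report.

```python
import re

# Hypothetical mapping inferred from the examples; not Giskard's real term list.
SWAPS = {"god": "allah"}

# Match whole words only, ignoring case, so "godzilla" is left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b", re.IGNORECASE
)


def switch_terms(text: str) -> str:
    """Replace each mapped term with its counterpart, as in the examples."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(0).lower()], text)


print(switch_terms("whew god damn lea michele is so sexy"))
```

A prediction that flips under such a swap suggests the model is reacting to the term itself rather than to the sentiment of the sentence.
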
👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 42.61% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.426 | Transform to uppercase | 369/866 tested samples (42.61%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | Trying to have a conversation with my dad about vegetarianism is the most pointless infuriating thing ever #caveman | TRYING TO HAVE A CONVERSATION WITH MY DAD ABOUT VEGETARIANISM IS THE MOST POINTLESS INFURIATING THING EVER #CAVEMAN | negative (p = 0.75) | positive (p = 0.54) |
| 1 | #latestnews 4 #newmexico #politics + #nativeamerican + #Israel + #Palestine - Protesting Rise Of Alt-Right At... | #LATESTNEWS 4 #NEWMEXICO #POLITICS + #NATIVEAMERICAN + #ISRAEL + #PALESTINE - PROTESTING RISE OF ALT-RIGHT AT... | negative (p = 0.61) | positive (p = 0.55) |
| 3 | @user @user @user Looks like Flynn isn't too pleased with me, he blocked me. You blocked by Flynn too @user | @USER @USER @USER LOOKS LIKE FLYNN ISN'T TOO PLEASED WITH ME, HE BLOCKED ME. YOU BLOCKED BY FLYNN TOO @USER | negative (p = 0.53) | positive (p = 0.53) |
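
The robustness checks in this report follow a simple invariance pattern: apply a transformation, predict again, and count label changes. The sketch below shows that pattern with a stand-in classifier. `toy_predict` is hypothetical and deliberately case-sensitive; it is not the scanned model. Skipping samples that the transformation leaves unchanged is also an assumption, though it would be one plausible reason the tested-sample counts differ between transformations.

```python
from typing import Callable, Iterable, List, Tuple


def invariance_failures(
    predict: Callable[[str], str],
    texts: Iterable[str],
    transform: Callable[[str], str],
) -> Tuple[float, List[str]]:
    """Return the fail rate and the texts whose label changed after `transform`."""
    tested = 0
    failures: List[str] = []
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # assumed: unchanged samples are not counted as tested
        tested += 1
        if predict(perturbed) != predict(text):
            failures.append(text)
    rate = len(failures) / tested if tested else 0.0
    return rate, failures


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive classifier standing in for the real model."""
    return "negative" if "pointless" in text else "positive"


texts = ["Trying to talk to my dad is pointless", "Quote of the year", "OK"]
rate, failures = invariance_failures(toy_predict, texts, str.upper)
print(rate, failures)
```

Passing `str.lower` or `str.title` as `transform` gives the other case checks with the same helper.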

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 28.19% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.282 | Transform to title case | 243/862 tested samples (28.19%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | Trying to have a conversation with my dad about vegetarianism is the most pointless infuriating thing ever #caveman | Trying To Have A Conversation With My Dad About Vegetarianism Is The Most Pointless Infuriating Thing Ever #Caveman | negative (p = 0.75) | positive (p = 0.49) |
| 3 | @user @user @user Looks like Flynn isn't too pleased with me, he blocked me. You blocked by Flynn too @user | @User @User @User Looks Like Flynn Isn'T Too Pleased With Me, He Blocked Me. You Blocked By Flynn Too @User | negative (p = 0.53) | positive (p = 0.55) |
| 5 | i'm not even catholic, but pope francis is my dude. like i just need him to hug me and tell me everything is okay. | I'M Not Even Catholic, But Pope Francis Is My Dude. Like I Just Need Him To Hug Me And Tell Me Everything Is Okay. | neutral (p = 0.43) | positive (p = 0.54) |
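
The perturbed forms "Isn'T" and "I'M" in the examples above are consistent with Python's built-in `str.title()`, which treats an apostrophe as a word boundary. Whether the scan actually calls `str.title()` is an assumption. The short comparison below shows the built-in next to `string.capwords`, which splits on whitespace only.

```python
import string

sample = "Looks like Flynn isn't too pleased with me"

# str.title() capitalizes the first letter after any non-letter, including "'".
builtin = sample.title()

# string.capwords() splits on whitespace only, so the contraction stays intact.
gentler = string.capwords(sample)

print(builtin)
print(gentler)
```

When reproducing a title-case failure, it may be worth checking whether the flip comes from ordinary capitalization or from these unusual contraction forms.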

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 12.73% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.127 | Transform to lowercase | 105/825 tested samples (12.73%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 17 | The Reputation Doctor weighs in on Tony Romo #NFL @user joins @user on #TheMorningRush LISTEN: | the reputation doctor weighs in on tony romo #nfl @user joins @user on #themorningrush listen: | positive (p = 0.52) | negative (p = 0.53) |
| 46 | I'm crying over Richard and Leonard Cohen 😭😭😭 #GilmoreGirlsRevival | i'm crying over richard and leonard cohen 😭😭😭 #gilmoregirlsrevival | positive (p = 0.42) | negative (p = 0.47) |
| 50 | If you wanna have some seasonal fun & #teachecon #Hatchimals are today's Cabbage Patch Kids & Tickle Me Elmo Christ… | if you wanna have some seasonal fun & #teachecon #hatchimals are today's cabbage patch kids & tickle me elmo christ… | positive (p = 0.61) | negative (p = 0.59) |

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 12.22% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.122 | Add typos | 100/818 tested samples (12.22%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2 | @user You are a stand up guy and a Gentleman Vice President Pence | @user You are stand up guy anr a Genteman Vice Pesident Pence | positive (p = 0.53) | negative (p = 0.43) |
| 11 | I will go so far to say s1 of westworld isn't just good, it's brilliant. A story within a story within a story about storytelling | I will go so far to say 1 of westworld isn't just good, it's brillisnt. A story within a stor wthin a story about storytelling | positive (p = 0.66) | negative (p = 0.81) |
| 27 | Ben Carson for Housing & Urban Development?? 😐 I just can't 😒 | Ben Carson for Housig & Urban Development?? 😐 Ij ust can't 😒 | neutral (p = 0.39) | negative (p = 0.41) |
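
The perturbed texts above show three kinds of edit: dropped characters ("Genteman"), substituted characters ("brillisnt"), and swapped neighbours ("Ij ust"). The sketch below generates similar noise with a seeded random generator. It is a hypothetical stand-in, not Giskard's typo routine, and the `rate` and `seed` parameters are assumptions chosen for illustration.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly delete, swap, or replace letters with probability `rate` each."""
    rng = random.Random(seed)  # seeded so a failing case can be reproduced
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "swap", "replace"))
            if op == "delete":
                i += 1  # drop this character
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])  # transpose with the next character
                i += 2
                continue
            # "replace", or a swap requested on the last character
            out.append(rng.choice(ALPHABET))
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)


sample = "You are a stand up guy and a Gentleman Vice President Pence"
print(add_typos(sample, rate=0.1, seed=1))
```

Fixing the seed matters in practice: a flipped prediction can only be investigated if the exact perturbed text can be regenerated.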

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 7.06% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.071 | Punctuation Removal | 53/751 tested samples (7.06%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 11 | I will go so far to say s1 of westworld isn't just good, it's brilliant. A story within a story within a story about storytelling | I will go so far to say s1 of westworld isn t just good it s brilliant A story within a story within a story about storytelling | positive (p = 0.66) | negative (p = 0.46) |
| 40 | @user She will be hearing my voice on her hesitation to back HRC. I am a MA voter. @user @user @user | @user She will be hearing my voice on her hesitation to back HRC I am a MA voter @user @user @user | negative (p = 0.40) | positive (p = 0.41) |
| 42 | @user Coward... well... why doesn't Poroshenko or Avakov or Saakasjvili travel to Crimea? | @user Coward well why doesn t Poroshenko or Avakov or Saakasjvili travel to Crimea | negative (p = 0.38) | positive (p = 0.42) |
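
Judging from the examples above, punctuation marks are replaced by spaces ("isn't" becomes "isn t") while the "@" in user mentions survives. The sketch below reproduces that behaviour. It is inferred from the three examples only, so the `keep` parameter and the whitespace collapsing are assumptions rather than Giskard's documented rules.

```python
import string


def remove_punctuation(text: str, keep: str = "@") -> str:
    """Replace ASCII punctuation (except characters in `keep`) with spaces."""
    drop = set(string.punctuation) - set(keep)
    spaced = "".join(" " if ch in drop else ch for ch in text)
    # Collapse the runs of spaces left behind by sequences such as "...".
    return " ".join(spaced.split())


original = (
    "@user Coward... well... why doesn't Poroshenko or Avakov "
    "or Saakasjvili travel to Crimea?"
)
print(remove_punctuation(original))
```

Splitting contractions this way changes the token sequence the model sees, which may explain some of the flipped predictions on short, low-confidence inputs.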

Check out the Giskard Space and the Giskard Documentation to learn more about how to test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
