Report for cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).
👉Ethical issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Ethical | medium 🟡 | — | Fail rate = 0.095 | Switch Religion | 2/21 tested samples (9.52%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 9.52% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
97 | Correction: Carson did not say Christians deserve more 1st Amendment protections than other religions. But what he did say was clear as mud. | Correction: Carson did not say jews deserve more 1st Amendment protections than other religions. But what he did say was clear as mud. | negative (p = 0.48) | neutral (p = 0.52) |
275 | @user Prayers for all of you today. May God carry each one of you during this sad time ""Footprints in the Sand"", RIP Frank Gifford" | @user Prayers for all of you today. May allah carry each one of you during this sad time ""Footprints in the Sand"", RIP Frank Gifford" | positive (p = 0.36) | negative (p = 0.42) |
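
To make the perturbation concrete, here is a minimal sketch of a term-swap transformation. The `SWAPS` table is hypothetical and contains only the two substitutions visible in the examples above; the scanner's actual mapping is not shown in this report and is likely larger.

```python
import re

# Hypothetical swap table: only the two pairs visible in the examples above.
SWAPS = {"christians": "jews", "god": "allah"}

def switch_terms(text: str, swaps: dict = SWAPS) -> str:
    """Replace each whole-word occurrence of a key (case-insensitive) with its value."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in swaps) + r")\b",
        re.IGNORECASE,
    )
    # The replacement is inserted exactly as written in the table (lowercase),
    # which matches the perturbed texts shown in the examples.
    return pattern.sub(lambda match: swaps[match.group(0).lower()], text)

print(switch_terms("May God carry each one of you during this sad time"))
```

A text that contains none of the listed terms comes back unchanged, which would explain why only a small number of samples (21 here) can be tested with this transformation.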
👉Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.241 | Transform to uppercase | 78/324 tested samples (24.07%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 24.07% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | HOLD ON... SAM SMITH MAY DO THE THEME TO SPECTRE!? DOPE!!!!!! #007 #SPECTRE #JAMESBOND | positive (p = 0.98) | neutral (p = 0.77) |
4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | positive (p = 0.96) | negative (p = 0.72) |
9 | Disappointed the Knicks vs Nets game got canceled tonight\u002c but I\u2019m even more hyped for Knicks vs Heat on Friday! | DISAPPOINTED THE KNICKS VS NETS GAME GOT CANCELED TONIGHT\U002C BUT I\U2019M EVEN MORE HYPED FOR KNICKS VS HEAT ON FRIDAY! | negative (p = 0.47) | positive (p = 0.97) |
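
The fail rate in these tables is the share of tested samples whose predicted label changes after the perturbation (for example, 78/324 ≈ 0.241). A minimal sketch of that calculation follows. The `toy_predict` function is a hypothetical stand-in for the real model, and skipping samples that the transformation leaves unchanged is an assumption, suggested by the differing "tested samples" counts across transformations.

```python
from typing import Callable, Sequence, Tuple

def fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Sequence[str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for an invariance test."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # assumption: samples the transformation cannot alter are not tested
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate

def toy_predict(text: str) -> str:
    """Hypothetical, deliberately case-sensitive classifier used only for illustration."""
    return "positive" if "Dope" in text else "neutral"

texts = ["Dope!!! #007", "see you later", "12345"]
print(fail_rate(toy_predict, str.upper, texts))
```

Swapping `toy_predict` for a wrapper around the real model and `str.upper` for another transformation gives the same kind of figure for the other rows in this section.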
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.185 | Transform to title case | 60/324 tested samples (18.52%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 18.52% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @User @User I Think After Charlie Hebdo The French Did Not React As The Us Did After 9/11. But They May Do This Time Around. | negative (p = 0.50) | neutral (p = 0.73) |
1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "Interview With Devon Alexander """"Speed Kills"""" (Video) On Tuesday Oct 16Th We Had The Privilege Of Catch Up With... | neutral (p = 0.67) | positive (p = 0.91) |
4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | positive (p = 0.96) | negative (p = 0.39) |
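
The title-cased examples above do more than capitalize first letters: "16th" becomes "16Th", "US" becomes "Us", and "NOT" becomes "Not". Python's built-in `str.title()` produces exactly these outputs, although whether the scanner uses it is an assumption. The hypothetical helper below lists the tokens that title casing alters beyond a plain first-letter capitalization.

```python
from typing import List, Tuple

def lossy_tokens(text: str) -> List[Tuple[str, str]]:
    """Return (token, title-cased token) pairs where title casing changes
    more than just the first character of the token."""
    pairs = []
    for token in text.split():
        titled = token.title()
        first_letter_only = token[:1].upper() + token[1:]
        if titled != first_letter_only:
            pairs.append((token, titled))
    return pairs

print(lossy_tokens("Oct 16th the US did NOT react"))
```

Tokens such as acronyms and emphasized all-caps words lose information under this transformation, so some of the prediction changes in this row may reflect a real change in the input rather than pure model fragility.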
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.130 | Add typos | 40/308 tested samples (12.99%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 12.99% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna watch Final Deztination 5 gobnight. I always leave the theater so afraid of everythinv. BNo huge escalators for sure :S | positive (p = 0.96) | negative (p = 0.93) |
9 | Disappointed the Knicks vs Nets game got canceled tonight\u002c but I\u2019m even more hyped for Knicks vs Heat on Friday! | Disapopoonted the Knicks vs ets game gof canceled tonight\u002c but I\u2019m even more ghyped for Knicks vs Heat on Friday! | negative (p = 0.47) | positive (p = 0.60) |
10 | "LONDON (AP) "" Prince George celebrates his second birthday on Wednesday and while he's just a toddler, he's al... | "LONFON (AP) "" Prince George velebrates his second birthday o Wedesday and while he's just a toddler, hwe's al... | neutral (p = 0.56) | positive (p = 0.67) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.094 | Punctuation Removal | 28/299 tested samples (9.36%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.36% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | Interview with Devon Alexander \Speed Kills\ (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with | neutral (p = 0.67) | positive (p = 0.69) |
2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond | positive (p = 0.98) | neutral (p = 0.93) |
4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna watch Final Destination 5 tonight I always leave the theater so afraid of everything No huge escalators for sure S | positive (p = 0.96) | negative (p = 0.81) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.069 | Transform to lowercase | 22/318 tested samples (6.92%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.92% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
36 | David Cameron's statement on camera on Thursday 03 September 2015: he will take in 'more' of the refugees: was he speaking TO TV Cameras? | david cameron's statement on camera on thursday 03 september 2015: he will take in 'more' of the refugees: was he speaking to tv cameras? | negative (p = 0.52) | neutral (p = 0.68) |
66 | "George Lincoln Rockwell was one of the 1st to recognize that Conservatives like @user Buckley, Goldwater & Reagan were #Cucks for Israel." | "george lincoln rockwell was one of the 1st to recognize that conservatives like @user buckley, goldwater & reagan were #cucks for israel." | positive (p = 0.87) | negative (p = 0.37) |
69 | Amazon Prime Day beats Black Friday says retailer Amazon Prime Day may have been an excuse for the retail... | amazon prime day beats black friday says retailer amazon prime day may have been an excuse for the retail... | negative (p = 0.64) | neutral (p = 0.56) |
👉Performance issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "1st" | Precision = 0.650 | — | -11.88% than global |
🔍✨Examples
For records in the dataset where `text` contains "1st", the Precision is 11.88% lower than the global Precision.

Index | text | label | Predicted label |
---|---|---|---|
16 | "The BAGRANGI new Pic,Of SALMAN khan That VERY FAMOUS IN PAK CENEMA'S at the 1st day of EID that pic,made 1.5 milion Rs Lolywood/Bolywood" | neutral | positive (p = 0.76) |
66 | "George Lincoln Rockwell was one of the 1st to recognize that Conservatives like @user Buckley, Goldwater & Reagan were #Cucks for Israel." | negative | positive (p = 0.87) |
79 | Digne and Falque caused Juventus real problems down their left in the 1st half. #ASRoma #Juventus | neutral | negative (p = 0.97) |
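
A performance issue of this kind compares a metric on a data slice with the same metric on the whole dataset. The sketch below shows one way to compute that comparison. It assumes macro-averaged precision and a relative deviation, `(slice - global) / global`; the report does not state which averaging or deviation formula the scanner uses, and the toy data is invented for illustration. Under the relative-deviation assumption, the reported figures would imply a global Precision of roughly 0.74.

```python
from typing import Callable, List, Sequence, Tuple

def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label precision over labels that were predicted at least once."""
    per_label: List[float] = []
    for label in sorted(set(y_pred)):
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        per_label.append(sum(t == label for t in truths) / len(truths))
    return sum(per_label) / len(per_label) if per_label else 0.0

def slice_deviation(
    texts: Sequence[str],
    y_true: Sequence[str],
    y_pred: Sequence[str],
    in_slice: Callable[[str], bool],
) -> Tuple[float, float, float]:
    """Return (slice precision, global precision, relative deviation)."""
    idx = [i for i, text in enumerate(texts) if in_slice(text)]
    slice_p = macro_precision([y_true[i] for i in idx], [y_pred[i] for i in idx])
    global_p = macro_precision(y_true, y_pred)
    deviation = (slice_p - global_p) / global_p if global_p else 0.0
    return slice_p, global_p, deviation

# Invented toy data, not taken from the scan.
texts = ["the 1st half", "1st day of EID", "great game", "so sad"]
y_true = ["neutral", "neutral", "positive", "negative"]
y_pred = ["negative", "neutral", "positive", "negative"]

slice_p, global_p, deviation = slice_deviation(
    texts, y_true, y_pred, lambda text: "1st" in text
)
print(f"slice={slice_p:.3f} global={global_p:.3f} deviation={deviation:+.2%}")
```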
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "time" | Precision = 0.650 | — | -11.88% than global |
🔍✨Examples
For records in the dataset where `text` contains "time", the Precision is 11.88% lower than the global Precision.

Index | text | label | Predicted label |
---|---|---|---|
93 | "Sir John dined from Justin Bieber was closed, burst into the same time--""There is too awful whisper,--""I may accelerate that" | negative | neutral (p = 0.79) |
104 | I might reread the Harry Potter books for like the 7th time | positive | neutral (p = 0.77) |
109 | Serena and Venus Williams Face Off at US Open: For the 27th time, the sisters played against each other 14 yea... | neutral | positive (p = 0.61) |
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!