Report for cardiffnlp/twitter-roberta-base-sentiment-latest
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 7 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).
👉Overconfidence issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Overconfidence | major 🔴 | avg_word_length(text) >= 4.512 | Overconfidence rate = 0.537 | — | +20.73% vs. global |
🔍✨Examples
For records in the dataset where `avg_word_length(text)` >= 4.512, we found a significantly higher number of overconfident wrong predictions (22 samples, corresponding to 53.66% of the wrong predictions in the data slice).

Index | text | avg_word_length(text) | label | Predicted label |
---|---|---|---|---|
123 | @user @user michael ball is incredible 10th anniversary with him and colm is sick | 4.85714 | negative | positive (p = 0.97); neutral (p = 0.02) |
36 | David Cameron's statement on camera on Thursday 03 September 2015: he will take in 'more' of the refugees: was he speaking TO TV Cameras? | 4.75 | negative | neutral (p = 0.95); positive (p = 0.04) |
14 | PM ready for reply on coal blocks: Congress: New Delhi\u002c Aug 22 (IANS) With the Bharatiya Janata Party (BJP)... | 5.10526 | positive | neutral (p = 0.95); positive (p = 0.03) |
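
The slicing feature can be approximated in a few lines. This is a minimal sketch, assuming `avg_word_length` is the mean length of whitespace-separated tokens; the report does not define it, but this definition reproduces the 4.85714 and 4.75 shown for examples 123 and 36. The function and constant names are illustrative, not Giskard's API.

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


SLICE_THRESHOLD = 4.512  # threshold reported for this data slice

example_123 = (
    "@user @user michael ball is incredible 10th anniversary "
    "with him and colm is sick"
)
value = avg_word_length(example_123)
in_slice = value >= SLICE_THRESHOLD
print(round(value, 5), in_slice)
```

Mentions such as `@user` count as ordinary five-character tokens under this definition, so tweets with several mentions are pulled toward the slice.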
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Overconfidence | medium 🟡 | avg_whitespace(text) < 0.181 | Overconfidence rate = 0.525 | — | +18.13% vs. global |
🔍✨Examples
For records in the dataset where `avg_whitespace(text)` < 0.181, we found a significantly higher number of overconfident wrong predictions (21 samples, corresponding to 52.5% of the wrong predictions in the data slice).

Index | text | avg_whitespace(text) | label | Predicted label |
---|---|---|---|---|
123 | @user @user michael ball is incredible 10th anniversary with him and colm is sick | 0.170732 | negative | positive (p = 0.97); neutral (p = 0.02) |
36 | David Cameron's statement on camera on Thursday 03 September 2015: he will take in 'more' of the refugees: was he speaking TO TV Cameras? | 0.179856 | negative | neutral (p = 0.95); positive (p = 0.04) |
14 | PM ready for reply on coal blocks: Congress: New Delhi\u002c Aug 22 (IANS) With the Bharatiya Janata Party (BJP)... | 0.163793 | positive | neutral (p = 0.95); positive (p = 0.03) |
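
The two overconfidence findings can be cross-checked with a little arithmetic. The sketch below assumes that the overconfidence rate is the number of overconfident wrong predictions divided by all wrong predictions in the slice (as the example text describes) and that the "Deviation" column is a relative difference from the global rate (the report does not say so). The slice totals are derived from the reported counts and percentages, not stated in the report.

```python
# Reported figures for the two slices: overconfident wrong predictions,
# their share of wrong predictions in the slice, and the deviation column.
findings = {
    "avg_word_length(text) >= 4.512": (22, 0.5366, 0.2073),
    "avg_whitespace(text) < 0.181": (21, 0.525, 0.1813),
}

implied_global = {}
for name, (overconfident, share, deviation) in findings.items():
    wrong_in_slice = round(overconfident / share)  # derived, not reported
    rate = overconfident / wrong_in_slice
    # Assumption: deviation = rate / global_rate - 1
    implied_global[name] = rate / (1 + deviation)
    print(
        f"{name}: {overconfident}/{wrong_in_slice} = {rate:.3f}, "
        f"implied global rate = {implied_global[name]:.3f}"
    )
```

Under the relative-difference assumption both slices point to the same global overconfidence rate of roughly 0.444, which suggests the two rows are measured against one baseline.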
👉Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.182 | Transform to uppercase | 59/324 tested samples (18.21%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 18.21% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @USER @USER I THINK AFTER CHARLIE HEBDO THE FRENCH DID NOT REACT AS THE US DID AFTER 9/11. BUT THEY MAY DO THIS TIME AROUND. | negative (p = 0.50) | neutral (p = 0.67) |
8 | @user call Hafiz saeed sir he may help u out. Maybe Pope can b handy . Try it. | @USER CALL HAFIZ SAEED SIR HE MAY HELP U OUT. MAYBE POPE CAN B HANDY . TRY IT. | neutral (p = 0.67) | positive (p = 0.61) |
10 | "LONDON (AP) "" Prince George celebrates his second birthday on Wednesday and while he's just a toddler, he's al... | "LONDON (AP) "" PRINCE GEORGE CELEBRATES HIS SECOND BIRTHDAY ON WEDNESDAY AND WHILE HE'S JUST A TODDLER, HE'S AL... | positive (p = 0.65) | neutral (p = 0.56) |
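
A robustness check of this kind can be sketched as follows: perturb each input, predict on both versions, and count how often the label changes. `toy_predict` is a hypothetical, deliberately case-sensitive stand-in for the real classifier, and skipping inputs that the transformation leaves unchanged is an assumption (the varying "tested samples" counts hint at it). This is not Giskard's implementation.

```python
from typing import Callable, Sequence, Tuple


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive keyword classifier (stand-in for the model)."""
    if "great" in text:
        return "positive"
    if "bad" in text:
        return "negative"
    return "neutral"


def fail_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> Tuple[int, int]:
    """Return (changed, tested) for one transformation."""
    changed = tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # assumption: unchanged inputs are not counted as tested
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    return changed, tested


texts = ["great game tonight", "bad call by the ref", "see you on Friday", "12345"]
changed, tested = fail_rate(texts, toy_predict, str.upper)
print(f"{changed}/{tested} tested samples ({changed / tested:.2%}) changed prediction")
```

Swapping `str.upper` for `str.lower` or `str.title` gives the corresponding check for the other case transformations.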
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.110 | Add typos | 34/308 tested samples (11.04%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 11.04% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
22 | Hey David Bowie Do u want to get iPh0ne 6 for FREE? U better check my bi0. Thx | Hey David Bowie Do u want to get iPh0ne 6 for FREE? U better chrck my bi0. Thx | neutral (p = 0.56) | positive (p = 0.51) |
27 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journlism. But hyou know? This may be Harper's Wwterloo | negative (p = 0.59) | neutral (p = 0.54) |
48 | I'm gonna watch Sharknado 3 cause I have no tv shows to watch on a Wednesday not cause I enjoy it. | I'm gonna watch Sharknado 3 cause I have no tv shows to watch on a Wednesday nkot cause I enjoy it. | neutral (p = 0.41) | positive (p = 0.90) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.100 | Punctuation Removal | 30/299 tested samples (10.03%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 10.03% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9 11 But they may do this time around | negative (p = 0.50) | neutral (p = 0.50) |
2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond | positive (p = 0.83) | neutral (p = 0.90) |
6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @user @user Islam is an Abrahamic faith Andrew It may make you feel a little uneasy but it s the same God you worship Sorry | neutral (p = 0.59) | negative (p = 0.50) |
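
The perturbed texts above are consistent with a transformation that replaces punctuation with spaces, keeps `@` and `#`, and collapses the resulting whitespace. The sketch below is an approximation built from those three examples, not Giskard's actual code; `remove_punctuation` and `KEEP` are illustrative names.

```python
import string

KEEP = "@#"  # mentions and hashtags survive in the examples above
# Map every other ASCII punctuation character to a space.
_TABLE = str.maketrans({c: " " for c in string.punctuation if c not in KEEP})


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_TABLE).split())


original = "Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond"
print(remove_punctuation(original))
```

Because apostrophes become spaces, contractions split into two tokens ("it's" becomes "it s"), which may contribute to some of the prediction changes.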
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.096 | Transform to title case | 31/324 tested samples (9.57%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 9.57% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @User @User I Think After Charlie Hebdo The French Did Not React As The Us Did After 9/11. But They May Do This Time Around. | negative (p = 0.50) | neutral (p = 0.56) |
9 | Disappointed the Knicks vs Nets game got canceled tonight\u002c but I\u2019m even more hyped for Knicks vs Heat on Friday! | Disappointed The Knicks Vs Nets Game Got Canceled Tonight\U002C But I\U2019M Even More Hyped For Knicks Vs Heat On Friday! | positive (p = 0.56) | neutral (p = 0.39) |
51 | @user tom Brady did not deflate balls, but was suspended for 4 games bc he may or may not have known it was being done" | @User Tom Brady Did Not Deflate Balls, But Was Suspended For 4 Games Bc He May Or May Not Have Known It Was Being Done" | negative (p = 0.51) | neutral (p = 0.69) |
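
The title-cased texts above match what Python's built-in `str.title()` produces, including the odd-looking `\U002C` and `I\U2019M` in example 9: the dataset text contains literal escape sequences such as `\u002c`, and `str.title()` capitalizes any letter that follows a non-letter. Whether the scan actually calls `str.title()` is an assumption; the snippet only shows that the output is consistent with it.

```python
# Raw string: the backslash sequences are literal characters, as in the dataset.
raw = (
    r"Disappointed the Knicks vs Nets game got canceled tonight\u002c "
    r"but I\u2019m even more hyped for Knicks vs Heat on Friday!"
)

titled = raw.title()
print(titled)
```

The same behavior turns "US" into "Us" in example 0, so title casing changes more than the first letter of each word.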
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.063 | Transform to lowercase | 20/318 tested samples (6.29%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.29% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @user @user i think after charlie hebdo the french did not react as the us did after 9/11. but they may do this time around. | negative (p = 0.50) | neutral (p = 0.71) |
3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | kingpin saudi arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on monday | neutral (p = 0.50) | negative (p = 0.54) |
12 | It is reality that ISIS are on the march in Turkey and Erdogan can't wait to receive them with open arms | it is reality that isis are on the march in turkey and erdogan can't wait to receive them with open arms | negative (p = 0.61) | positive (p = 0.77) |
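
For a side-by-side view, the five robustness fail rates can be recomputed from the reported counts and ranked. The counts below are copied from the tables above; nothing else is assumed.

```python
# (changed predictions, tested samples) as reported for each transformation
results = {
    "Transform to uppercase": (59, 324),
    "Add typos": (34, 308),
    "Punctuation Removal": (30, 299),
    "Transform to title case": (31, 324),
    "Transform to lowercase": (20, 318),
}

ranked = sorted(results.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
summary = [f"{name}: {c}/{n} = {c / n:.2%}" for name, (c, n) in ranked]
print("\n".join(summary))
```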
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!