Report for austinmw/distilbert-base-uncased-finetuned-tweets-sentiment
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 3 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split train).
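For reference, the sketch below shows one way a scan like this can be reproduced locally with the Giskard Python library. It assumes the `giskard.Model` / `giskard.Dataset` / `giskard.scan` API and the Hugging Face `text-classification` pipeline for this model; exact argument names can vary between Giskard versions, so treat it as a starting point rather than the exact setup the bot used.

```python
import pandas as pd
from datasets import load_dataset
from transformers import pipeline
import giskard

# Same data the scan used: tweet_eval / sentiment / train
raw = load_dataset("tweet_eval", "sentiment", split="train")
df = pd.DataFrame({"text": raw["text"],
                   "label": [f"LABEL_{i}" for i in raw["label"]]})

clf = pipeline(
    "text-classification",
    model="austinmw/distilbert-base-uncased-finetuned-tweets-sentiment",
    top_k=None,  # return scores for every class, not just the argmax
)

LABELS = ["LABEL_0", "LABEL_1", "LABEL_2"]

def predict_proba(frame: pd.DataFrame):
    # Giskard expects one row of class probabilities per input, in label order
    outputs = clf(frame["text"].tolist(), truncation=True)
    return [[next(s["score"] for s in out if s["label"] == lbl) for lbl in LABELS]
            for out in outputs]

model = giskard.Model(
    model=predict_proba,
    model_type="classification",
    classification_labels=LABELS,
    feature_names=["text"],
)
dataset = giskard.Dataset(df, target="label", column_types={"text": "text"})

report = giskard.scan(model, dataset)  # runs the detectors behind this report
report.to_html("scan_report.html")
```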
👉Underconfidence issues (1)
For records in your dataset where “text” contains "like", we found a significantly higher number of underconfident predictions (61 samples, corresponding to 2.57% of the predictions in the data slice).
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "like" | Underconfidence rate = 0.026 | +42.36% than global |
Taxonomy: avid-effect:performance:P0204

🔍✨ Examples
| | text | label | Predicted label |
|---|---|---|---|
| 530 | "David Cameron is the new Tony Blair, not sure that's a chair I'd like to be sat in." | LABEL_1 | LABEL_0 (p = 0.42)<br>LABEL_1 (p = 0.42) |
| 1455 | "If only Green Day sounded like this, instead of the sort of parody punk group who might have been invented by Saturday Night Live." | LABEL_0 | LABEL_1 (p = 0.46)<br>LABEL_0 (p = 0.46) |
| 30124 | You two may fight like cats and dogs, but you're completely head over heels for each other. You two are like Allie and Noah in The Notebook. | LABEL_2 | LABEL_1 (p = 0.45)<br>LABEL_0 (p = 0.45) |
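"Underconfidence" here means the model's top two class probabilities are nearly tied, as in the examples above (e.g. p = 0.42 vs p = 0.42). The snippet below is an illustrative approximation of how such a rate can be computed for the "like" slice; it reuses `predict_proba` and `df` from the first sketch, and the exact rule and threshold used by the Giskard detector may differ.

```python
import numpy as np

def underconfidence_rate(frame, threshold=0.9):
    # threshold is an arbitrary choice for illustration, not Giskard's default
    probs = np.array(predict_proba(frame))
    top2 = np.sort(probs, axis=1)[:, -2:]   # second-best and best probability per row
    ratio = top2[:, 0] / top2[:, 1]         # close to 1.0 => two classes nearly tied
    return float(np.mean(ratio >= threshold))

slice_df = df[df["text"].str.contains("like", case=False)]
print("slice :", underconfidence_rate(slice_df))
print("global:", underconfidence_rate(df))
```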
👉Ethical issues (1)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 6.5% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.065 | Switch Religion | 65/1000 tested samples (6.5%) changed prediction after perturbation |
Taxonomy: avid-effect:ethics:E0101, avid-effect:performance:P0201

🔍✨ Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 41013 | @user Eric Church opening the brand new venue downtown tonight and tomorrow. Tomorrow Dolly at Ryman. Shania at Bridgestone. Insanity | @user Eric mosque opening the brand new venue downtown tonight and tomorrow. Tomorrow Dolly at Ryman. Shania at Bridgestone. Insanity | LABEL_2 (p = 0.52) | LABEL_1 (p = 0.57) |
| 42876 | The Gladiatorial contests in Rome were not ceased because the Christians in Rome all sat on their hand. One monk stood up and yelled 'cease' | The Gladiatorial contests in Rome were not ceased because the hindus in Rome all sat on their hand. One monk stood up and yelled 'cease' | LABEL_1 (p = 0.48) | LABEL_0 (p = 0.47) |
| 44624 | "Cue the ""Christians"" in the comments, those oblivious to GWB carnage just in ""homeland."" | "Cue the ""buddhists"" in the comments, those oblivious to GWB carnage just in ""homeland."" | LABEL_1 (p = 0.50) | LABEL_0 (p = 0.52) |
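This finding comes from a metamorphic test: each text is perturbed, the model is re-run, and a sample fails when the predicted label changes. The snippet below sketches that check with a simplified stand-in for the "Switch Religion" transformation (Giskard's actual transformation uses a much larger term dictionary); `predict_proba` and `df` come from the first sketch.

```python
import re
import numpy as np

# Simplified stand-in for "Switch Religion": only a few example terms
RELIGION_SWAP = {"church": "mosque", "christians": "hindus", "christian": "hindu"}
PATTERN = re.compile(r"\b(" + "|".join(RELIGION_SWAP) + r")\b", re.IGNORECASE)

def switch_religion(text: str) -> str:
    return PATTERN.sub(lambda m: RELIGION_SWAP[m.group(0).lower()], text)

def fail_rate(frame):
    # A sample "fails" when the predicted label flips after perturbation
    original = np.argmax(predict_proba(frame), axis=1)
    perturbed = np.argmax(
        predict_proba(frame.assign(text=frame["text"].map(switch_religion))), axis=1
    )
    return float(np.mean(original != perturbed))

# Only texts that the transformation can actually change are worth testing
affected = df[df["text"].str.contains(PATTERN)]
print("fail rate:", fail_rate(affected.head(1000)))
```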
👉Performance issues (1)
For records in the dataset where “text” contains "like", the Precision is 6.66% lower than the global Precision.
| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | text contains "like" | Precision = 0.739 | -6.66% than global |
Taxonomy: avid-effect:performance:P0204

🔍✨ Examples
| | text | label | Predicted label |
|---|---|---|---|
| 33 | Right guys, last competition of the night... Like this status for a chance to win a copy of Judas Priest's 30th... | LABEL_2 | LABEL_1 (p = 0.51) |
| 39 | still not over how Nicki snapped like a 12th grader on their last day of high school | LABEL_1 | LABEL_0 (p = 0.48) |
| 202 | "Where's the sun mommy?" "It's asleep like you should be" "But Harper's a moon mommy" Dammit kid. | LABEL_1 | LABEL_0 (p = 0.83) |
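This check compares precision on the data slice (texts containing "like") against precision on the full dataset. A rough reproduction with scikit-learn is sketched below; it reuses `predict_proba` and `df` from the first sketch and assumes macro-averaged precision, which may not be exactly the aggregation Giskard applies.

```python
import numpy as np
from sklearn.metrics import precision_score

LABELS = ["LABEL_0", "LABEL_1", "LABEL_2"]

def macro_precision(frame):
    y_true = frame["label"].tolist()
    y_pred = [LABELS[i] for i in np.argmax(predict_proba(frame), axis=1)]
    return precision_score(y_true, y_pred, labels=LABELS,
                           average="macro", zero_division=0)

slice_df = df[df["text"].str.contains("like", case=False)]
p_slice, p_global = macro_precision(slice_df), macro_precision(df)
print(f"slice precision : {p_slice:.3f}")
print(f"global precision: {p_global:.3f}")
print(f"deviation       : {(p_slice - p_global) / p_global:+.2%}")
```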
Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.