Report for JiaqiLee/imdb-finetuned-bert-base-uncased

Discussion #43, opened by inoki-giskard

Hey Team!🤗✨
We’re thrilled to share the results of an automated evaluation of your model! 🎉📊

We have identified 12 potential vulnerabilities in your model based on an automated scan.

The scan evaluated the model on the `sst2` dataset (subset: `default`, split: `validation`).

👉Robustness issues (2)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.115 | Add typos | 92/803 tested samples (11.46%) changed prediction after perturbation |

🔍✨ Examples
When the feature “text” is perturbed with the “Add typos” transformation, the model changes its prediction in 11.46% of the tested cases. We expected predictions to be unaffected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 12 | ... the film suffers from a lack of humor ( something needed to balance out the violence ) ... | .... the cfilm sufcers froj a ladk of humor ( domething nweded to balance out the violence ) ... | negative (p = 1.00) | positive (p = 0.99) |
| 13 | we root for ( clara and paul ) , even like them , though perhaps it 's an emotion closer to pity . | w root for ( clara and paul ) , even like them , though perhaps it 's an emlotion closer to pity . | positive (p = 0.99) | negative (p = 0.66) |
| 46 | a synthesis of cliches and absurdities that seems positively decadent in its cinematic flash and emptiness . | a syhthesis og clichea ajd absurdities thag seems positivey decadet in its cinematic lash ande mptiness . | positive (p = 0.95) | negative (p = 0.99) |

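The fail rate is the share of tested samples whose predicted label changes after the perturbation (92/803 ≈ 0.115). Below is a minimal sketch of such an invariance check, not Giskard's implementation: `fail_rate` and the toy keyword classifier are hypothetical, and we assume a sample only counts as "tested" when the perturbation actually changes its text.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    perturb: Callable[[str], str],
) -> tuple[int, int, float]:
    """Return (changed, tested, rate) for a prediction-invariance test.

    Assumption: samples the perturbation leaves unchanged are not counted as tested.
    """
    changed = tested = 0
    for text in texts:
        perturbed = perturb(text)
        if perturbed == text:
            continue  # no-op perturbation: skip this sample
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Toy classifier and toy "typo" (first 'o' -> '0'), for illustration only.
def toy_predict(text: str) -> str:
    return "positive" if "good" in text.split() else "negative"


print(fail_rate(toy_predict, ["a good film", "so good", "bad film", "boring"],
                lambda t: t.replace("o", "0", 1)))
```

Here "bad film" contains no "o", so it is skipped; of the three tested samples, only "a good film" flips its prediction.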

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.059 | Punctuation Removal | 51/866 tested samples (5.89%) changed prediction after perturbation |

🔍✨ Examples
When the feature “text” is perturbed with the “Punctuation Removal” transformation, the model changes its prediction in 5.89% of the tested cases. We expected predictions to be unaffected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | it 's slow -- very , very slow . | it s slow very very slow | positive (p = 0.52) | negative (p = 0.77) |
| 33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the movie succeeds in instilling a wary sense of there but for the grace of god it is far too self conscious to draw you deeply into its world | negative (p = 1.00) | positive (p = 0.99) |
| 66 | if you 're hard up for raunchy college humor , this is your ticket right here . | if you re hard up for raunchy college humor this is your ticket right here | positive (p = 0.89) | negative (p = 0.57) |

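The three punctuation-removal examples are consistent with replacing every punctuation character with a space and then collapsing whitespace (note that "self-conscious" becomes "self conscious" and "it 's" becomes "it s"). Here is a minimal sketch of that behaviour. It approximates the transformation and is not Giskard's actual code; `remove_punctuation` is a hypothetical name, and only ASCII punctuation from `string.punctuation` is handled.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def remove_punctuation(text: str) -> str:
    """Replace ASCII punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


print(remove_punctuation("it 's slow -- very , very slow ."))
# prints: it s slow very very slow
```

Replacing punctuation with a space rather than deleting it prevents hyphenated words from fusing ("selfconscious"). Collapsing whitespace afterwards removes the gaps left by standalone tokens such as "--" and ",".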
👉Performance issues (10)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | `text_length(text)` < 89.500 AND `text_length(text)` >= 80.500 | Precision = 0.719 | | -15.79% relative to global |

🔍✨ Examples
For records where `text_length(text)` < 89.500 AND `text_length(text)` >= 80.500, precision is 15.79% lower than the global precision.

| | text | text_length(text) | label | Predicted label |
|---|---|---|---|---|
| 115 | sam mendes has become valedictorian at the school for soft landings and easy ways out . | 88 | negative | positive (p = 0.95) |
| 142 | what better message than ` love thyself ' could young women of any size receive ? | 82 | positive | negative (p = 1.00) |
| 286 | at its best , queen is campy fun like the vincent price horror classics of the '60s . | 86 | positive | negative (p = 1.00) |

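The Deviation column reads most naturally as a relative change, (slice - global) / global. The report does not list the global metrics. Under that assumption, they can be recovered from any row. The two precision rows (0.719 at -15.79%, 0.800 at -6.27%) both point to roughly 0.854, and the two recall rows both point to roughly 0.905, which supports this reading. The helper names below are hypothetical. The slice metrics are rounded to three decimals, so the recovered values are approximate.

```python
def relative_deviation(slice_value: float, global_value: float) -> float:
    """Assumed definition of the Deviation column: (slice - global) / global."""
    return (slice_value - global_value) / global_value


def implied_global(slice_value: float, deviation: float) -> float:
    """Invert relative_deviation to recover the global metric from one report row."""
    return slice_value / (1.0 + deviation)


# (slice metric, deviation as a fraction), copied from the report's rows.
precision_rows = [(0.719, -0.1579), (0.800, -0.0627)]
recall_rows = [(0.844, -0.0681), (0.850, -0.0612)]

for name, rows in (("precision", precision_rows), ("recall", recall_rows)):
    estimates = [implied_global(value, dev) for value, dev in rows]
    print(name, [round(e, 4) for e in estimates])
```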

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `idx` >= 500.500 AND `idx` < 548.500 | Accuracy = 0.812 | | -6.90% relative to global |

🔍✨ Examples
For records where `idx` >= 500.500 AND `idx` < 548.500, accuracy is 6.90% lower than the global accuracy.

| | idx | label | Predicted label |
|---|---|---|---|
| 501 | 501 | positive | negative (p = 1.00) |
| 509 | 509 | positive | negative (p = 0.99) |
| 513 | 513 | negative | positive (p = 1.00) |

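Each data slice is a half-open interval over a single feature, and the metric is recomputed on the records inside it. With half-integer bounds, `idx` >= 500.500 AND `idx` < 548.500 selects exactly the integer ids 501 to 548. The sketch below shows this on toy records that are not taken from the report. `Record` and `slice_metrics` are hypothetical, and treating `positive` as the positive class for precision and recall is our assumption, because the report does not say which class it uses.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Record:
    idx: int
    label: str      # ground-truth label
    predicted: str  # model prediction


def in_slice(value: float, lower: float, upper: float) -> bool:
    """Half-open interval, mirroring 'x >= lower AND x < upper' in the report."""
    return lower <= value < upper


def slice_metrics(records: Sequence[Record], lower: float, upper: float,
                  positive: str = "positive") -> dict:
    """Accuracy, precision and recall on records whose idx falls in the slice.

    Assumption: `positive` is the positive class for precision and recall.
    """
    rows = [r for r in records if in_slice(r.idx, lower, upper)]
    tp = sum(r.label == positive and r.predicted == positive for r in rows)
    fp = sum(r.label != positive and r.predicted == positive for r in rows)
    fn = sum(r.label == positive and r.predicted != positive for r in rows)
    correct = sum(r.label == r.predicted for r in rows)
    return {
        "n": len(rows),
        "accuracy": correct / len(rows) if rows else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy records, for illustration only (idx 500 falls outside the slice).
toy = [
    Record(500, "positive", "positive"),
    Record(501, "positive", "negative"),
    Record(502, "negative", "negative"),
    Record(503, "negative", "positive"),
    Record(504, "positive", "positive"),
    Record(505, "positive", "positive"),
]
print(slice_metrics(toy, 500.5, 548.5))
```

The same `in_slice` check applies unchanged to continuous features such as `text_length(text)` or `avg_whitespace(text)`.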

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `idx` >= 444.500 AND `idx` < 500.500 | Recall = 0.844 | | -6.81% relative to global |

🔍✨ Examples
For records where `idx` >= 444.500 AND `idx` < 500.500, recall is 6.81% lower than the global recall.

| | idx | label | Predicted label |
|---|---|---|---|
| 445 | 445 | positive | negative (p = 0.67) |
| 446 | 446 | negative | positive (p = 1.00) |
| 447 | 447 | positive | negative (p = 0.91) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `avg_whitespace(text)` < 0.154 | Recall = 0.844 | | -6.81% relative to global |

🔍✨ Examples
For records where `avg_whitespace(text)` < 0.154, recall is 6.81% lower than the global recall.

| | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 1 | unflinchingly bleak and desperate | 0.117647 | negative | positive (p = 1.00) |
| 68 | good old-fashioned slash-and-hack is back ! | 0.136364 | positive | negative (p = 0.60) |
| 112 | hilariously inept and ridiculous . | 0.142857 | positive | negative (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `avg_word_length(text)` >= 5.511 | Recall = 0.844 | | -6.81% relative to global |

🔍✨ Examples
For records where `avg_word_length(text)` >= 5.511, recall is 6.81% lower than the global recall.

| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 1 | unflinchingly bleak and desperate | 7.5 | negative | positive (p = 1.00) |
| 68 | good old-fashioned slash-and-hack is back ! | 6.33333 | positive | negative (p = 0.60) |
| 112 | hilariously inept and ridiculous . | 6 | positive | negative (p = 1.00) |

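The slices use three simple text features. The functions below are a hypothetical re-implementation, not Giskard's code. They reproduce the values listed in the tables (for example 0.117647 and 7.5 for row 1, or 82 for row 142) only if each raw SST-2 sentence keeps a trailing space. We assume that here, and the sample string is written accordingly.

```python
def text_length(text: str) -> int:
    """Number of characters, whitespace included."""
    return len(text)


def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(ch.isspace() for ch in text) / len(text)


def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens; punctuation tokens count as words."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


# Row 1 from the tables, assuming the raw sentence ends with a trailing space.
sample = "unflinchingly bleak and desperate "
print(text_length(sample), round(avg_whitespace(sample), 6), avg_word_length(sample))
# prints: 34 0.117647 7.5
```

Because SST-2 is pre-tokenized, standalone punctuation such as "." or "!" counts as a one-character word. This pulls `avg_word_length` down for sentences with a lot of punctuation.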

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `text_length(text)` < 59.500 AND `text_length(text)` >= 50.500 | Precision = 0.800 | | -6.27% relative to global |

🔍✨ Examples
For records where `text_length(text)` < 59.500 AND `text_length(text)` >= 50.500, precision is 6.27% lower than the global precision.

| | text | text_length(text) | label | Predicted label |
|---|---|---|---|---|
| 139 | it 's not the ultimate depression-era gangster movie . | 55 | negative | positive (p = 0.98) |
| 183 | the lower your expectations , the more you 'll enjoy it . | 58 | negative | positive (p = 0.99) |
| 205 | falls neatly into the category of good stupid fun . | 52 | positive | negative (p = 0.92) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `avg_whitespace(text)` < 0.187 AND `avg_whitespace(text)` >= 0.183 | Precision = 0.800 | | -6.27% relative to global |

🔍✨ Examples
For records where `avg_whitespace(text)` < 0.187 AND `avg_whitespace(text)` >= 0.183, precision is 6.27% lower than the global precision.

| | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 86 | the film flat lines when it should peak and is more missed opportunity and trifle than dark , decadent truffle . | 0.185841 | negative | positive (p = 0.93) |
| 147 | the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second . | 0.184397 | negative | positive (p = 0.97) |
| 448 | something akin to a japanese alice through the looking glass , except that it seems to take itself far more seriously . | 0.183333 | positive | negative (p = 0.84) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `avg_word_length(text)` >= 4.354 AND `avg_word_length(text)` < 4.464 | Precision = 0.800 | | -6.27% relative to global |

🔍✨ Examples
For records where `avg_word_length(text)` >= 4.354 AND `avg_word_length(text)` < 4.464, precision is 6.27% lower than the global precision.

| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 86 | the film flat lines when it should peak and is more missed opportunity and trifle than dark , decadent truffle . | 4.38095 | negative | positive (p = 0.93) |
| 147 | the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second . | 4.42308 | negative | positive (p = 0.97) |
| 448 | something akin to a japanese alice through the looking glass , except that it seems to take itself far more seriously . | 4.45455 | positive | negative (p = 0.84) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `avg_whitespace(text)` < 0.195 AND `avg_whitespace(text)` >= 0.192 | Recall = 0.850 | | -6.12% relative to global |

🔍✨ Examples
For records where `avg_whitespace(text)` < 0.195 AND `avg_whitespace(text)` >= 0.192, recall is 6.12% lower than the global recall.

| | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 113 | this movie is maddening . | 0.192308 | negative | positive (p = 1.00) |
| 121 | it seems to me the film is about the art of ripping people off without ever letting them consciously know you have done so | 0.195122 | negative | positive (p = 0.98) |
| 142 | what better message than ` love thyself ' could young women of any size receive ? | 0.195122 | positive | negative (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `avg_word_length(text)` >= 4.123 AND `avg_word_length(text)` < 4.209 | Recall = 0.850 | | -6.12% relative to global |

🔍✨ Examples
For records where `avg_word_length(text)` >= 4.123 AND `avg_word_length(text)` < 4.209, recall is 6.12% lower than the global recall.

| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 113 | this movie is maddening . | 4.2 | negative | positive (p = 1.00) |
| 121 | it seems to me the film is about the art of ripping people off without ever letting them consciously know you have done so | 4.125 | negative | positive (p = 0.98) |
| 142 | what better message than ` love thyself ' could young women of any size receive ? | 4.125 | positive | negative (p = 1.00) |


Disclaimer: automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review these findings and assess their impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
