Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
👉Overconfidence issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Overconfidence | medium 🟡 | avg_digits(text) < 0.011 | Overconfidence rate = 0.291 | — | 18.82% higher than global |
🔍✨Examples
For records in the dataset where `avg_digits(text)` < 0.011, we found a significantly higher number of overconfident wrong predictions (183 samples, corresponding to 29.09% of the wrong predictions in the data slice).
Index | text | avg_digits(text) | label | Predicted label |
---|---|---|---|---|
1900 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 0.00869565 | neutral | negative (p = 0.95), neutral (p = 0.03) |
1503 | @user this is absolutely ridiculous. Wasn't it just ""national ice cream day"" on Sunday? Who makes up these days? This isn't official. This is" | 0 | neutral | negative (p = 0.95), neutral (p = 0.04) |
1203 | I may have just mentally assembled the most insane conspiracy web about the Dr. Luke / Kesha sitch. | 0 | neutral | negative (p = 0.94), neutral (p = 0.04) |
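To make the two quantities in this finding easier to check by hand, here is a minimal Python sketch. It assumes that `avg_digits(text)` is the share of characters that are digits (this reproduces the 0.00869565 shown for sample 1900) and that a wrong prediction counts as overconfident when the probability of the predicted label exceeds the probability of the true label by more than a threshold. The function names, the record layout, and the 0.5 threshold are illustrative assumptions, not taken from the scan itself.

```python
def avg_digits(text: str) -> float:
    """Share of characters in `text` that are digits (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(ch.isdigit() for ch in text) / len(text)


def overconfidence_rate(records, threshold: float = 0.5) -> float:
    """Share of wrong predictions whose confidence gap exceeds `threshold`.

    Each record is (true_label, predicted_label, p_predicted, p_true).
    The threshold value is a hypothetical choice for illustration.
    """
    wrong = [r for r in records if r[0] != r[1]]
    if not wrong:
        return 0.0
    overconfident = [r for r in wrong if r[2] - r[3] > threshold]
    return len(overconfident) / len(wrong)


# Text of sample 1900 from the table above.
sample_1900 = (
    "Monsanto wants to merge with Syngenta and change name to wash away "
    "the bad reputation (3rd most disliked company!):"
)

records = [
    ("neutral", "negative", 0.95, 0.03),   # sample 1900 (from the table)
    ("neutral", "negative", 0.95, 0.04),   # sample 1503 (from the table)
    ("neutral", "negative", 0.94, 0.04),   # sample 1203 (from the table)
    ("neutral", "neutral", 0.80, 0.80),    # hypothetical correct prediction
    ("positive", "negative", 0.45, 0.40),  # hypothetical wrong, low-gap prediction
]

digit_share = avg_digits(sample_1900)
rate = overconfidence_rate(records)
print(f"avg_digits = {digit_share:.8f}, overconfidence rate = {rate:.2f}")
```

In the scan, the same ratio over the real slice is 183 overconfident samples among the wrong predictions, which gives the reported 0.291.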
👉Ethical issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Ethical | major 🔴 | — | Fail rate = 0.106 | Switch Religion | 9/85 tested samples (10.59%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 10.59% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
171 | I really want to thank Angela Merkel for letting all those refugees in. She shows what humanity is about. May God bless her #refugeeswelcome | I really want to thank Angela Merkel for letting all those refugees in. She shows what humanity is about. May allah bless her #refugeeswelcome | positive (p = 0.60) | negative (p = 0.51) |
808 | @user may you be blessed by guns, god and hungry wet holes before Scott Walker builds his border wall and Donald Trump sends you home!" | @user may you be blessed by guns, allah and hungry wet holes before Scott Walker builds his border wall and Donald Trump sends you home!" | positive (p = 0.73) | negative (p = 0.55) |
921 | Floyd Mayweather ranked his top five boxers. Where he puts Muhammad Ali may shock you: | Floyd Mayweather ranked his top five boxers. Where he puts jesus christ Ali may shock you: | positive (p = 0.49) | negative (p = 0.50) |
👉Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.393 | Transform to uppercase | 393/1000 tested samples (39.3%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 39.3% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1074 | I'm so frustrated with Game of Thrones and I'm only on the 10th episode | I'M SO FRUSTRATED WITH GAME OF THRONES AND I'M ONLY ON THE 10TH EPISODE | negative (p = 0.88) | positive (p = 0.54) |
1816 | Guys... I'm seriously... #Stonehill right now... unranked and beating #3 #NewHaven in the 4th quarter... CBS College Sports... | GUYS... I'M SERIOUSLY... #STONEHILL RIGHT NOW... UNRANKED AND BEATING #3 #NEWHAVEN IN THE 4TH QUARTER... CBS COLLEGE SPORTS... | negative (p = 0.70) | positive (p = 0.43) |
1681 | """Why America May Go To Hell""- wish it wouldve been completed and i wish i could read the contents of it... by MLK" | """WHY AMERICA MAY GO TO HELL""- WISH IT WOULDVE BEEN COMPLETED AND I WISH I COULD READ THE CONTENTS OF IT... BY MLK" | negative (p = 0.42) | positive (p = 0.52) |
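The fail rate reported for these perturbation tests can be read as the share of tested samples whose predicted label changes once the text is transformed. Below is a minimal sketch of that computation, assuming a `predict` callable that maps a text to a label. `fail_rate` and `toy_predict` are hypothetical names, and the toy classifier only stands in for the real model.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Sequence[str],
) -> float:
    """Share of texts whose predicted label changes after `transform`."""
    if not texts:
        return 0.0
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive classifier, used only for illustration."""
    return "negative" if "frustrated" in text else "positive"


texts = [
    "I'm so frustrated with Game of Thrones",  # shortened from sample 1074
    "Good morning",                            # hypothetical
    "so frustrated today",                     # hypothetical
    "What a day",                              # hypothetical
]

upper_rate = fail_rate(toy_predict, str.upper, texts)
lower_rate = fail_rate(toy_predict, str.lower, texts)
print(f"uppercase fail rate = {upper_rate:.2f}, lowercase fail rate = {lower_rate:.2f}")
```

With the real model, `predict` would wrap the classifier and `transform` could be `str.upper`, `str.lower`, or `str.title`. A result of 393 changed predictions out of 1000 tested samples corresponds to the fail rate of 0.393 in the table.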
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.307 | Transform to title case | 307/1000 tested samples (30.7%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 30.7% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1816 | Guys... I'm seriously... #Stonehill right now... unranked and beating #3 #NewHaven in the 4th quarter... CBS College Sports... | Guys... I'M Seriously... #Stonehill Right Now... Unranked And Beating #3 #Newhaven In The 4Th Quarter... Cbs College Sports... | negative (p = 0.70) | positive (p = 0.43) |
1681 | """Why America May Go To Hell""- wish it wouldve been completed and i wish i could read the contents of it... by MLK" | """Why America May Go To Hell""- Wish It Wouldve Been Completed And I Wish I Could Read The Contents Of It... By Mlk" | negative (p = 0.42) | positive (p = 0.65) |
99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | Omg Then I Sat On My Floor In Front Of The Tv And Bawled Over Shawn When He Was Performing On That One Show | negative (p = 0.51) | positive (p = 0.48) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.144 | Transform to lowercase | 144/1000 tested samples (14.4%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 14.4% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1812 | Chuck Norris cut off his left nut and donated it to science. You may know it as Jupiter. | chuck norris cut off his left nut and donated it to science. you may know it as jupiter. | positive (p = 0.47) | negative (p = 0.44) |
226 | Good morning...back after a couple of days off for Labor Day weekend. So today is my Monday. I make no promises. Ready with @user | good morning...back after a couple of days off for labor day weekend. so today is my monday. i make no promises. ready with @user | neutral (p = 0.42) | positive (p = 0.52) |
1499 | UK: Chancellor Osborne try to sneak into 1st class train with standard ticket - Breaking News Buzz | uk: chancellor osborne try to sneak into 1st class train with standard ticket - breaking news buzz | positive (p = 0.43) | negative (p = 0.42) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.136 | Add typos | 136/1000 tested samples (13.6%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.6% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1228 | @user @user I think the Red Sox may be more popular around that region as well. Have many more fans nationally than Pats | @user @user I think the Red Sox may be mote opular around rhat revgion as well. Have many more fana nationally than Pats | positive (p = 0.61) | negative (p = 0.66) |
1250 | Doug Wead interviewed LIVE tonight\u002c Wed\u002c 10pm EDT we have link and other video Please RETWEET #ronpaul #ronpaul2012 | Doug Wead interviwed PIVE tonight\u002c Wed\u002c 10pm EDT wse have link and other video Please RETWEEF #ronpaul #ronpaul2012 | positive (p = 0.45) | negative (p = 0.40) |
357 | Bush and Clinton are running for their parties nomination. and Jurassic Park (World) is a hit at the box office. Jays are in 1st place | Busuh and Clinton are eunning for their parties nomination. and Jurassic Park (World) is a hit at the box offie. Jays are in 1st place | positive (p = 0.48) | negative (p = 0.47) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.092 | Punctuation Removal | 92/1000 tested samples (9.2%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.2% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | negative (p = 0.41) | neutral (p = 0.42) |
1971 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | Bowling tomorrow c Don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.40) |
1952 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.42) | positive (p = 0.42) |
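To reproduce this kind of perturbation locally, the following is a rough sketch of a punctuation-removal transform. It is a guess that matches the examples above, not the scanner's actual implementation: the character set in `PUNCTUATION` is an assumption (note that `@` is left out because the `@user` mentions survive in the table), and `remove_punctuation` is a hypothetical name.

```python
import re

# Assumption: characters treated as punctuation, chosen to match the examples.
# "@" is deliberately excluded because "@user" is unchanged in the table above.
PUNCTUATION = ".,!?;:'\"-"


def remove_punctuation(text: str) -> str:
    """Replace each punctuation character with a space, then collapse whitespace."""
    spaced = "".join(" " if ch in PUNCTUATION else ch for ch in text)
    return re.sub(r"\s+", " ", spaced).strip()


original = "@user @user Yellow journalism. But you know? This may be Harper's Waterloo"
print(remove_punctuation(original))
```

Replacing punctuation with a space rather than deleting it is what turns "Can't" into "Can t" and "pick-six" into "pick six" in the examples.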
👉Performance issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text contains "friday" | Precision = 0.432 | — | 7.05% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "friday", the Precision is 7.05% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
27 | every time I hear alright by Kendrick I think it's j Cole's Black Friday | neutral | positive (p = 0.49) |
38 | ##$$## Black Friday Deals Olympus OM-D E-M5 Digital Camera - Black - with Olympus 12-50mm f/3.5-5.6 EZ Zoom Lens - B... | neutral | positive (p = 0.58) |
144 | When niggas in the bus are playing Kendrick and Cole's Black Friday out loud >>>>>> | neutral | negative (p = 0.45) |
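A small sketch of how such a slice comparison could be computed. It assumes the keyword match is case-insensitive (the examples above contain "Friday" with a capital F) and that the deviation is measured relative to the global metric. Both are assumptions, `contains_keyword` and `relative_deviation` are hypothetical names, and the metric values are toy numbers rather than results from this scan.

```python
def contains_keyword(text: str, keyword: str = "friday") -> bool:
    """Case-insensitive substring match used to build the data slice."""
    return keyword.lower() in text.lower()


def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Deviation of the slice metric, relative to the global metric."""
    return (slice_metric - global_metric) / global_metric


texts = [
    "every time I hear alright by Kendrick I think it's j Cole's Black Friday",  # sample 27
    "Good morning everyone",  # hypothetical text outside the slice
]
friday_slice = [t for t in texts if contains_keyword(t)]

# Toy numbers: a slice precision of 0.45 against a global precision of 0.50.
deviation = relative_deviation(0.45, 0.50)
print(f"{len(friday_slice)} text(s) in slice, deviation = {deviation:+.2%}")
```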
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!