Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student

#49
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 8 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).

👉Overconfidence issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Overconfidence | medium 🟡 | avg_digits(text) < 0.011 | Overconfidence rate = 0.291 | - | 18.82% higher than global
🔍✨Examples
For records in the dataset where `avg_digits(text)` < 0.011, we found a significantly higher number of overconfident wrong predictions (183 samples, corresponding to 29.09% of the wrong predictions in the data slice).
index | text | avg_digits(text) | label | Predicted label
1900 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 0.00869565 | neutral | negative (p = 0.95), neutral (p = 0.03)
1503 | @user this is absolutely ridiculous. Wasn't it just ""national ice cream day"" on Sunday? Who makes up these days? This isn't official. This is" | 0 | neutral | negative (p = 0.95), neutral (p = 0.04)
1203 | I may have just mentally assembled the most insane conspiracy web about the Dr. Luke / Kesha sitch. | 0 | neutral | negative (p = 0.94), neutral (p = 0.04)
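To make the two quantities in this section concrete, here is a minimal sketch. The `avg_digits` definition (digits divided by text length) is inferred from the first example row, where one digit in 115 characters gives 0.00869565. The `overconfidence_rate` helper and its `threshold` are assumptions for illustration; the cutoff the scan actually uses is not stated in this report.

```python
def avg_digits(text: str) -> float:
    """Fraction of characters in `text` that are decimal digits."""
    if not text:
        return 0.0
    return sum(ch.isdigit() for ch in text) / len(text)


def overconfidence_rate(records, threshold=0.5):
    """Share of wrong predictions whose confidence gap exceeds `threshold`.

    Each record is (true_label, probs), where probs maps label -> probability.
    The gap is p(predicted) - p(true). `threshold` is a hypothetical cutoff.
    """
    wrong = overconfident = 0
    for true_label, probs in records:
        predicted = max(probs, key=probs.get)
        if predicted == true_label:
            continue  # correct predictions are not counted
        wrong += 1
        if probs[predicted] - probs[true_label] > threshold:
            overconfident += 1
    return overconfident / wrong if wrong else 0.0


# First example row from the report: one digit in 115 characters.
tweet = ("Monsanto wants to merge with Syngenta and change name to wash away "
         "the bad reputation (3rd most disliked company!):")
print(round(avg_digits(tweet), 8))

# Toy records, made up for illustration (not taken from the scan).
records = [
    ("neutral", {"negative": 0.95, "neutral": 0.03, "positive": 0.02}),
    ("neutral", {"negative": 0.45, "neutral": 0.40, "positive": 0.15}),
    ("positive", {"negative": 0.10, "neutral": 0.20, "positive": 0.70}),
]
print(overconfidence_rate(records))
```

In the toy data, two of the three predictions are wrong and only the first has a large gap between the predicted and true label probabilities, so half of the wrong predictions count as overconfident.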
👉Ethical issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Ethical | major 🔴 | - | Fail rate = 0.106 | Switch Religion | 9/85 tested samples (10.59%) changed prediction after perturbation
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 10.59% of the cases. We expected the predictions not to be affected by this transformation.
index | text | Switch Religion(text) | Original prediction | Prediction after perturbation
171 | I really want to thank Angela Merkel for letting all those refugees in. She shows what humanity is about. May God bless her #refugeeswelcome | I really want to thank Angela Merkel for letting all those refugees in. She shows what humanity is about. May allah bless her #refugeeswelcome | positive (p = 0.60) | negative (p = 0.51)
808 | @user may you be blessed by guns, god and hungry wet holes before Scott Walker builds his border wall and Donald Trump sends you home!" | @user may you be blessed by guns, allah and hungry wet holes before Scott Walker builds his border wall and Donald Trump sends you home!" | positive (p = 0.73) | negative (p = 0.55)
921 | Floyd Mayweather ranked his top five boxers. Where he puts Muhammad Ali may shock you: | Floyd Mayweather ranked his top five boxers. Where he puts jesus christ Ali may shock you: | positive (p = 0.49) | negative (p = 0.50)
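The perturbation test above can be sketched roughly as follows. The `SWAPS` table, `switch_religion`, `count_failures`, and `toy_predict` are all hypothetical names written for this illustration; the scan's real word lists are not shown in the report. Skipping texts that the transformation leaves unchanged is also an assumption, suggested by the fact that only 85 samples were tested here.

```python
import re

# Hypothetical, deliberately tiny substitution table (an assumption for
# illustration; the scan's real word lists are not shown in this report).
SWAPS = {"god": "allah", "allah": "god", "church": "mosque", "mosque": "church"}
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b", re.IGNORECASE
)


def switch_religion(text: str) -> str:
    """Replace each known term with its lowercase counterpart."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(0).lower()], text)


def count_failures(predict, texts, transform):
    """Return (failed, tested), skipping texts the transform leaves unchanged."""
    tested = failed = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # nothing to swap, so this sample is not tested
        tested += 1
        if predict(perturbed) != predict(text):
            failed += 1
    return failed, tested


def toy_predict(text: str) -> str:
    # Deliberately brittle stand-in classifier that keys on one surface token.
    return "positive" if "god" in text.lower() else "neutral"


texts = ["May God bless her", "Nice weather today", "The church bells rang"]
failed, tested = count_failures(toy_predict, texts, switch_religion)
print(failed, tested)

# The report's own numbers: 9 changed predictions out of 85 tested samples.
print(f"{9 / 85:.2%}")
```

A model that is invariant to this transformation would report zero failures; the toy classifier fails on one of the two tested texts because it reacts to a single word.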
👉Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Robustness | major 🔴 | - | Fail rate = 0.393 | Transform to uppercase | 393/1000 tested samples (39.3%) changed prediction after perturbation
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 39.3% of the cases. We expected the predictions not to be affected by this transformation.
index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation
1074 | I'm so frustrated with Game of Thrones and I'm only on the 10th episode | I'M SO FRUSTRATED WITH GAME OF THRONES AND I'M ONLY ON THE 10TH EPISODE | negative (p = 0.88) | positive (p = 0.54)
1816 | Guys... I'm seriously... #Stonehill right now... unranked and beating #3 #NewHaven in the 4th quarter... CBS College Sports... | GUYS... I'M SERIOUSLY... #STONEHILL RIGHT NOW... UNRANKED AND BEATING #3 #NEWHAVEN IN THE 4TH QUARTER... CBS COLLEGE SPORTS... | negative (p = 0.70) | positive (p = 0.43)
1681 | """Why America May Go To Hell""- wish it wouldve been completed and i wish i could read the contents of it... by MLK" | """WHY AMERICA MAY GO TO HELL""- WISH IT WOULDVE BEEN COMPLETED AND I WISH I COULD READ THE CONTENTS OF IT... BY MLK" | negative (p = 0.42) | positive (p = 0.52)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Robustness | major 🔴 | - | Fail rate = 0.307 | Transform to title case | 307/1000 tested samples (30.7%) changed prediction after perturbation
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 30.7% of the cases. We expected the predictions not to be affected by this transformation.
index | text | Transform to title case(text) | Original prediction | Prediction after perturbation
1816 | Guys... I'm seriously... #Stonehill right now... unranked and beating #3 #NewHaven in the 4th quarter... CBS College Sports... | Guys... I'M Seriously... #Stonehill Right Now... Unranked And Beating #3 #Newhaven In The 4Th Quarter... Cbs College Sports... | negative (p = 0.70) | positive (p = 0.43)
1681 | """Why America May Go To Hell""- wish it wouldve been completed and i wish i could read the contents of it... by MLK" | """Why America May Go To Hell""- Wish It Wouldve Been Completed And I Wish I Could Read The Contents Of It... By Mlk" | negative (p = 0.42) | positive (p = 0.65)
99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | Omg Then I Sat On My Floor In Front Of The Tv And Bawled Over Shawn When He Was Performing On That One Show | negative (p = 0.51) | positive (p = 0.48)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Robustness | major 🔴 | - | Fail rate = 0.144 | Transform to lowercase | 144/1000 tested samples (14.4%) changed prediction after perturbation
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 14.4% of the cases. We expected the predictions not to be affected by this transformation.
index | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation
1812 | Chuck Norris cut off his left nut and donated it to science. You may know it as Jupiter. | chuck norris cut off his left nut and donated it to science. you may know it as jupiter. | positive (p = 0.47) | negative (p = 0.44)
226 | Good morning...back after a couple of days off for Labor Day weekend. So today is my Monday. I make no promises. Ready with @user | good morning...back after a couple of days off for labor day weekend. so today is my monday. i make no promises. ready with @user | neutral (p = 0.42) | positive (p = 0.52)
1499 | UK: Chancellor Osborne try to sneak into 1st class train with standard ticket - Breaking News Buzz | uk: chancellor osborne try to sneak into 1st class train with standard ticket - breaking news buzz | positive (p = 0.43) | negative (p = 0.42)
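The three case transformations reported so far can be reproduced with Python's built-in string methods. The title-case examples ("I'M", "4Th") are consistent with `str.title`, though that the scan uses it is an inference. The sketch below measures a fail rate per transformation; `fail_rate` and `toy_predict` are hypothetical helpers, and the toy classifier is a case-sensitive keyword match standing in for a real model. Since the model name indicates a cased checkpoint, some sensitivity to casing is plausible.

```python
TRANSFORMS = {
    "Transform to uppercase": str.upper,
    "Transform to title case": str.title,
    "Transform to lowercase": str.lower,
}


def fail_rate(predict, texts, transform):
    """Fraction of texts whose predicted label changes after `transform`."""
    changed = sum(predict(transform(t)) != predict(t) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    # Case-sensitive keyword match: a stand-in for a casing-sensitive model.
    return "negative" if "frustrated" in text else "neutral"


texts = ["I'm so frustrated with Game of Thrones", "good morning"]
rates = {name: fail_rate(toy_predict, texts, fn) for name, fn in TRANSFORMS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

With the toy classifier, uppercase and title case each flip one of the two predictions, while lowercase flips none, because the keyword is already lowercase in the original text.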
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Robustness | major 🔴 | - | Fail rate = 0.136 | Add typos | 136/1000 tested samples (13.6%) changed prediction after perturbation
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.6% of the cases. We expected the predictions not to be affected by this transformation.
index | text | Add typos(text) | Original prediction | Prediction after perturbation
1228 | @user @user I think the Red Sox may be more popular around that region as well. Have many more fans nationally than Pats | @user @user I think the Red Sox may be mote opular around rhat revgion as well. Have many more fana nationally than Pats | positive (p = 0.61) | negative (p = 0.66)
1250 | Doug Wead interviewed LIVE tonight\u002c Wed\u002c 10pm EDT we have link and other video Please RETWEET #ronpaul #ronpaul2012 | Doug Wead interviwed PIVE tonight\u002c Wed\u002c 10pm EDT wse have link and other video Please RETWEEF #ronpaul #ronpaul2012 | positive (p = 0.45) | negative (p = 0.40)
357 | Bush and Clinton are running for their parties nomination. and Jurassic Park (World) is a hit at the box office. Jays are in 1st place | Busuh and Clinton are eunning for their parties nomination. and Jurassic Park (World) is a hit at the box offie. Jays are in 1st place | positive (p = 0.48) | negative (p = 0.47)
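The typo examples above show three kinds of character edits: deletion ("popular" to "opular"), substitution ("more" to "mote"), and insertion ("region" to "revgion"). A rough sketch of such a perturbation follows. The `add_typos` function, its `rate` and `seed` parameters, and the uniform choice of replacement letters are all assumptions; the scan's actual typo generator is not described in this report and may, for instance, prefer neighbouring keyboard keys.

```python
import random
import string


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly delete, substitute, or insert letters with probability `rate`."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "substitute", "insert"))
            if op == "delete":
                continue  # drop the character
            if op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
                continue
            # insert: keep the character and add a random letter after it
            out.append(ch)
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # digits, spaces, and punctuation are left alone
    return "".join(out)


sample = "I think the Red Sox may be more popular around that region as well."
print(add_typos(sample, rate=0.1, seed=42))
```

Fixing the seed makes a robustness test repeatable, so the same perturbed inputs can be compared before and after any change to the model.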
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Robustness | medium 🟡 | - | Fail rate = 0.092 | Punctuation Removal | 92/1000 tested samples (9.2%) changed prediction after perturbation
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.2% of the cases. We expected the predictions not to be affected by this transformation.
index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation
1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | negative (p = 0.41) | neutral (p = 0.42)
1971 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | Bowling tomorrow c Don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.40)
1952 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.42) | positive (p = 0.42)
👉Performance issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
Performance | medium 🟡 | text contains "friday" | Precision = 0.432 | - | 7.05% lower than global
🔍✨Examples
For records in the dataset where `text` contains "friday", the Precision is 7.05% lower than the global Precision.
index | text | label | Predicted label
27 | every time I hear alright by Kendrick I think it's j Cole's Black Friday | neutral | positive (p = 0.49)
38 | ##$$## Black Friday Deals Olympus OM-D E-M5 Digital Camera - Black - with Olympus 12-50mm f/3.5-5.6 EZ Zoom Lens - B... | neutral | positive (p = 0.58)
144 | When niggas in the bus are playing Kendrick and Cole's Black Friday out loud >>>>>> | neutral | negative (p = 0.45)
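A slice-level comparison like the one above can be sketched as follows. Two things are assumptions here: that precision is macro-averaged over the labels, and that the deviation is relative to the global value; the report does not say how either is computed. The slice match is case-insensitive because the examples contain "Friday" with a capital letter. The rows are toy data, not taken from the scan.

```python
LABELS = ("negative", "neutral", "positive")


def macro_precision(rows, labels=LABELS):
    """Mean per-label precision over labels that were predicted at least once."""
    precisions = []
    for label in labels:
        predicted = [r for r in rows if r["pred"] == label]
        if not predicted:
            continue  # precision is undefined for a label never predicted
        correct = sum(r["label"] == label for r in predicted)
        precisions.append(correct / len(predicted))
    return sum(precisions) / len(precisions) if precisions else 0.0


def relative_deviation(slice_value, global_value):
    """Relative difference of the slice metric from the global metric."""
    return (slice_value - global_value) / global_value


# Toy rows, made up for illustration.
rows = [
    {"text": "Black Friday deals", "label": "neutral", "pred": "positive"},
    {"text": "friday night was great", "label": "positive", "pred": "positive"},
    {"text": "I hate mondays", "label": "negative", "pred": "negative"},
    {"text": "lunch at noon", "label": "neutral", "pred": "neutral"},
]
friday_slice = [r for r in rows if "friday" in r["text"].lower()]

global_p = macro_precision(rows)
slice_p = macro_precision(friday_slice)
print(f"global={global_p:.3f} slice={slice_p:.3f} "
      f"deviation={relative_deviation(slice_p, global_p):+.2%}")
```

A negative deviation means the model is less precise on the slice than on the dataset as a whole, which is the pattern the scan flags for texts containing "friday".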

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
