Report for citizenlab/twitter-xlm-roberta-base-sentiment-finetunned

#48
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).

👉Robustness issues (4)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | major 🔴 | | Fail rate = 0.154 | Transform to uppercase | 50/324 tested samples (15.43%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 15.43% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | HOLD ON... SAM SMITH MAY DO THE THEME TO SPECTRE!? DOPE!!!!!! #007 #SPECTRE #JAMESBOND | Positive (p = 0.98) | Neutral (p = 0.99) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | Neutral (p = 0.81) | Negative (p = 0.68) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @USER @USER ISLAM IS AN ABRAHAMIC FAITH, ANDREW. IT MAY MAKE YOU FEEL A LITTLE UNEASY BUT IT'S THE SAME GOD YOU WORSHIP. SORRY." | Neutral (p = 0.96) | Negative (p = 0.85) |
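
For anyone who wants to reproduce the fail-rate figure by hand, here is a minimal sketch of the computation. It is not Giskard's implementation: `toy_predict` is a hypothetical stand-in for the real model, and skipping samples that the transformation leaves unchanged is an assumption (the differing "tested samples" counts in this report suggest something like it).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one text transformation."""
    tested = 0
    changed = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: samples the transformation does not alter are not counted.
            continue
        tested += 1
        if predict(text) != predict(perturbed):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def toy_predict(text: str) -> str:
    """Hypothetical classifier: one case-sensitive cue, two case-insensitive cues."""
    if "Dope" in text:  # case-sensitive on purpose, so uppercasing breaks it
        return "positive"
    lowered = text.lower()
    if "love" in lowered:
        return "positive"
    if "hate" in lowered:
        return "negative"
    return "neutral"


samples = ["Dope!!!!!! #007", "I hate international breaks", "we love it", "12345"]
changed, tested, rate = fail_rate(toy_predict, samples, str.upper)
print(f"{changed}/{tested} tested samples ({rate:.2%}) changed prediction")
```

To run this against the real model, `predict` could be swapped for any function that maps a string to a label.
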

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | medium 🟡 | | Fail rate = 0.090 | Add typos | 28/311 tested samples (9.0%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 9.0% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 5 | Beautiful Bouquet with our Beautiful Bentley #bride #groom #wedding #wednesday #weddingcars #love #Repost... | Beautifhl Bouwuet with our Beautiful Bentley #bride #groom #wedding #wednesday #weddingcars #love #Repost... | Positive (p = 0.92) | Neutral (p = 0.97) |
| 7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Harper's WorstO gfendse against Reufgees may be Climqte Record as rising temperatures add to chaos in the Mddle East | Negative (p = 0.63) | Neutral (p = 0.97) |
| 15 | "More like boring eagles""""""""@Tunnyking: C'mon bro, Go out and support the Super Eagles #RT @user I hate international breaks" | "Mloee like boring ealges""""""""@Tunnyking: C'mon bro, Go out and support the Sjuper Eagles #RT @user OI bhate international breaks" | Negative (p = 0.84) | Neutral (p = 0.98) |
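
A typo perturbation can be approximated locally for quick experiments. The sketch below is a hypothetical stand-in, not the scanner's "Add typos" transformation: it only swaps adjacent letters, whereas the examples above also show substituted, inserted, and dropped characters. The `rate` and `seed` parameters are assumptions made for illustration.

```python
import random


def add_typos(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent letter pairs with probability `rate` (seeded, so repeatable)."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


sentence = "Beautiful Bouquet with our Beautiful Bentley"
print(add_typos(sentence, rate=0.2, seed=42))
```

Fixing the seed keeps the perturbed set stable between runs, which makes before/after comparisons of a retrained model easier to read.
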

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | medium 🟡 | | Fail rate = 0.086 | Transform to title case | 28/324 tested samples (8.64%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 8.64% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | Neutral (p = 0.81) | Negative (p = 0.61) |
| 15 | "More like boring eagles""""""""@Tunnyking: C'mon bro, Go out and support the Super Eagles #RT @user I hate international breaks" | "More Like Boring Eagles""""""""@Tunnyking: C'Mon Bro, Go Out And Support The Super Eagles #Rt @User I Hate International Breaks" | Negative (p = 0.84) | Neutral (p = 0.59) |
| 21 | Celebrity Big Brother: Daniel's eviction stirs up bad feelings in the house: Daniel Baldwin may have left the ... | Celebrity Big Brother: Daniel'S Eviction Stirs Up Bad Feelings In The House: Daniel Baldwin May Have Left The ... | Negative (p = 0.80) | Neutral (p = 0.73) |
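
The title-cased examples (`Daniel'S`, `C'Mon`, `#Rt`) match what Python's built-in `str.title()` produces, since it capitalizes the first letter after any non-letter character. Whether the scanner actually calls `str.title()` is an assumption; the snippet only shows that the outputs agree on one of the rows above.

```python
original = (
    "Celebrity Big Brother: Daniel's eviction stirs up bad feelings in the house: "
    "Daniel Baldwin may have left the ..."
)

# str.title() upper-cases the first letter of every run of letters,
# so the "s" after the apostrophe is capitalized too.
title_cased = original.title()
print(title_cased)
```

This may matter when interpreting the result: part of the perturbation is unusual casing inside contractions, hashtags, and mentions, not just capitalized word starts.
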

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | medium 🟡 | | Fail rate = 0.077 | Punctuation Removal | 23/299 tested samples (7.69%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 7.69% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond | Positive (p = 0.98) | Neutral (p = 0.99) |
| 7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Harper s Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Negative (p = 0.63) | Neutral (p = 0.51) |
| 26 | "this adorable old couple in dunkin literally made my day, he's turning 89 tomorrow and talked to me about how he was drafted for the WWII" | this adorable old couple in dunkin literally made my day he s turning 89 tomorrow and talked to me about how he was drafted for the WWII | Positive (p = 0.58) | Neutral (p = 0.69) |
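
The punctuation-removal examples above look like punctuation was replaced by a space and whitespace was then collapsed, with `#` left in place. A minimal sketch that reproduces two of the rows follows. It is inferred from the examples, not taken from the scanner's code, and keeping `@` alongside `#` is an assumption.

```python
import string

# Assumption: hashtag and mention markers survive; every other ASCII
# punctuation character becomes a space.
KEEP = "#@"
_PUNCT_TO_SPACE = str.maketrans(
    {char: " " for char in string.punctuation if char not in KEEP}
)


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


print(remove_punctuation(
    "Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond"
))
```

Note how `Harper's` turns into `Harper s` under this rule, which changes tokenization as well as removing emphasis such as `!!!!!!`.
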
👉Performance issues (2)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | major 🔴 | text contains "time" | Precision = 0.350 | | -40.94% than global |

🔍✨Examples

For records in the dataset where `text` contains "time", the Precision is 40.94% lower than the global Precision.

| | text | label | Predicted label |
| --- | --- | --- | --- |
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | Negative | Neutral (p = 0.97) |
| 35 | "According to Janet Jackson's long time producer Terry Lewis, the album is due in October. STAY CONNECTED!... | Positive | Neutral (p = 0.98) |
| 65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | Positive | Neutral (p = 0.96) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | medium 🟡 | text contains "tomorrow" | Precision = 0.544 | | -8.22% than global |

🔍✨Examples

For records in the dataset where `text` contains "tomorrow", the Precision is 8.22% lower than the global Precision.

| | text | label | Predicted label |
| --- | --- | --- | --- |
| 62 | But it's a three day weekend and we see Ed Sheeran tomorrow (!!!!!) so things miiiight be looking up. | Positive | Neutral (p = 0.99) |
| 68 | When I wake up tomorrow I'll be in a different country. Whoa! I didn't run into a David Beckham at the airport. That's a bummer. | Positive | Negative (p = 0.96) |
| 71 | CINCH YOUR SADDLE is live on Amazon! Only 99 cents until tomorrow evening.Thank you gift! | Positive | Neutral (p = 0.87) |
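
The two performance rows are consistent with a relative deviation: 0.350 / (1 - 0.4094) and 0.544 / (1 - 0.0822) both come out near 0.593, which would be the global Precision. Below is a minimal sketch of that slice-versus-global comparison on toy data. The macro averaging, the choice to skip labels that are never predicted, and the case-insensitive keyword match are all assumptions, since the report does not say how Precision is aggregated.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, str, str]  # (text, true label, predicted label)


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label precision over labels that were predicted."""
    per_label: List[float] = []
    for label in sorted(set(y_pred)):
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        per_label.append(sum(t == label for t in truths) / len(truths))
    return sum(per_label) / len(per_label) if per_label else 0.0


def slice_deviation(rows: Sequence[Row], keyword: str) -> Tuple[float, float, float]:
    """Return (slice precision, global precision, relative deviation)."""
    global_p = macro_precision([r[1] for r in rows], [r[2] for r in rows])
    sliced = [r for r in rows if keyword in r[0].lower()]
    slice_p = macro_precision([r[1] for r in sliced], [r[2] for r in sliced])
    return slice_p, global_p, (slice_p - global_p) / global_p


# Toy rows for illustration only, not taken from the evaluated dataset.
rows: List[Row] = [
    ("this time around", "negative", "neutral"),
    ("long time producer", "positive", "neutral"),
    ("great time", "positive", "positive"),
    ("awful day", "negative", "negative"),
    ("see you", "neutral", "neutral"),
    ("love it", "positive", "positive"),
]
slice_p, global_p, deviation = slice_deviation(rows, "time")
print(f"Precision = {slice_p:.3f}, {deviation:+.2%} vs. global ({global_p:.3f})")
```
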

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
