Report for citizenlab/twitter-xlm-roberta-base-sentiment-finetunned

#47
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).

👉Robustness issues (4)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | major 🔴 | | Fail rate = 0.117 | Transform to uppercase | 117/1000 tested samples (11.7%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 11.7% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "FAKE PUNT ON 4TH AND 11? WOW, JAMES FRANKLIN CAN MAKE SOME ODD DECISIONS. #PENNSTATE #MICHIGAN #PSUVSMICH" | Negative (p = 0.65) | Neutral (p = 0.97) |
| 1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'VE BEEN THINKING ABOUT IT... DOES ANYONE ELSE FIND IT DISTURBING HOW KANE MAY FACE RAPE CHARGES AND GM'S ARE CALLING ON HIS AVAILABILITY? | Negative (p = 0.55) | Neutral (p = 0.98) |
| 219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | NEBRASKA DOESN'T LAND GESELL...A TOP 100 GUY IN YOUR STATE AND YOU DON'T GET HIM. C'MON #NEBRASKETBALL | Negative (p = 0.75) | Neutral (p = 0.65) |
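
The fail rate reported for a robustness check can be reproduced in spirit with a few lines of Python. The sketch below is an illustration only, not Giskard's implementation: `fail_rate` and `toy_predict` are hypothetical names, and skipping samples that the transformation leaves unchanged is an assumption about how "tested samples" are counted.

```python
from typing import Callable, Sequence


def fail_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of perturbed samples whose predicted label changes."""
    # Assumption: samples the transformation does not alter are not "tested".
    tested = [t for t in texts if transform(t) != t]
    if not tested:
        return 0.0
    changed = sum(predict(t) != predict(transform(t)) for t in tested)
    return changed / len(tested)


def toy_predict(text: str) -> str:
    """Hypothetical stand-in classifier that is deliberately case-sensitive."""
    return "negative" if "odd" in text else "neutral"


samples = ["Wow, some odd decisions.", "See you on Friday.", "ALREADY UPPER"]
rate = fail_rate(samples, toy_predict, str.upper)
print(f"Fail rate = {rate:.3f}")
```

To check the real model, `toy_predict` would be replaced by a function that returns the top label from the sentiment classifier, and `str.upper` by whichever transformation is under test.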

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | major 🔴 | | Fail rate = 0.105 | Add typos | 105/1000 tested samples (10.5%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 10.5% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1808 | Where will Arsenal finish this season? 4th. 36% of voters agree with me. | Where will Arsenal finish this season? 4th. 3%6% of voters aree with me. | Positive (p = 0.59) | Neutral (p = 0.95) |
| 1612 | "And the UFC is fucked up, Why you may ask? Because they are all getting hoed and controled by Dana White" | "And the UFC is ucked up, Why yu may ask? Because they are wall getting hoed and controled by Dana White" | Negative (p = 0.86) | Neutral (p = 0.76) |
| 886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "Fake pnt on 4th and 11? Wow, Jzames Franklim can make somre odd dveisions. #PennState Michigan #PSUvsMICH" | Negative (p = 0.65) | Neutral (p = 0.84) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | medium 🟡 | | Fail rate = 0.081 | Punctuation Removal | 81/1000 tested samples (8.1%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.1% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | Positive (p = 0.93) | Neutral (p = 0.90) |
| 1178 | Since September 26 is Batman Day I'm having a Batman month. We start off with the greatest Batman Story ever #Batman | Since September 26 is Batman Day I m having a Batman month We start off with the greatest Batman Story ever #Batman | Positive (p = 0.66) | Neutral (p = 0.56) |
| 181 | "Kapan sih lo ngebuktiin,jan ngomong doang Susah Susah.usaha Aja blm udh nyerah,inget.if you never try you'll never know.cowok kok gentle bgt" | Kapan sih lo ngebuktiin jan ngomong doang Susah Susah usaha Aja blm udh nyerah inget if you never try you ll never know cowok kok gentle bgt | Negative (p = 0.76) | Neutral (p = 0.69) |
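
Judging from the examples above, the punctuation-removal perturbation replaces punctuation with spaces, collapses the resulting whitespace, and leaves `#` in place. A minimal sketch of such a transformation follows; the function name and the `keep` parameter are hypothetical, and the exact rules Giskard applies may differ.

```python
import string


def remove_punctuation(text: str, keep: str = "#") -> str:
    """Replace ASCII punctuation with spaces, then collapse whitespace.

    Assumption: characters in `keep` are preserved so hashtags stay intact,
    which matches the examples in the report but is not confirmed behaviour.
    """
    table = str.maketrans({c: " " for c in string.punctuation if c not in keep})
    return " ".join(text.translate(table).split())


original = "Curtis Painter...we have a chance again! Can't believe #Batman"
perturbed = remove_punctuation(original)
print(perturbed)
```

Running the classifier on both `original` and `perturbed` and comparing the predicted labels gives one data point for the fail rate.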

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | medium 🟡 | | Fail rate = 0.076 | Transform to title case | 76/1000 tested samples (7.6%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 7.6% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "Fake Punt On 4Th And 11? Wow, James Franklin Can Make Some Odd Decisions. #Pennstate #Michigan #Psuvsmich" | Negative (p = 0.65) | Neutral (p = 0.79) |
| 1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'Ve Been Thinking About It... Does Anyone Else Find It Disturbing How Kane May Face Rape Charges And Gm'S Are Calling On His Availability? | Negative (p = 0.55) | Neutral (p = 0.95) |
| 219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | Nebraska Doesn'T Land Gesell...A Top 100 Guy In Your State And You Don'T Get Him. C'Mon #Nebrasketball | Negative (p = 0.75) | Neutral (p = 0.89) |
👉Performance issues (3)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | major 🔴 | text contains "day" | Precision = 0.555 | | 10.74% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "day", the Precision is 10.74% lower than the global Precision.

| | text | label | Predicted label |
| --- | --- | --- | --- |
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
| 58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | Positive | Neutral (p = 0.99) |
| 98 | @user Dear Taimouraga, Thank you for contacting. Apologies for the late reply. Yes the Centers were open at the 4th day of Eid." | Positive | Neutral (p = 0.97) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | major 🔴 | text contains "like" | Precision = 0.558 | | 10.29% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "like", the Precision is 10.29% lower than the global Precision.

| | text | label | Predicted label |
| --- | --- | --- | --- |
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
| 17 | Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. | Neutral | Negative (p = 0.76) |
| 30 | Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks | Positive | Neutral (p = 0.96) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | medium 🟡 | text contains "night" | Precision = 0.590 | | 5.11% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "night", the Precision is 5.11% lower than the global Precision.

| | text | label | Predicted label |
| --- | --- | --- | --- |
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
| 69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | Neutral | Positive (p = 0.97) |
| 72 | "We have four Premium Seats for the Zac Brown Band, for this Friday Night 8/7/15 at Fenway Park. These are... | Positive | Neutral (p = 0.99) |
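
The three performance findings compare precision on a keyword slice with precision on the whole dataset. The reported numbers are consistent with a relative deviation against a global precision of roughly 0.62 (for example, 0.555 / (1 - 0.1074) ≈ 0.622). The sketch below shows that calculation on made-up data; the macro averaging, the case-insensitive substring match, and all names are assumptions rather than the scanner's confirmed behaviour.

```python
from typing import List, Sequence, Tuple


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label precision (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        hits = [t for t, p in zip(y_true, y_pred) if p == label]
        # A label that is never predicted contributes a precision of 0.
        scores.append(sum(t == label for t in hits) / len(hits) if hits else 0.0)
    return sum(scores) / len(scores)


def slice_deviation(
    texts: Sequence[str], y_true: Sequence[str], y_pred: Sequence[str], keyword: str
) -> Tuple[float, float]:
    """Return (slice precision, relative deviation from global precision)."""
    idx = [i for i, t in enumerate(texts) if keyword in t.lower()]
    if not idx:
        raise ValueError(f"no text contains {keyword!r}")
    global_p = macro_precision(y_true, y_pred)
    if global_p == 0:
        raise ValueError("global precision is zero; relative deviation undefined")
    slice_p = macro_precision([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return slice_p, (slice_p - global_p) / global_p


# Made-up toy data, not taken from tweet_eval.
texts = ["great day", "bad day", "nice night", "awful"]
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["neutral", "negative", "positive", "negative"]

slice_p, deviation = slice_deviation(texts, y_true, y_pred, "day")
print(f"Precision = {slice_p:.3f}, deviation = {deviation:+.2%}")
```

Note that a substring match means a slice such as "day" also picks up words like "Friday", which is worth keeping in mind when interpreting the slice.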

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
