Report for citizenlab/twitter-xlm-roberta-base-sentiment-finetunned
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 7 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
👉Robustness issues (4)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.117 | Transform to uppercase | 117/1000 tested samples (11.7%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 11.7% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "FAKE PUNT ON 4TH AND 11? WOW, JAMES FRANKLIN CAN MAKE SOME ODD DECISIONS. #PENNSTATE #MICHIGAN #PSUVSMICH" | Negative (p = 0.65) | Neutral (p = 0.97) |
1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'VE BEEN THINKING ABOUT IT... DOES ANYONE ELSE FIND IT DISTURBING HOW KANE MAY FACE RAPE CHARGES AND GM'S ARE CALLING ON HIS AVAILABILITY? | Negative (p = 0.55) | Neutral (p = 0.98) |
219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | NEBRASKA DOESN'T LAND GESELL...A TOP 100 GUY IN YOUR STATE AND YOU DON'T GET HIM. C'MON #NEBRASKETBALL | Negative (p = 0.75) | Neutral (p = 0.65) |
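The fail rate in the table above is the share of tested samples whose predicted label changes after the transformation (117 of 1000 here). Below is a minimal sketch of that computation. The `fail_rate` helper and the deliberately case-sensitive `toy_predict` classifier are hypothetical stand-ins for illustration, not the scanned model or the scanner's own code.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Share of texts whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("texts must not be empty")
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


# Hypothetical toy classifier that only recognises lowercase cue words.
def toy_predict(text: str) -> str:
    if "odd" in text or "disturbing" in text:
        return "Negative"
    return "Neutral"


samples = [
    "Wow, James Franklin can make some odd decisions.",
    "Does anyone else find it disturbing?",
    "Front row shot on Wednesday night.",
    "Tomorrow is National Ice Cream Day.",
]

# Uppercasing hides the lowercase cue words from the toy classifier.
rate = fail_rate(toy_predict, samples, str.upper)
print(f"Fail rate = {rate:.3f}")
```

To run the same check against a real model, pass its prediction function as `predict`, together with any text transformation.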
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.105 | Add typos | 105/1000 tested samples (10.5%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 10.5% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1808 | Where will Arsenal finish this season? 4th. 36% of voters agree with me. | Where will Arsenal finish this season? 4th. 3%6% of voters aree with me. | Positive (p = 0.59) | Neutral (p = 0.95) |
1612 | "And the UFC is fucked up, Why you may ask? Because they are all getting hoed and controled by Dana White" | "And the UFC is ucked up, Why yu may ask? Because they are wall getting hoed and controled by Dana White" | Negative (p = 0.86) | Neutral (p = 0.76) |
886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "Fake pnt on 4th and 11? Wow, Jzames Franklim can make somre odd dveisions. #PennState Michigan #PSUvsMICH" | Negative (p = 0.65) | Neutral (p = 0.84) |
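The perturbed texts above show deleted, inserted, substituted, and swapped characters. The report does not describe how the scanner generates its typos, so the following is only an illustrative sketch of a comparable perturbation. The `add_typos` function, its `rate` parameter, and the choice of four edit operations are assumptions.

```python
import random
import string


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly delete, insert, substitute, or swap letters (illustrative only)."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.append(rng.choice(string.ascii_lowercase))
                out.append(c)
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # emit the next character first
                out.append(c)
                i += 1  # the next character has already been consumed
            else:
                out.append(c)  # nothing to swap with at the end of the text
        else:
            out.append(c)
        i += 1
    return "".join(out)


print(add_typos("Where will Arsenal finish this season?", rate=0.15, seed=1))
```

Fixing the seed keeps the perturbed dataset stable between runs, so a change in fail rate reflects the model and not the random noise.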
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.081 | Punctuation Removal | 81/1000 tested samples (8.1%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.1% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | Positive (p = 0.93) | Neutral (p = 0.90) |
1178 | Since September 26 is Batman Day I'm having a Batman month. We start off with the greatest Batman Story ever #Batman | Since September 26 is Batman Day I m having a Batman month We start off with the greatest Batman Story ever #Batman | Positive (p = 0.66) | Neutral (p = 0.56) |
181 | "Kapan sih lo ngebuktiin,jan ngomong doang Susah Susah.usaha Aja blm udh nyerah,inget.if you never try you'll never know.cowok kok gentle bgt" | Kapan sih lo ngebuktiin jan ngomong doang Susah Susah usaha Aja blm udh nyerah inget if you never try you ll never know cowok kok gentle bgt | Negative (p = 0.76) | Neutral (p = 0.69) |
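In the examples above, punctuation marks appear to be replaced by spaces ("Can't" becomes "Can t") while hashtags keep their "#". Here is a sketch that reproduces this behaviour on the first two example rows. The `remove_punctuation` function is a reconstruction inferred from those rows, not the scanner's actual implementation. How it treats symbols that do not occur in the examples, such as "@", is an assumption.

```python
import string

# Keep '#' because hashtags survive in the report's examples; every other
# ASCII punctuation mark is replaced by a space (assumption for unseen symbols).
_REMOVED = "".join(ch for ch in string.punctuation if ch != "#")
_TABLE = str.maketrans(_REMOVED, " " * len(_REMOVED))


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_TABLE).split())


original = (
    "Curtis Painter...we have a chance again! Can't believe "
    "Kerry Collins didn't throw us a pick-six tonight"
)
print(remove_punctuation(original))
```

Splitting contractions this way changes the tokens the model sees ("didn t" instead of "didn't"), which may be one reason predictions drift towards Neutral.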
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.076 | Transform to title case | 76/1000 tested samples (7.6%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 7.6% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "Fake Punt On 4Th And 11? Wow, James Franklin Can Make Some Odd Decisions. #Pennstate #Michigan #Psuvsmich" | Negative (p = 0.65) | Neutral (p = 0.79) |
1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'Ve Been Thinking About It... Does Anyone Else Find It Disturbing How Kane May Face Rape Charges And Gm'S Are Calling On His Availability? | Negative (p = 0.55) | Neutral (p = 0.95) |
219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | Nebraska Doesn'T Land Gesell...A Top 100 Guy In Your State And You Don'T Get Him. C'Mon #Nebrasketball | Negative (p = 0.75) | Neutral (p = 0.89) |
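The perturbed texts above contain mid-word capitals such as "I'Ve", "4Th", and "#Pennstate". These match what Python's built-in `str.title()` produces, since it capitalises every letter that follows a non-letter. Whether the scanner actually calls `str.title()` is an assumption. The snippet below only shows that the outputs coincide.

```python
import string

examples = [
    "I've been thinking about it...",
    "Fake punt on 4th and 11?",
    "#PennState #PSUvsMICH",
]

for text in examples:
    # str.title() restarts capitalisation after apostrophes, digits, and '#'.
    print(f"{text!r} -> {text.title()!r}")

# For comparison: string.capwords() splits on whitespace only, so it leaves
# contractions intact. It is shown here as an alternative, not as a fix.
print(string.capwords("I've been thinking about it..."))
```

Some of the prediction changes may therefore come from these unusual capitals inside contractions and hashtags, not from title casing as such.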
👉Performance issues (3)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "day" | Precision = 0.555 | — | 10.74% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "day", the Precision is 10.74% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | Positive | Neutral (p = 0.99) |
98 | @user Dear Taimouraga, Thank you for contacting. Apologies for the late reply. Yes the Centers were open at the 4th day of Eid." | Positive | Neutral (p = 0.97) |
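The "text contains" slices appear to use case-insensitive substring matching. Record 58 is shown with only the capitalised "Day", and the keyword also matches inside longer words such as "Friday". Below is a sketch of selecting such a slice. The `contains_slice` helper is hypothetical, and the case-insensitive matching is inferred from the examples, not confirmed by the report.

```python
from typing import List, Sequence


def contains_slice(texts: Sequence[str], keyword: str) -> List[int]:
    """Return positions of texts containing `keyword` (case-insensitive substring)."""
    kw = keyword.lower()
    return [i for i, t in enumerate(texts) if kw in t.lower()]


texts = [
    "National hot dog day, national tequila day, then national dance day... "
    "Sounds like a Friday night.",
    "Tomorrow is National Ice Cream Day.",
    "Keep up the excellent work, sir!",
]

for keyword in ("day", "like", "night"):
    print(keyword, contains_slice(texts, keyword))
```

Because a single record can fall into several slices (record 1 appears under "day", "like", and "night"), the three performance findings are not independent of each other.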
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "like" | Precision = 0.558 | — | 10.29% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "like", the Precision is 10.29% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
17 | Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. | Neutral | Negative (p = 0.76) |
30 | Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks | Positive | Neutral (p = 0.96) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text contains "night" | Precision = 0.590 | — | 5.11% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "night", the Precision is 5.11% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | Neutral | Positive (p = 0.97) |
72 | "We have four Premium Seats for the Zac Brown Band, for this Friday Night 8/7/15 at Fenway Park. These are... | Positive | Neutral (p = 0.99) |
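The report does not state the global Precision, but it can be estimated from the slice values if the deviations are read as relative differences, i.e. deviation = (slice - global) / global. That reading is an assumption. It is supported by the fact that all three slices then imply nearly the same global value, which is not the case if the deviations are read as percentage points. A short check:

```python
# Slice precision and reported deviation, copied from the three tables above.
reported = {
    "day": (0.555, -0.1074),
    "like": (0.558, -0.1029),
    "night": (0.590, -0.0511),
}


def implied_global(slice_precision: float, relative_deviation: float) -> float:
    """Invert deviation = (slice - global) / global to recover the global value."""
    return slice_precision / (1.0 + relative_deviation)


implied = {k: implied_global(p, d) for k, (p, d) in reported.items()}
for keyword, value in implied.items():
    print(f"{keyword}: implied global precision = {value:.3f}")
```

All three slices point to a global Precision of roughly 0.62, so the "day" slice at 0.555 sits about 0.07 below it in absolute terms.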
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!