Report for mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis

#37
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 5 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset financial_phrasebank (subset sentences_75agree, split train).

👉Robustness issues (3)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.311 | Transform to uppercase | 311/1000 tested samples (31.1%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 31.1% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 202 | Operating profit rose from EUR 1.94 mn to EUR 2.45 mn . | OPERATING PROFIT ROSE FROM EUR 1.94 MN TO EUR 2.45 MN . | positive (p = 1.00) | neutral (p = 1.00) |
| 3223 | Making matters more difficult , the company said it has been grappling with higher oil and gas prices , which have pushed up the cost of energy , raw materials and transportation . | MAKING MATTERS MORE DIFFICULT , THE COMPANY SAID IT HAS BEEN GRAPPLING WITH HIGHER OIL AND GAS PRICES , WHICH HAVE PUSHED UP THE COST OF ENERGY , RAW MATERIALS AND TRANSPORTATION . | negative (p = 0.53) | positive (p = 0.99) |
| 1414 | Systeemitiimi 's sales and project resources will also be strengthened , director Paul Skogberg said . | SYSTEEMITIIMI 'S SALES AND PROJECT RESOURCES WILL ALSO BE STRENGTHENED , DIRECTOR PAUL SKOGBERG SAID . | positive (p = 1.00) | neutral (p = 1.00) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.065 | Transform to title case | 65/1000 tested samples (6.5%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 6.5% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2985 | It is a disappointment to see the plan folded . | It Is A Disappointment To See The Plan Folded . | negative (p = 0.99) | neutral (p = 1.00) |
| 1413 | Scanfil expects net sales in 2008 to remain at the 2007 level . | Scanfil Expects Net Sales In 2008 To Remain At The 2007 Level . | neutral (p = 1.00) | positive (p = 1.00) |
| 1467 | ADP News - Feb 25 , 2009 - Finnish printed circuit board PCB maker Aspocomp Group Oyj HEL : ACG1V said today it swung to a net profit of EUR 300,000 USD 385,000 for 2008 versus a net loss of EUR 65.3 million | Adp News - Feb 25 , 2009 - Finnish Printed Circuit Board Pcb Maker Aspocomp Group Oyj Hel : Acg1V Said Today It Swung To A Net Profit Of Eur 300,000 Usd 385,000 For 2008 Versus A Net Loss Of Eur 65.3 Million | positive (p = 1.00) | negative (p = 0.93) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.062 | Add typos | 62/1000 tested samples (6.2%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 6.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 3221 | Operating result , excluding one-off items , totaled EUR 9.1 mn compared to EUR 10.6 mn in continuing operations , excluding one-off items in 2004 . | Operating result , excluding one-off items , totaked EUR 9.1 mn cimpared to EUR 10.6 mn in continuing operatjons , excluding one-off items in 2004 . | positive (p = 1.00) | neutral (p = 1.00) |
| 196 | Net interest income was EUR 152.2 mn , up from EUR 101.0 mn in 2008 . | Net interest uincome was EUR 52.2 mn , hp dom EUR 101.0 mn in 208 . | positive (p = 1.00) | neutral (p = 1.00) |
| 356 | To be number one means creating added value for stakeholders in everything we do . | To be number one means cfeating acded vlue for stakeholders in everything we do . | positive (p = 1.00) | neutral (p = 1.00) |
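
For anyone who wants to reproduce the robustness numbers by hand, here is a minimal sketch of how a fail rate like the ones above can be computed: the share of samples whose predicted label changes after a transformation. `toy_predict` is a hypothetical, deliberately case-sensitive stand-in for the real classifier, and Python's `str.upper` and `str.title` are assumed approximations of the scan's uppercase and title-case transformations. This is not Giskard's implementation.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("texts must not be empty")
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive keyword classifier (not the real model)."""
    if "rose" in text or "increased" in text:
        return "positive"
    if "fell" in text or "loss" in text:
        return "negative"
    return "neutral"


# Made-up sample sentences in the style of the dataset.
samples = [
    "Operating profit rose from EUR 1.94 mn to EUR 2.45 mn .",
    "Cargo volumes fell by 9 % .",
    "The company is based in Helsinki .",
    "Sales increased in all the Baltic countries .",
]

uppercase_rate = fail_rate(toy_predict, samples, str.upper)
title_rate = fail_rate(toy_predict, samples, str.title)
print(f"uppercase fail rate: {uppercase_rate:.3f}")
print(f"title case fail rate: {title_rate:.3f}")
```

To check the actual model, `toy_predict` could be swapped for a function that wraps the model's own inference call and returns its top label.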
👉Performance issues (2)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `text` contains "finland" | Balanced Accuracy = 0.847 | | 9.62% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "finland", the Balanced Accuracy is 9.62% lower than the global Balanced Accuracy.

| | text | label | Predicted label |
|---|---|---|---|
| 8 | A purchase agreement for 7,200 tons of gasoline with delivery at the Hamina terminal , Finland , was signed with Neste Oil OYj at the average Platts index for this September plus eight US dollars per month . | positive | neutral (p = 0.77) |
| 44 | Seppala 's revenue increased by 0.2 % to EUR10 .1 m. In Finland , revenue went down by 2.4 % to EUR6 .8 m , while sales abroad rose by 6.2 % to EUR3 .3 m. Sales increased in all the Baltic countries as well as in Russia and Ukraine . | positive | negative (p = 0.91) |
| 56 | On the route between Helsinki in Finland and Tallinn in Estonia , cargo volumes increased by 36 % , while cargo volumes between Finland and Sweden fell by 9 % . | neutral | positive (p = 1.00) |
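
As a rough illustration of the slice comparison above, the sketch below computes balanced accuracy (the mean of per-class recall) globally and on a "contains finland" slice, then reports the deviation. The records are made up for the example. The deviation is computed relative to the global score, which is an assumption: the report does not say whether it is relative or absolute, although both performance findings (0.847 at 9.62% lower, 0.883 at 5.78% lower) back out to a global score of about 0.937 under the relative reading.

```python
from collections import defaultdict
from typing import Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_deviation(slice_score: float, global_score: float) -> float:
    """Deviation of the slice score as a fraction of the global score."""
    return (slice_score - global_score) / global_score


# Made-up (text, label, predicted label) records, for illustration only.
records = [
    ("Revenue in Finland went down .", "negative", "negative"),
    ("Cargo volumes between Finland and Sweden fell .", "negative", "neutral"),
    ("Delivery at the Hamina terminal , Finland .", "neutral", "neutral"),
    ("Net sales rose in Sweden .", "positive", "positive"),
    ("The company is based in Oslo .", "neutral", "neutral"),
    ("Profit fell in Norway .", "negative", "negative"),
]

in_slice = [r for r in records if "finland" in r[0].lower()]

global_score = balanced_accuracy([r[1] for r in records], [r[2] for r in records])
slice_score = balanced_accuracy([r[1] for r in in_slice], [r[2] for r in in_slice])

print(f"global balanced accuracy: {global_score:.3f}")
print(f"slice balanced accuracy:  {slice_score:.3f}")
print(f"relative deviation:       {relative_deviation(slice_score, global_score):+.2%}")
```

scikit-learn's `balanced_accuracy_score` computes the same metric; the pure-Python version is used here only to keep the example dependency-free.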

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `avg_word_length(text)` >= 4.420 AND `avg_word_length(text)` < 4.565 | Balanced Accuracy = 0.883 | | 5.78% lower than global |

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.420 AND `avg_word_length(text)` < 4.565, the Balanced Accuracy is 5.78% lower than the global Balanced Accuracy.

| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 8 | A purchase agreement for 7,200 tons of gasoline with delivery at the Hamina terminal , Finland , was signed with Neste Oil OYj at the average Platts index for this September plus eight US dollars per month . | 4.47368 | positive | neutral (p = 0.77) |
| 328 | Xerox and Stora Enso have teamed up to tailor the iGen3 to the short-run , on-demand packaging market . | 4.47368 | positive | neutral (p = 1.00) |
| 331 | At the same time , the market for automated liquid handling devices is already larger than that for pipettes , according to Biohit . | 4.54167 | neutral | positive (p = 0.96) |
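
The `avg_word_length(text)` values in the table can be reproduced by splitting each sentence on whitespace (so standalone punctuation counts as a token) and averaging the token lengths. The sketch below assumes that definition; it matches the reported values for the examples shown, but the scan's actual implementation may differ on other inputs. The `in_slice` helper and its default bounds are illustrative names taken from the slice condition above.

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (punctuation included)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(len(token) for token in tokens) / len(tokens)


def in_slice(text: str, low: float = 4.420, high: float = 4.565) -> bool:
    """True if the text falls in the flagged avg_word_length range."""
    return low <= avg_word_length(text) < high


xerox = (
    "Xerox and Stora Enso have teamed up to tailor the iGen3 "
    "to the short-run , on-demand packaging market ."
)
biohit = (
    "At the same time , the market for automated liquid handling devices "
    "is already larger than that for pipettes , according to Biohit ."
)

print(round(avg_word_length(xerox), 5))   # 4.47368 (85 characters / 19 tokens)
print(round(avg_word_length(biohit), 5))  # 4.54167 (109 characters / 24 tokens)
print(in_slice(xerox), in_slice(biohit))  # True True
```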

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
