Report for ahmedrachid/FinancialBERT-Sentiment-Analysis
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 3 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset financial_phrasebank (subset sentences_50agree, split train).
👉Robustness issues (3)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.437 | Transform to title case | 437/1000 tested samples (43.7%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 43.7% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
996 | These moderate but significant changes resulted in a significant 24-32 % reduction in the estimated CVD risk . | These Moderate But Significant Changes Resulted In A Significant 24-32 % Reduction In The Estimated Cvd Risk . | positive (p = 1.00) | neutral (p = 1.00) |
4662 | Cash flow after investments amounted to EUR45m , down from EUR46m . | Cash Flow After Investments Amounted To Eur45M , Down From Eur46M . | negative (p = 1.00) | neutral (p = 1.00) |
300 | The stock rose for a second day on Wednesday bringing its two-day rise to GBX12 .0 or 2.0 % . | The Stock Rose For A Second Day On Wednesday Bringing Its Two-Day Rise To Gbx12 .0 Or 2.0 % . | positive (p = 1.00) | neutral (p = 1.00) |
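The fail rate reported above is the number of samples whose predicted label changed after the perturbation, divided by the number of samples tested (437/1000 = 0.437). Below is a minimal sketch of that calculation. It is not Giskard's implementation: `fail_rate` and `toy_predict` are hypothetical names, and `toy_predict` is a deliberately case-sensitive keyword rule standing in for the real model. Python's built-in `str.title` is used as the transformation because it reproduces the perturbed strings shown in the table (for example `EUR45m` becomes `Eur45M`).

```python
from typing import Callable, Sequence


def fail_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of samples whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("at least one sample is required")
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical stand-in classifier: a case-sensitive keyword rule."""
    if "rose" in text:
        return "positive"
    if "down" in text:
        return "negative"
    return "neutral"


samples = [
    "Cash flow after investments amounted to EUR45m , down from EUR46m .",
    "The stock rose for a second day on Wednesday bringing its two-day rise to GBX12 .0 or 2.0 % .",
    "Scanfil issued a profit warning on 10 April 2006 .",
]

# Title-casing hides the lowercase keywords, so two of the three toy predictions flip.
title_case_rate = fail_rate(samples, toy_predict, str.title)
print(f"Fail rate under title case: {title_case_rate:.3f}")
```

To check the real model, `toy_predict` could be swapped for a function that calls the classifier and returns its top label. The same helper works for other transformations such as `str.upper`.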
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.430 | Transform to uppercase | 430/1000 tested samples (43.0%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 43.0% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
996 | These moderate but significant changes resulted in a significant 24-32 % reduction in the estimated CVD risk . | THESE MODERATE BUT SIGNIFICANT CHANGES RESULTED IN A SIGNIFICANT 24-32 % REDUCTION IN THE ESTIMATED CVD RISK . | positive (p = 1.00) | neutral (p = 1.00) |
4662 | Cash flow after investments amounted to EUR45m , down from EUR46m . | CASH FLOW AFTER INVESTMENTS AMOUNTED TO EUR45M , DOWN FROM EUR46M . | negative (p = 1.00) | neutral (p = 1.00) |
300 | The stock rose for a second day on Wednesday bringing its two-day rise to GBX12 .0 or 2.0 % . | THE STOCK ROSE FOR A SECOND DAY ON WEDNESDAY BRINGING ITS TWO-DAY RISE TO GBX12 .0 OR 2.0 % . | positive (p = 1.00) | neutral (p = 1.00) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.111 | Add typos | 111/1000 tested samples (11.1%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 11.1% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4072 | Scanfil issued a profit warning on 10 April 2006 . | Scanfil issued a profit aarning on 10 April 20006 . | negative (p = 1.00) | neutral (p = 1.00) |
1800 | Finnish insurance company Fennia and Kesko Group are ending their loyal customer cooperation . | Finnish insurajce company Fennia and Keso Group are ending thwir loyal customer fooperation . | negative (p = 0.99) | neutral (p = 1.00) |
1685 | UPM-Kymmene has generated four consecutive quarters of positive Free Cash Flow . | UPM-Kymmeme has generated four vonseutive quarters of positjive Free Cash Flow . | positive (p = 1.00) | neutral (p = 1.00) |
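For readers who want to reproduce a similar typo perturbation locally, here is a rough sketch. It is an assumption-laden illustration and not the scanner's actual transformation: `add_typos` and the `NEIGHBOURS` map are hypothetical, the map covers only a few QWERTY keys, and the sketch only substitutes characters. The examples above also contain insertions and deletions (such as `20006` and `Keso`), which it does not model.

```python
import random

# Hypothetical, partial map of QWERTY neighbours (lowercase letters only).
NEIGHBOURS = {
    "a": "qwsz",
    "c": "xdfv",
    "e": "wsdr",
    "i": "ujko",
    "n": "bhjm",
    "o": "iklp",
    "t": "rfgy",
    "w": "qase",
}


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Replace some characters with a neighbouring key, reproducibly."""
    rng = random.Random(seed)  # seeded so the same input gives the same typos
    chars = list(text)
    for i, ch in enumerate(chars):
        # Only characters present in the map can be perturbed.
        if ch in NEIGHBOURS and rng.random() < rate:
            chars[i] = rng.choice(NEIGHBOURS[ch])
    return "".join(chars)


original = "Scanfil issued a profit warning on 10 April 2006 ."
perturbed = add_typos(original, rate=0.2, seed=42)
print(perturbed)
```

Comparing the model's prediction on `original` and `perturbed` over many samples gives a fail rate comparable in spirit to the 11.1% reported above, although the exact figure depends on the typo model and the random seed.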
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!