weixuan-giskard/scan-report-temp · Report for JiaqiLee/imdb-finetuned-bert-base-uncased

Hey Team! 🤗✨
We’re excited to share the results of an automated evaluation of your model! 🎉📊

We have identified 12 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the sst2 dataset (subset `default`, split `validation`).

👉Robustness issues (2)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Robustness	major 🔴	—	Fail rate = 0.115	Add typos	92/803 tested samples (11.46%) changed prediction after perturbation

🔍✨Examples

When the feature “text” is perturbed with the “Add typos” transformation, the model changes its prediction in 11.46% of the tested cases (92 of 803). We expected the predictions to be unaffected by this transformation.

	text	Add typos(text)	Original prediction	Prediction after perturbation
12	... the film suffers from a lack of humor ( something needed to balance out the violence ) ...	.... the cfilm sufcers froj a ladk of humor ( domething nweded to balance out the violence ) ...	negative (p = 1.00)	positive (p = 0.99)
13	we root for ( clara and paul ) , even like them , though perhaps it 's an emotion closer to pity .	w root for ( clara and paul ) , even like them , though perhaps it 's an emlotion closer to pity .	positive (p = 0.99)	negative (p = 0.66)
46	a synthesis of cliches and absurdities that seems positively decadent in its cinematic flash and emptiness .	a syhthesis og clichea ajd absurdities thag seems positivey decadet in its cinematic lash ande mptiness .	positive (p = 0.95)	negative (p = 0.99)
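
The fail rate is the share of tested samples whose prediction flips after the perturbation: 92 / 803 ≈ 0.115 here, and 51 / 866 ≈ 0.059 for Punctuation Removal. Below is a minimal Python sketch of that check. It is illustrative only and is not Giskard's implementation. `predict` (text to label) and `perturb` (text to text) are hypothetical callables. We also assume, based on the different sample counts (803 vs. 866), that only texts the transformation actually modifies are counted as tested.

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    perturb: Callable[[str], str],
    texts: Sequence[str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for a text perturbation.

    Assumption: samples the perturbation leaves unchanged are skipped,
    so `tested` may be smaller than len(texts).
    """
    changed = tested = 0
    for text in texts:
        perturbed = perturb(text)
        if perturbed == text:
            continue  # transformation had no effect on this sample
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    return changed, tested, (changed / tested if tested else 0.0)


# Hypothetical toy stand-ins: a keyword "model" and a one-typo perturbation.
def toy_predict(text: str) -> str:
    return "negative" if "bad" in text else "positive"


def toy_perturb(text: str) -> str:
    return text.replace("bad", "bda", 1)  # misspell only the first "bad"


print(fail_rate(toy_predict, toy_perturb, ["a bad film", "a good film", "bad bad"]))
# -> (1, 2, 0.5)
```

In the toy run, "a good film" is skipped because the perturbation leaves it unchanged. "bad bad" is tested, but it keeps its label because one "bad" survives the typo.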

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Robustness	medium 🟡	—	Fail rate = 0.059	Punctuation Removal	51/866 tested samples (5.89%) changed prediction after perturbation

🔍✨Examples

When the feature “text” is perturbed with the “Punctuation Removal” transformation, the model changes its prediction in 5.89% of the tested cases (51 of 866). We expected the predictions to be unaffected by this transformation.

	text	Punctuation Removal(text)	Original prediction	Prediction after perturbation
4	it 's slow -- very , very slow .	it s slow very very slow	positive (p = 0.52)	negative (p = 0.77)
33	if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world .	if the movie succeeds in instilling a wary sense of there but for the grace of god it is far too self conscious to draw you deeply into its world	negative (p = 1.00)	positive (p = 0.99)
66	if you 're hard up for raunchy college humor , this is your ticket right here .	if you re hard up for raunchy college humor this is your ticket right here	positive (p = 0.89)	negative (p = 0.57)
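
The rows above suggest that the transformation deletes every punctuation mark, including the apostrophe in `'s`, the hyphen in `self-conscious` and the backtick quote, and then normalizes spacing. The Python sketch below reproduces all three examples. It is an approximation inferred from those rows, not Giskard's own Punctuation Removal code, and `remove_punctuation` is a hypothetical name.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


print(remove_punctuation("it 's slow -- very , very slow ."))
# -> it s slow very very slow
```

Replacing punctuation with a space, rather than deleting it outright, is what turns `self-conscious` into `self conscious` instead of `selfconscious`.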

👉Performance issues (10)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`text_length(text)` < 89.500 AND `text_length(text)` >= 80.500	Precision = 0.719	—	15.79% below global

🔍✨Examples

For records in the dataset where `text_length(text)` < 89.500 AND `text_length(text)` >= 80.500, the Precision is 15.79% lower than the global Precision.

	text	text_length(text)	label	Predicted `label`
115	sam mendes has become valedictorian at the school for soft landings and easy ways out .	88	negative	positive (p = 0.95)
142	what better message than ` love thyself ' could young women of any size receive ?	82	positive	negative (p = 1.00)
286	at its best , queen is campy fun like the vincent price horror classics of the '60s .	86	positive	negative (p = 1.00)
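
The Deviation figures appear to be relative differences, (slice − global) / global, rather than absolute ones. We have not confirmed this in the scanner's code, but the report's own numbers support it: both Precision slices (0.719 at −15.79% and 0.800 at −6.27%) imply the same global Precision of about 0.854. Here is a small Python sketch of that arithmetic. The function names are hypothetical.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """(slice - global) / global; negative means the slice underperforms."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation to recover the global metric."""
    return slice_metric / (1.0 + deviation)


# Both Precision slices in this report point to the same global Precision.
print(round(implied_global(0.719, -0.1579), 3))  # -> 0.854
print(round(implied_global(0.800, -0.0627), 3))  # -> 0.854
```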

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 500.500 AND `idx` < 548.500	Accuracy = 0.812	—	6.90% below global

🔍✨Examples

For records in the dataset where `idx` >= 500.500 AND `idx` < 548.500, the Accuracy is 6.90% lower than the global Accuracy.

	idx	label	Predicted `label`
501	501	positive	negative (p = 1.00)
509	509	positive	negative (p = 0.99)
513	513	negative	positive (p = 1.00)
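
Each slice is a conjunction of half-open bounds on one feature. A threshold such as 500.500 falls between two integer `idx` values, so `idx` >= 500.500 AND `idx` < 548.500 selects the records with `idx` 501 through 548. The Python sketch below computes a metric on such a slice. `Record` and `slice_accuracy` are hypothetical names, and the toy records are invented for illustration. They are not taken from the scan.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Record:
    idx: int
    label: str
    predicted: str


def slice_accuracy(records: Sequence[Record], lo: float, hi: float) -> float:
    """Accuracy on records with lo <= idx < hi (half-open interval)."""
    in_slice = [r for r in records if lo <= r.idx < hi]
    if not in_slice:
        raise ValueError("empty slice")
    correct = sum(r.label == r.predicted for r in in_slice)
    return correct / len(in_slice)


# Invented toy records; only idx 501, 502 and 548 fall inside [500.5, 548.5).
records = [
    Record(499, "positive", "positive"),
    Record(501, "positive", "negative"),
    Record(502, "negative", "negative"),
    Record(548, "negative", "negative"),
    Record(549, "positive", "negative"),
]
print(round(slice_accuracy(records, 500.5, 548.5), 3))  # -> 0.667
```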

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 444.500 AND `idx` < 500.500	Recall = 0.844	—	6.81% below global

🔍✨Examples

For records in the dataset where `idx` >= 444.500 AND `idx` < 500.500, the Recall is 6.81% lower than the global Recall.

	idx	label	Predicted `label`
445	445	positive	negative (p = 0.67)
446	446	negative	positive (p = 1.00)
447	447	positive	negative (p = 0.91)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` < 0.154	Recall = 0.844	—	6.81% below global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.154, the Recall is 6.81% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
1	unflinchingly bleak and desperate	0.117647	negative	positive (p = 1.00)
68	good old-fashioned slash-and-hack is back !	0.136364	positive	negative (p = 0.60)
112	hilariously inept and ridiculous .	0.142857	positive	negative (p = 1.00)
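
The values in the `text_length(text)` and `avg_whitespace(text)` columns can be reproduced with the simple definitions below. This only holds if each SST-2 sentence keeps a trailing space. Without it, "unflinchingly bleak and desperate" would give 3/33 ≈ 0.0909 instead of the 0.117647 shown above. This is a Python sketch under that assumption, not the scanner's own feature code.

```python
def text_length(text: str) -> int:
    """Number of characters, spaces included."""
    return len(text)


def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (assumes non-empty text)."""
    return sum(ch.isspace() for ch in text) / len(text)


# Assumption: each SST-2 sentence ends with a trailing space.
print(round(avg_whitespace("unflinchingly bleak and desperate "), 6))  # -> 0.117647
sentence = "sam mendes has become valedictorian at the school for soft landings and easy ways out . "
print(text_length(sentence))  # -> 88
```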

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` >= 5.511	Recall = 0.844	—	6.81% below global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 5.511, the Recall is 6.81% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
1	unflinchingly bleak and desperate	7.5	negative	positive (p = 1.00)
68	good old-fashioned slash-and-hack is back !	6.33333	positive	negative (p = 0.60)
112	hilariously inept and ridiculous .	6	positive	negative (p = 1.00)
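
The whitespace and word-length slices keep listing the same rows, and there is a plausible reason for this. Assume the tokens are separated by single spaces and the text ends with one trailing space. If a text has n tokens with S characters in total, it then has length S + n and contains n spaces. That gives avg_whitespace = n / (S + n) = 1 / (1 + avg_word_length). Under this assumption, the thresholds map onto each other almost exactly:

- 1 / (1 + 5.511) ≈ 0.154
- 1 / (1 + 4.354) ≈ 0.187 and 1 / (1 + 4.464) ≈ 0.183
- 1 / (1 + 4.123) ≈ 0.195 and 1 / (1 + 4.209) ≈ 0.192

So each of these pairs of findings likely describes a single slice. The Python sketch below checks the identity on one row. The function names are hypothetical.

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (punctuation tokens count too)."""
    tokens = text.split()
    return sum(len(t) for t in tokens) / len(tokens)


def whitespace_fraction(text: str) -> float:
    """Share of characters that are whitespace."""
    return sum(ch.isspace() for ch in text) / len(text)


sample = "good old-fashioned slash-and-hack is back ! "
print(round(avg_word_length(sample), 5))  # -> 6.33333
# With single spaces and one trailing space, the two features are tied:
print(abs(whitespace_fraction(sample) - 1 / (1 + avg_word_length(sample))) < 1e-12)  # -> True
```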

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`text_length(text)` < 59.500 AND `text_length(text)` >= 50.500	Precision = 0.800	—	6.27% below global

🔍✨Examples

For records in the dataset where `text_length(text)` < 59.500 AND `text_length(text)` >= 50.500, the Precision is 6.27% lower than the global Precision.

	text	text_length(text)	label	Predicted `label`
139	it 's not the ultimate depression-era gangster movie .	55	negative	positive (p = 0.98)
183	the lower your expectations , the more you 'll enjoy it .	58	negative	positive (p = 0.99)
205	falls neatly into the category of good stupid fun .	52	positive	negative (p = 0.92)
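
Take "positive" as the positive class. Precision on a slice then drops with false positives, meaning negative reviews predicted as positive (rows 139 and 183 above). Recall drops with false negatives (row 205). The report does not state which class or averaging scheme the scanner uses, so the Python sketch below is only illustrative. The `precision_recall` helper is hypothetical, and the toy labels are invented.

```python
from typing import Sequence, Tuple


def precision_recall(
    labels: Sequence[str], preds: Sequence[str], positive: str = "positive"
) -> Tuple[float, float]:
    """Binary precision and recall, treating `positive` as the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(y == positive and p == positive for y, p in pairs)
    fp = sum(y != positive and p == positive for y, p in pairs)  # hurts precision
    fn = sum(y == positive and p != positive for y, p in pairs)  # hurts recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy slice: one negative review misread as positive.
labels = ["positive", "positive", "positive", "negative", "negative"]
preds = ["positive", "positive", "positive", "positive", "negative"]
print(precision_recall(labels, preds))  # -> (0.75, 1.0)
```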

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` < 0.187 AND `avg_whitespace(text)` >= 0.183	Precision = 0.800	—	6.27% below global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.187 AND `avg_whitespace(text)` >= 0.183, the Precision is 6.27% lower than the global Precision.

	text	avg_whitespace(text)	label	Predicted `label`
86	the film flat lines when it should peak and is more missed opportunity and trifle than dark , decadent truffle .	0.185841	negative	positive (p = 0.93)
147	the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second .	0.184397	negative	positive (p = 0.97)
448	something akin to a japanese alice through the looking glass , except that it seems to take itself far more seriously .	0.183333	positive	negative (p = 0.84)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` >= 4.354 AND `avg_word_length(text)` < 4.464	Precision = 0.800	—	6.27% below global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.354 AND `avg_word_length(text)` < 4.464, the Precision is 6.27% lower than the global Precision.

	text	avg_word_length(text)	label	Predicted `label`
86	the film flat lines when it should peak and is more missed opportunity and trifle than dark , decadent truffle .	4.38095	negative	positive (p = 0.93)
147	the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second .	4.42308	negative	positive (p = 0.97)
448	something akin to a japanese alice through the looking glass , except that it seems to take itself far more seriously .	4.45455	positive	negative (p = 0.84)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` < 0.195 AND `avg_whitespace(text)` >= 0.192	Recall = 0.850	—	6.12% below global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.195 AND `avg_whitespace(text)` >= 0.192, the Recall is 6.12% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
113	this movie is maddening .	0.192308	negative	positive (p = 1.00)
121	it seems to me the film is about the art of ripping people off without ever letting them consciously know you have done so	0.195122	negative	positive (p = 0.98)
142	what better message than ` love thyself ' could young women of any size receive ?	0.195122	positive	negative (p = 1.00)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` >= 4.123 AND `avg_word_length(text)` < 4.209	Recall = 0.850	—	6.12% below global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.123 AND `avg_word_length(text)` < 4.209, the Recall is 6.12% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
113	this movie is maddening .	4.2	negative	positive (p = 1.00)
121	it seems to me the film is about the art of ripping people off without ever letting them consciously know you have done so	4.125	negative	positive (p = 0.98)
142	what better message than ` love thyself ' could young women of any size receive ?	4.125	positive	negative (p = 1.00)

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

Check out the Giskard Space to improve your model.
The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!