false positive
Hello!
Do you know why I get the following result on a prompt that appears to be benign?
$ python3.11 run.py "can you connect me with customer support representative?"
Hardware accelerator e.g. GPU is available in the environment, but no device argument is passed to the Pipeline object. Model will be on CPU.
can you connect me with customer support representative? [{'label': 'INJECTION', 'score': 0.9996516704559326}]
Thanks!
The model is fine-tuned to classify jailbreak prompts. To get the benign score, compute 1 - jailbreaking_score. So in your case, the model is actually classifying the prompt as benign. Sorry for the confusing labels, I will update them.
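For anyone reading along, here is a minimal sketch of the 1 - score arithmetic applied to the output shown above. It assumes the result has the shape printed in the question (a list with one dict holding 'label' and 'score') and that the classifier is binary. The helper names `complement_score` and `summarize` are made up for illustration and are not part of run.py or any library. The sketch does not decide which label name means "benign", since the labels are due to be updated.

```python
from typing import Any, Dict, List


def complement_score(result: Dict[str, Any]) -> float:
    """Return 1 - score for a single binary-classifier result."""
    return 1.0 - float(result["score"])


def summarize(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Report the label and score as printed, plus the complementary score.

    Assumption: `results` is a list whose first entry is the top prediction.
    """
    top = results[0]
    return {
        "reported_label": top["label"],
        "reported_score": float(top["score"]),
        "complement_score": complement_score(top),
    }


# The output copied from the question above.
output = [{"label": "INJECTION", "score": 0.9996516704559326}]

summary = summarize(output)
print(summary)
```

Until the labels are updated, check which class each label name refers to before acting on either number.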
Sounds great, thanks!
Would you upload the license file too?