update bias section
README.md CHANGED
@@ -620,26 +620,21 @@ This instruction-tuned variant has been trained with a mixture of 276k English,

We examine the presence of undesired societal and cognitive biases in this model using different benchmarks. For societal biases,
we test performance using the BBQ dataset (Parrish et al., 2022) in the original English and the Regard dataset (Sheng et al., 2019).
-We report that while performance is high (accuracies
-the model performs very poorly in ambiguous settings, which
-implying that outputs can be influenced by the prompts.
-
-We highlight that these results can be expected from a pretrained model that has not yet been instruction-tuned or aligned.
-These tests are performed in order to show the biases the model may contain.
-We urge developers to take them into account and perform safety testing and tuning tailored to their specific applications of the model.
+We report that while performance is high (accuracies around 0.8 depending on the social category) in disambiguated settings,
+the model performs very poorly in ambiguous settings, which indicates the presence of societal biases that need to be further addressed in post-training phases.
+
+Our cognitive bias analysis focuses on positional effects in zero-shot settings and majority class bias in few-shot settings.
+For positional effects, we leverage the ARC Multiple Choice Question dataset (Clark et al., 2018). We observe significant
+but relatively weak primacy effects, whereby the model shows a preference for answers towards the beginning of the list of provided answers.
+We measure majority class effects in few-shot settings using SST-2 (Socher et al., 2013). We again detect significant effects,
+with a small effect size. This suggests that the model is relatively robust against the examined cognitive biases.
+
+We highlight that our analyses of these biases are by no means exhaustive and are limited by the relative scarcity of adequate resources
+in all languages present in the training data. We aim to gradually extend and expand our analyses in future work.
+
+These results can be expected from a model that has undergone only preliminary instruction tuning.
+These tests are performed to show the biases the model may contain. We urge developers to take
+them into account and perform safety testing and tuning tailored to their specific applications of the model.

---
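The societal-bias claim in the new text rests on splitting BBQ accuracy by context condition (ambiguous vs. disambiguated). The sketch below shows that split in isolation; it is not the evaluation harness behind the reported numbers, and the `context_condition`, `label`, and `prediction` field names are illustrative assumptions rather than a documented schema.

```python
from collections import defaultdict

def accuracy_by_condition(examples):
    """Compute accuracy separately for ambiguous and disambiguated BBQ items.

    `examples` is assumed to be an iterable of already-scored dicts with
    illustrative keys: 'context_condition' ('ambig' or 'disambig'),
    'label' (gold answer index) and 'prediction' (model's chosen index).
    """
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        cond = ex["context_condition"]
        total[cond] += 1
        correct[cond] += int(ex["prediction"] == ex["label"])
    return {cond: correct[cond] / total[cond] for cond in total}

# Toy usage with made-up items: a large gap between the two numbers is the
# pattern described above (strong in disambiguated, weak in ambiguous contexts).
scored = [
    {"context_condition": "disambig", "label": 0, "prediction": 0},
    {"context_condition": "disambig", "label": 2, "prediction": 2},
    {"context_condition": "ambig", "label": 1, "prediction": 0},
    {"context_condition": "ambig", "label": 1, "prediction": 1},
]
print(accuracy_by_condition(scored))  # {'disambig': 1.0, 'ambig': 0.5}
```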
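The positional-effect analysis on ARC comes down to checking whether the model prefers earlier answer positions once option content is decoupled from position. A minimal sketch, assuming a hypothetical `score_option(question, options, i)` scorer (e.g. a log-likelihood wrapper around the model) rather than the project's actual evaluation code: shuffle the options several times per question and tally which position wins.

```python
import random
from collections import Counter

def chosen_position_distribution(dataset, score_option, n_orders=4, seed=0):
    """Estimate how often each answer *position* is selected across random
    reorderings of the options. With position decoupled from content, a model
    free of positional bias should pick every position at roughly equal rates.

    `dataset` is assumed to yield (question, options) pairs and
    `score_option(question, options, i)` to return the model's preference
    score for option i -- both are stand-ins, not a real API.
    """
    rng = random.Random(seed)
    counts = Counter()
    for question, options in dataset:
        for _ in range(n_orders):
            order = list(options)
            rng.shuffle(order)
            scores = [score_option(question, order, i) for i in range(len(order))]
            counts[scores.index(max(scores))] += 1
    total = sum(counts.values())
    return {pos: counts[pos] / total for pos in sorted(counts)}
```

A primacy effect then shows up as position 0 taking more than its 1/k share of picks; whether the deviation is significant can be checked with a chi-squared test against the uniform distribution.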
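Majority class bias in few-shot prompting is the tendency to drift towards whichever label dominates the in-context demonstrations. The sketch below builds SST-2-style prompts with a controllable positive/negative mix so predicted-label rates can be compared across mixes; the prompt template and the `classify(prompt)` wrapper are assumptions for illustration, not the setup used for the reported experiments.

```python
def build_prompt(demos, test_sentence):
    """Format (sentence, label) demonstrations plus one test sentence into a
    simple few-shot sentiment prompt (illustrative template)."""
    blocks = [f"Review: {s}\nSentiment: {label}" for s, label in demos]
    blocks.append(f"Review: {test_sentence}\nSentiment:")
    return "\n\n".join(blocks)

def positive_rate(classify, demos, test_sentences):
    """Fraction of test sentences labelled 'positive' under a fixed set of
    demonstrations. `classify(prompt)` is a hypothetical model wrapper that
    returns the string 'positive' or 'negative'."""
    preds = [classify(build_prompt(demos, s)) for s in test_sentences]
    return sum(p == "positive" for p in preds) / len(preds)

# Comparing positive_rate() for, say, 4:0, 2:2 and 0:4 positive:negative demo
# mixes on the same test sentences shows how strongly the majority label in
# the context pulls predictions; a small gap means a weak majority class bias.
```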