update bias section
README.md CHANGED
@@ -620,26 +620,21 @@ This instruction-tuned variant has been trained with a mixture of 276k English,

We examine the presence of undesired societal and cognitive biases in this model using different benchmarks. For societal biases,
we test performance using the BBQ dataset (Parrish et al., 2022) in the original English and the Regard dataset (Sheng et al., 2019).
-We report that while performance is high (accuracies
-the model performs very poorly in ambiguous settings, which
-implying that outputs can be influenced by the prompts.
-
-We highlight that these results can be expected from a pretrained model that has not yet been instruction-tuned or aligned.
-These tests are performed in order to show the biases the model may contain.
-We urge developers to take them into account and perform safety testing and tuning tailored to their specific applications of the model.
+We report that while performance is high (accuracies around 0.8 depending on the social category) in disambiguated settings,
+the model performs very poorly in ambiguous settings, which indicates the presence of societal biases that need to be further addressed in post-training phases.
+
+Our cognitive bias analysis focuses on positional effects in zero-shot settings and majority class bias in few-shot settings.
+For positional effects, we leverage the ARC Multiple Choice Question dataset (Clark et al., 2018). We observe significant
+but relatively weak primacy effects, whereby the model shows a preference for answers towards the beginning of the list of provided answers.
+We measure majority class effects in few-shot settings using SST-2 (Socher et al., 2013). We again detect significant effects,
+with a small effect size. This suggests that the model is relatively robust against the examined cognitive biases.
+
+We highlight that our analyses of these biases are by no means exhaustive and are limited by the relative scarcity of adequate resources
+in all languages present in the training data. We aim to gradually extend and expand our analyses in future work.
+
+These results can be expected from a model that has undergone only preliminary instruction tuning.
+These tests are performed to show the biases the model may contain. We urge developers to take
+them into account and perform safety testing and tuning tailored to their specific applications of the model.

---
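The societal-bias claim in the new text rests on splitting BBQ accuracy by context condition (ambiguous vs. disambiguated). The sketch below shows that split in isolation; it is not the evaluation harness behind the reported numbers, and the `context_condition`, `label`, and `prediction` field names are illustrative assumptions rather than a documented schema.

```python
from collections import defaultdict

def accuracy_by_condition(examples):
    """Compute accuracy separately for ambiguous and disambiguated BBQ items.

    `examples` is assumed to be an iterable of already-scored dicts with
    illustrative keys: 'context_condition' ('ambig' or 'disambig'),
    'label' (gold answer index) and 'prediction' (model's chosen index).
    """
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        cond = ex["context_condition"]
        total[cond] += 1
        correct[cond] += int(ex["prediction"] == ex["label"])
    return {cond: correct[cond] / total[cond] for cond in total}

# Toy usage with made-up items: a large gap between the two numbers is the
# pattern described above (strong in disambiguated, weak in ambiguous contexts).
scored = [
    {"context_condition": "disambig", "label": 0, "prediction": 0},
    {"context_condition": "disambig", "label": 2, "prediction": 2},
    {"context_condition": "ambig", "label": 1, "prediction": 0},
    {"context_condition": "ambig", "label": 1, "prediction": 1},
]
print(accuracy_by_condition(scored))  # {'disambig': 1.0, 'ambig': 0.5}
```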
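The positional-effect analysis on ARC comes down to checking whether the model prefers earlier answer positions once option content is decoupled from position. A minimal sketch, assuming a hypothetical `score_option(question, options, i)` scorer (e.g. a log-likelihood wrapper around the model) rather than the project's actual evaluation code: shuffle the options several times per question and tally which position wins.

```python
import random
from collections import Counter

def chosen_position_distribution(dataset, score_option, n_orders=4, seed=0):
    """Estimate how often each answer *position* is selected across random
    reorderings of the options. With position decoupled from content, a model
    free of positional bias should pick every position at roughly equal rates.

    `dataset` is assumed to yield (question, options) pairs and
    `score_option(question, options, i)` to return the model's preference
    score for option i -- both are stand-ins, not a real API.
    """
    rng = random.Random(seed)
    counts = Counter()
    for question, options in dataset:
        for _ in range(n_orders):
            order = list(options)
            rng.shuffle(order)
            scores = [score_option(question, order, i) for i in range(len(order))]
            counts[scores.index(max(scores))] += 1
    total = sum(counts.values())
    return {pos: counts[pos] / total for pos in sorted(counts)}
```

A primacy effect then shows up as position 0 taking more than its 1/k share of picks; whether the deviation is significant can be checked with a chi-squared test against the uniform distribution.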
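Majority class bias in few-shot prompting is the tendency to drift towards whichever label dominates the in-context demonstrations. The sketch below builds SST-2-style prompts with a controllable positive/negative mix so predicted-label rates can be compared across mixes; the prompt template and the `classify(prompt)` wrapper are assumptions for illustration, not the setup used for the reported experiments.

```python
def build_prompt(demos, test_sentence):
    """Format (sentence, label) demonstrations plus one test sentence into a
    simple few-shot sentiment prompt (illustrative template)."""
    blocks = [f"Review: {s}\nSentiment: {label}" for s, label in demos]
    blocks.append(f"Review: {test_sentence}\nSentiment:")
    return "\n\n".join(blocks)

def positive_rate(classify, demos, test_sentences):
    """Fraction of test sentences labelled 'positive' under a fixed set of
    demonstrations. `classify(prompt)` is a hypothetical model wrapper that
    returns the string 'positive' or 'negative'."""
    preds = [classify(build_prompt(demos, s)) for s in test_sentences]
    return sum(p == "positive" for p in preds) / len(preds)

# Comparing positive_rate() for, say, 4:0, 2:2 and 0:4 positive:negative demo
# mixes on the same test sentences shows how strongly the majority label in
# the context pulls predictions; a small gap means a weak majority class bias.
```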