Update README.md
README.md
@@ -28,12 +28,14 @@ Benchmark
As the scores range from 0 to 1, an error measure such as MAE or RMSE can be difficult to interpret. Pearson's correlation coefficient was therefore chosen as the metric. It ranges from -1 to 1, where 0 indicates no linear correlation, -1 a perfect negative correlation, and 1 a perfect positive correlation. The goal is to measure quantitatively how well the model's scores correlate with the scores assigned by judges on 750 comments not seen during training.
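
As an illustration of the metric described above, here is a minimal sketch of Pearson's correlation in plain Python. The helper name `pearson` and the sample scores are hypothetical and made up for this example; they are not the benchmark data or the authors' evaluation code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores in [0, 1] for a single category (illustration only).
model_scores = [0.10, 0.80, 0.35, 0.60, 0.05]
judge_scores = [0.00, 0.90, 0.40, 0.50, 0.10]

r = pearson(model_scores, judge_scores)
# Multiply by 100 to match the (x100) scale used in the benchmark table.
print(round(100 * r))
```

In practice a library routine such as `scipy.stats.pearsonr` would normally be used; the hand-written version is only meant to make the definition explicit.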

-| Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) |
-
-| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 62 | 73 | 73 | 68 | 61 |
-| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 63 | 61 | 63 | 67 | 55 |
-| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 72 | 82 | 80 | 78 | 77 |
-| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 76 | 78 | 77 | 75
+| Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean |
+|-------------------------------------------------------------------------------|----------|:-----------------------:|-------------------------------|-------------------------------|----------------------|----------------------|------|
+| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 62 | 73 | 73 | 68 | 61 | 67 |
+| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 63 | 61 | 63 | 67 | 55 | 62 |
+| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 72 | 82 | 80 | 78 | 77 | 78 |
+| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 76 | 78 | 77 | 75 | 79 | 77 |
+
+With a mean correlation of approximately 0.6 (60 on the x100 scale) for the 560m model and approximately 0.8 (80 on the x100 scale) for the 3b model, the models' outputs correlate strongly with the judges' scores.
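
The Mean column appears to be the arithmetic mean of the five category scores in each row, rounded to the nearest integer. A minimal sketch that checks this, with the values copied from the table (the variable names are illustrative):

```python
# Category scores (x100) per row, in table order:
# Obscene, Sexual explicit, Identity attack, Insult, Threat.
rows = {
    ("Bloomz-560m-guardrail", "French"): [62, 73, 73, 68, 61],
    ("Bloomz-560m-guardrail", "English"): [63, 61, 63, 67, 55],
    ("Bloomz-3b-guardrail", "French"): [72, 82, 80, 78, 77],
    ("Bloomz-3b-guardrail", "English"): [76, 78, 77, 75, 79],
}

# Rounded arithmetic mean per (model, language) pair.
means = {key: round(sum(scores) / len(scores)) for key, scores in rows.items()}

for (model, language), mean in means.items():
    print(f"{model} ({language}): {mean}")
```

The computed values are 67, 62, 78 and 77, which match the Mean column.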

Citation
--------