Update README.md
README.md
CHANGED
@@ -32,7 +32,7 @@ Where sigma is the sigmoid function and O represents the set of learning observa
|
|
32 |
Benchmark
|
33 |
---------
|
34 |
|
35 |
-
As the scores range from 0 to 1, a performance measure such as
|
36 |
|
37 |
| Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean |
|
38 |
|-------------------------------------------------------------------------------|----------|:-----------------------:|-------------------------------|-------------------------------|----------------------|----------------------|------|
|
@@ -43,6 +43,15 @@ As the scores range from 0 to 1, a performance measure such as MAE or RMSE may b
|
|
43 |
|
44 |
With a correlation of approximately 65 (0.65 on the -1 to 1 scale, since the table reports values x100) for the 560m model and approximately 80 (0.80) for the 3b model, the output is strongly correlated with the judges' scores.
|
45 |
|
46 |
How to Use Bloomz-560m-guardrail
|
47 |
--------------------------------
|
48 |
|
|
|
32 |
Benchmark
|
33 |
---------
|
34 |
|
35 |
+
Because the scores range from 0 to 1, a performance measure such as RMSE may be hard to interpret on its own. Pearson's correlation coefficient was therefore chosen as the evaluation measure. It ranges from -1 to 1, where 0 indicates no linear correlation, -1 a perfect negative correlation, and 1 a perfect positive correlation. The goal is to quantify the correlation between the model's scores and the scores assigned by judges for 730 comments not seen during training.
|
36 |
|
37 |
| Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean |
|
38 |
|-------------------------------------------------------------------------------|----------|:-----------------------:|-------------------------------|-------------------------------|----------------------|----------------------|------|
|
|
|
43 |
|
44 |
With a correlation of approximately 65 (0.65 on the -1 to 1 scale, since the table reports values x100) for the 560m model and approximately 80 (0.80) for the 3b model, the output is strongly correlated with the judges' scores.
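As an illustration of the measure used in this benchmark, the following is a minimal sketch of how a Pearson correlation between model scores and judge scores could be computed for a single category. The function name `pearson` and the sample scores are hypothetical and are not taken from the benchmark data or from the authors' evaluation code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for one category (illustrative values only).
model_scores = [0.10, 0.80, 0.35, 0.05, 0.60]
judge_scores = [0.20, 0.90, 0.30, 0.00, 0.70]

r = pearson(model_scores, judge_scores)
# The benchmark table reports correlations multiplied by 100.
print(f"Pearson r = {r:.2f} (x100: {100 * r:.0f})")
```

A value close to 1 means the model ranks and scales comments much like the judges do, even if the absolute scores differ.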
|
45 |
|
46 |
+
We now turn to the Mean Absolute Error (MAE), which measures the average absolute gap between the model's scores and the judges' scores.
|
47 |
+
|
48 |
+
| Model | Language | Obscene | Sexual explicit | Identity attack | Insult | Threat | Mean |
|
49 |
+
|-------------------------------------------------------------------------------|----------|:----------------:|-----------------------|----------------------|--------------|------------|------|
|
50 |
+
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 0.06 | 0.03 | 0.03 | 0.13 | 0.04 | 0.06 |
|
51 |
+
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 0.06 | 0.03 | 0.03 | 0.14 | 0.04 | 0.06 |
|
52 |
+
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 0.05 | 0.02 | 0.02 | 0.11 | 0.03 | 0.05 |
|
53 |
+
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 0.05 | 0.03 | 0.02 | 0.12 | 0.03 | 0.05 |
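For readers who want to reproduce this kind of figure on their own data, here is a minimal sketch of a Mean Absolute Error computation for one category. The function name `mean_absolute_error` and the sample scores are hypothetical. They are not the benchmark's 730 comments, and this is not the authors' evaluation code.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical scores in [0, 1] for one category (illustrative values only).
model_scores = [0.10, 0.80, 0.35, 0.05, 0.60]
judge_scores = [0.20, 0.90, 0.30, 0.00, 0.70]

mae = mean_absolute_error(model_scores, judge_scores)
print(f"MAE = {mae:.2f}")
```

Because the scores lie between 0 and 1, the MAE reads directly as the average distance between a model score and the corresponding judge score on that scale.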
|
54 |
+
|
55 |
How to Use Bloomz-560m-guardrail
|
56 |
--------------------------------
|
57 |
|