## Model Overview

This model is a fine-tuned version of [agentlans/deberta-v3-xsmall-zyda-2](https://huggingface.co/agentlans/deberta-v3-xsmall-zyda-2) designed for text quality assessment. It achieves the following results on the evaluation set:

- Loss: 0.3165
- MSE: 0.3165

The model was trained on the Text Quality Meta-Analysis Dataset.

In this context, "quality" refers to legible English sentences that are not spam and contain useful information. It does not necessarily indicate grammatical or factual correctness.
### Quality Score Derivation

The composite quality score was derived through the following steps (a rough code sketch follows the list):
1. Principal Component Analysis (PCA) was performed on the normalized "fineweb" and "nvidia" scores.
2. The first principal component was extracted as an initial measure of quality.
3. This quality measure was then adjusted for sentence length using robust linear regression (the `rlm` function from the MASS package).
4. The adjusted quality scores were scaled to z-scores to produce the final quality metric.
5. The scores were then quantile normalized to a normal distribution.
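
The derivation itself was done in R (step 3 uses the `rlm` function from the MASS package); as a rough illustration only, the sketch below approximates the same pipeline in Python with scikit-learn, statsmodels, and SciPy. The column names (`fineweb`, `nvidia`, `length`) and the helper name `derive_quality` are assumptions made for this example, not part of the released code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm, rankdata
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def derive_quality(df: pd.DataFrame) -> np.ndarray:
    """Approximate the composite quality-score derivation described above."""
    # Steps 1-2: PCA on the normalized "fineweb" and "nvidia" scores;
    # keep the first principal component as the initial quality measure.
    normalized = StandardScaler().fit_transform(df[["fineweb", "nvidia"]])
    pc1 = PCA(n_components=1).fit_transform(normalized).ravel()
    # The sign of a principal component is arbitrary; flip pc1 if higher
    # values should correspond to higher quality.

    # Step 3: adjust for sentence length with robust linear regression
    # (statsmodels' RLM stands in for MASS::rlm); keep the residuals.
    X = sm.add_constant(df["length"].to_numpy(dtype=float))
    adjusted = pc1 - sm.RLM(pc1, X).fit().predict(X)

    # Step 4: rescale the length-adjusted scores to z-scores.
    z = (adjusted - adjusted.mean()) / adjusted.std()

    # Step 5: quantile-normalize to a standard normal distribution.
    ranks = rankdata(z) / (len(z) + 1)
    return norm.ppf(ranks)
```

Because the final scores are quantile-normalized to a normal distribution, a score near 0 corresponds to roughly median quality, and positive scores indicate above-median quality.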
## Model Description
The model is based on the DeBERTa-v3-xsmall architecture and has been fine-tuned for sequence classification tasks, specifically for assessing the quality of text inputs.
### Usage Example
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer