---
license: apache-2.0
language:
- it
metrics:
- accuracy
base_model:
- DeepMount00/ModernBERT-base-ita
pipeline_tag: text-classification
---

# Text Quality Classifier (Binary)

This model classifies the general quality and educational value of a given text. It outputs one of two labels: `LABEL_0`, meaning **bad quality**, and `LABEL_1`, meaning **good quality**.

It can be used to efficiently filter large amounts of raw text by quality, which is useful when building Italian pretraining datasets.

The model tends to classify Wikipedia-like texts as "good quality": educational, well-structured, and clearly explained writing.
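
By default, the `text-classification` pipeline returns only the top label and its score for each input, e.g. `{'label': 'LABEL_1', 'score': 0.93}`. With two labels, the probability of **good quality** can be recovered whichever label comes out on top. The helper below is a minimal sketch of that conversion; the function name `good_quality_probability` is hypothetical, and it assumes the score is a softmax probability over the two labels (the pipeline's default for a model with more than one label), so that P(`LABEL_0`) + P(`LABEL_1`) = 1.

```python
# Label names as documented on this model card.
GOOD_LABEL = "LABEL_1"  # good quality
BAD_LABEL = "LABEL_0"   # bad quality


def good_quality_probability(prediction: dict) -> float:
    """Return P(good quality) from one top-1 pipeline prediction.

    Hypothetical helper: assumes the score is a two-label softmax probability.
    """
    label, score = prediction["label"], float(prediction["score"])
    if label == GOOD_LABEL:
        return score
    if label == BAD_LABEL:
        # The other label holds the remaining probability mass.
        return 1.0 - score
    raise ValueError(f"unexpected label: {label!r}")


# Hand-written prediction for illustration, not real model output.
print(good_quality_probability({"label": "LABEL_0", "score": 0.75}))  # -> 0.25
```

Mapping both labels onto a single P(good quality) makes it possible to apply one threshold when filtering, instead of reasoning about two labels separately.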

## How to get access

This is a private model. To request access, email us at <a href="mailto:[email protected]">[email protected]</a> and explain how you plan to use it.

## Eval

During evaluation, the model achieved the following metrics:

* **Eval Loss:** 0.3422
* **Accuracy:** 0.8607
* **F1-Score:** 0.8597
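
The card does not state which evaluation split was used or how the F1-Score was averaged. As a reference point only, the sketch below computes accuracy and F1 from gold and predicted labels in plain Python. It assumes macro-averaged F1 (the unweighted mean of the per-label F1 for `LABEL_0` and `LABEL_1`), which may differ from the setting behind the numbers above; the function names are illustrative.

```python
LABELS = ("LABEL_0", "LABEL_1")


def _check(gold: list[str], pred: list[str]) -> None:
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    _check(gold, pred)
    scores = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # F1 = 2TP / (2TP + FP + FN), taken as 0 when the label never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["LABEL_1", "LABEL_1", "LABEL_0", "LABEL_0"]
pred = ["LABEL_1", "LABEL_0", "LABEL_0", "LABEL_0"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

If scikit-learn is available, `sklearn.metrics.f1_score(gold, pred, average="macro")` computes the same macro-averaged value.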
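
## How to use

```python
from transformers import pipeline

# The model is private: authenticate first (e.g. `huggingface-cli login`)
# with an account that has been granted access.
MODEL = "ReDiX/text-quality-classifier-ita"
pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

# "This is an example text in Italian for classification."
example_text = "Questo è un testo di esempio in italiano per la classificazione."
# Returns a list with one dict per input, e.g. [{'label': 'LABEL_1', 'score': ...}]
result = pipe(example_text)

print(f"TEXT: '{example_text}'")
print(f"RESULT: {result}")
```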
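
To filter a large corpus, texts can be scored in batches and kept only when their probability of good quality clears a threshold. The sketch below accepts any callable that behaves like the pipeline (a list of strings in, one `{'label', 'score'}` dict per string out), so it can be tried without downloading the model. The function name `filter_good_quality`, the 0.5 default threshold, and the batch size of 32 are illustrative assumptions, not values recommended by the model authors.

```python
from typing import Callable, Iterable, Iterator

Classifier = Callable[[list[str]], list[dict]]


def _keep_good(batch: list[str], classify: Classifier, threshold: float) -> Iterator[str]:
    preds = classify(batch)
    if len(preds) != len(batch):
        raise ValueError("classifier must return one prediction per text")
    for text, pred in zip(batch, preds):
        # Two-label softmax (assumed): P(LABEL_1) = score if LABEL_1 is on top, else 1 - score.
        p_good = pred["score"] if pred["label"] == "LABEL_1" else 1.0 - pred["score"]
        if p_good >= threshold:
            yield text


def filter_good_quality(
    texts: Iterable[str],
    classify: Classifier,
    threshold: float = 0.5,
    batch_size: int = 32,
) -> Iterator[str]:
    """Lazily yield the texts whose estimated P(good quality) >= threshold."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _keep_good(batch, classify, threshold)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield from _keep_good(batch, classify, threshold)
```

With the real model, `classify` could be the pipeline itself, for example `filter_good_quality(corpus, lambda batch: pipe(batch, truncation=True))`. Truncation keeps long documents within the model's maximum input length, but it means only the beginning of each text is judged, a trade-off worth checking on your own data. Because the function is a generator, the corpus can be streamed without loading it all into memory.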