---
license: mit
datasets:
- agentlans/tatoeba-english-translations
base_model:
- microsoft/mdeberta-v3-base
pipeline_tag: text-classification
tags:
- multilingual
- quality-assessment
---
# DeBERTa V3 Base for Multilingual Quality Assessment

This is a fine-tuned version of the multilingual DeBERTa model (mdeberta) for assessing text quality across languages.
## Model Details
- Architecture: mdeberta-v3-base-quality
- Task: Regression (Quality Assessment)
- Training Data: agentlans/tatoeba-english-translations dataset containing 39,100 English translations
- Input: Text in any language supported by mDeBERTa
- Output: Estimated quality score for the text (higher values indicate better quality)
## Performance
Root mean squared error (RMSE) on a 20% held-out validation set: 0.5036
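
For reference, this figure is the standard root mean squared error between the model's predicted scores and the reference quality scores on the held-out split. A minimal sketch of the computation (the arrays below are illustrative placeholders, not the actual validation data):

```python
import numpy as np

# Illustrative placeholders: model predictions and reference quality scores
# for held-out validation examples (not the actual validation data).
predicted = np.array([0.8, -0.2, 1.5, 0.3])
reference = np.array([1.0, 0.1, 1.2, 0.0])

# RMSE = square root of the mean squared difference
rmse = np.sqrt(np.mean((predicted - reference) ** 2))
print(f"RMSE: {rmse:.4f}")
```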
## Training Data
The model was trained on the agentlans/tatoeba-english-translations dataset.
## Usage
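
A minimal usage sketch with the Hugging Face `transformers` library is shown below. The repository id and the `quality_score` helper are illustrative assumptions; substitute this model's actual repository id when loading.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed repository id for illustration; replace with this model's actual id.
model_name = "agentlans/mdeberta-v3-base-quality"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def quality_score(text: str) -> float:
    """Return the estimated quality score for a text (higher = better)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # single regression output, shape (1, 1)
    return logits.squeeze().item()

print(quality_score("The quick brown fox jumps over the lazy dog."))
```

Because the model uses a single-output regression head, the raw logit is returned directly as the quality score.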
## Limitations
- Performance may vary for texts significantly different from the training data
- Output is based on statistical patterns and may not always align with human judgment
- Quality is assessed purely from textual features, without considering factors such as subject familiarity or cultural context
## Ethical Considerations
- Should not be used as the sole determinant of text suitability for specific audiences
- Results may reflect biases present in the training data sources
- Care should be taken when using this model in educational or publishing contexts