license: mit
datasets:
  - agentlans/tatoeba-english-translations
base_model:
  - microsoft/mdeberta-v3-base
pipeline_tag: text-classification
tags:
  - multilingual
  - quality-assessment

DeBERTa V3 Base for Multilingual Quality Assessment

This model is a fine-tuned version of microsoft/mdeberta-v3-base, the multilingual DeBERTa V3 model, for assessing text quality across languages.

Model Details

  • Architecture: mdeberta-v3-base-quality
  • Task: Regression (Quality Assessment)
  • Training Data: agentlans/tatoeba-english-translations dataset containing 39 100 English translations
  • Input: Text in any language supported by mDeBERTa
  • Output: Estimated quality score for the input text
    • Higher values indicate higher quality

Performance

Root mean squared error (RMSE) on the 20% held-out validation set: 0.5036
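
For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between predicted and reference scores, so it is expressed in the same units as the quality score. The following is a minimal sketch of that calculation, not the evaluation script used for this model; the function name `rmse` and the sample values are made up for illustration.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only; these are not taken from the model's validation set.
predicted = [0.8, -0.2, 1.5, 0.1]
reference = [1.0, 0.0, 1.0, 0.0]
print(f"RMSE: {rmse(predicted, reference):.4f}")
```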

Training Data

The model was trained on agentlans/tatoeba-english-translations.

Usage
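
A minimal sketch of how the model might be scored with the Hugging Face `transformers` library, assuming `transformers` and `torch` are installed. The repository ID `agentlans/mdeberta-v3-base-quality` is an assumption inferred from the uploader and the model name in this card, and the helper names (`load_quality_model`, `quality_scores`, `rank_by_quality`, `main`) are hypothetical. The sketch also assumes the model exposes a single regression output per text.

```python
# Assumed repository ID (uploader + model name from this card);
# replace it with the actual ID if it differs.
MODEL_NAME = "agentlans/mdeberta-v3-base-quality"


def load_quality_model(model_name=MODEL_NAME):
    """Download (or load from cache) the tokenizer and the regression model."""
    # Imported lazily so the helpers below can be imported without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def quality_scores(texts, tokenizer, model):
    """Return one estimated quality score per input text."""
    import torch

    inputs = tokenizer(list(texts), return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # assumed shape: (batch_size, 1)
    return logits.squeeze(-1).tolist()


def rank_by_quality(texts, scores):
    """Pair texts with scores and sort from highest to lowest score."""
    return sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)


def main():
    tokenizer, model = load_quality_model()
    examples = ["The weather is nice today.", "wether nice today is the"]
    scores = quality_scores(examples, tokenizer, model)
    for text, score in rank_by_quality(examples, scores):
        print(f"{score:.3f}\t{text}")
```

Calling `main()` downloads the model on first use and prints the example texts ordered from highest to lowest estimated quality. Inputs longer than the tokenizer's maximum length are truncated in this sketch.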

Limitations

  • Performance may vary for texts significantly different from the training data
  • Output is based on statistical patterns and may not always align with human judgment
  • Quality is assessed purely on textual features and does not account for factors such as subject familiarity or cultural context

Ethical Considerations

  • Should not be used as the sole determinant of text suitability for specific audiences
  • Results may reflect biases present in the training data sources
  • Care should be taken when using this model in educational or publishing contexts