---
license: mit
datasets:
- agentlans/tatoeba-english-translations
base_model:
- microsoft/mdeberta-v3-base
pipeline_tag: text-classification
tags:
- multilingual
- quality-assessment
---
# DeBERTa V3 Base for Multilingual Quality Assessment

This is a fine-tuned version of the multilingual DeBERTa V3 model ([microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base)) for assessing text quality across languages.

## Model Details

- **Architecture:** mDeBERTa V3 base, fine-tuned as mdeberta-v3-base-quality
- **Task:** Regression (Quality Assessment)
- **Training Data:** [agentlans/tatoeba-english-translations](https://huggingface.co/datasets/agentlans/tatoeba-english-translations/) dataset containing 39 100 English translations
- **Input:** Text in any language supported by mDeBERTa
- **Output:** Estimated quality score for the text (higher values indicate better quality)

## Performance

Root mean squared error (RMSE) on a 20% held-out validation set: 0.5036
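
As a reference for interpreting this figure, RMSE is the square root of the mean squared difference between predicted and reference scores, expressed in the same units as the quality score. The following is a minimal plain-Python sketch; the function name `rmse` and the sample values are illustrative only and are not taken from the actual validation data.

```python
import math
from typing import Sequence


def rmse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Root mean squared error between predicted and reference quality scores."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average them, then take the square root.
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only, not from the held-out validation set.
print(rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.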

## Training Data

The model was trained on [agentlans/tatoeba-english-translations](https://huggingface.co/datasets/agentlans/tatoeba-english-translations).

## Usage
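
The following is a minimal sketch using the Hugging Face `transformers` library, not an official snippet. It assumes the checkpoint loads with `AutoModelForSequenceClassification` and has a single regression output (one logit per text). The Hub ID `agentlans/mdeberta-v3-base-quality` is an assumption inferred from the architecture name above; replace it with this repository's actual ID. The helper names `load_model`, `logits_to_scores`, and `quality_scores` are hypothetical.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: Hub ID inferred from the model name; replace with this repository's actual ID.
MODEL_ID = "agentlans/mdeberta-v3-base-quality"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and fine-tuned model, and switch the model to inference mode."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def logits_to_scores(logits: torch.Tensor) -> list[float]:
    """Convert a (batch, 1) regression output into one float score per text."""
    # Assumption: the head has a single output (num_labels=1).
    return logits.squeeze(-1).tolist()


def quality_scores(texts: list[str], tokenizer, model, batch_size: int = 16) -> list[float]:
    """Score texts in batches; higher values indicate better text."""
    scores: list[float] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        inputs = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():  # inference only, so gradients are not tracked
            logits = model(**inputs).logits
        scores.extend(logits_to_scores(logits))
    return scores
```

To score texts, call `tokenizer, model = load_model()` and then `quality_scores(["Le chat dort sur le canapé."], tokenizer, model)`, which returns one float per input text. Batching with padding keeps memory bounded for long input lists.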

## Limitations

- Performance may vary for texts significantly different from the training data
- Output is based on statistical patterns and may not always align with human judgment
- Quality is assessed purely on textual features, not considering factors like subject familiarity or cultural context

## Ethical Considerations

- Should not be used as the sole determinant of text suitability for specific audiences
- Results may reflect biases present in the training data sources
- Care should be taken when using this model in educational or publishing contexts