RoBERTa for Single Language Classification
Training
RoBERTa was fine-tuned on small subsets of the Open Subtitles, OSCAR, and Tatoeba datasets (~9k samples per language).
data source | language |
---|---|
open_subtitles | ka, he, en, de |
oscar | be, kk, az, hy |
tatoeba | ru, uk |
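The card does not describe how the training and validation portions were separated. As a minimal sketch, assuming the per-language samples were pooled and split at random (the slightly uneven support counts in the validation table are consistent with this, but the card does not confirm it), the split could look like the following. The function name `train_val_split`, the 10% validation fraction, and the toy data are illustrative assumptions, not the author's code.

```python
import random


def train_val_split(samples, val_fraction=0.1, seed=0):
    """Shuffle (text, language) pairs and split them into train and validation lists.

    Hypothetical helper: the original sampling procedure is not documented.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy pool standing in for the real corpora: 100 fake sentences per language.
languages = ["az", "be", "de"]
pool = [(f"sentence {i} in {lang}", lang) for lang in languages for i in range(100)]

train, val = train_val_split(pool)
print(len(train), len(val))
```

With a random split like this, each language lands in the validation set roughly, not exactly, in proportion to its share of the pool.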
Validation
The metrics below were obtained by validating on a separate, held-out part of the dataset (~1k samples per language).
index | class | f1-score | precision | recall | support |
---|---|---|---|---|---|
0 | az | 0.998 | 0.997 | 1.0 | 997 |
1 | be | 0.996 | 0.998 | 0.994 | 1004 |
2 | de | 0.976 | 0.966 | 0.987 | 979 |
3 | en | 0.976 | 0.986 | 0.967 | 1020 |
4 | he | 1.0 | 1.0 | 0.999 | 1001 |
5 | hy | 0.994 | 0.991 | 0.998 | 993 |
6 | ka | 0.999 | 0.999 | 0.999 | 1000 |
7 | kk | 0.996 | 0.998 | 0.993 | 1005 |
8 | uk | 0.982 | 0.997 | 0.968 | 1030 |
9 | ru | 0.982 | 0.968 | 0.997 | 971 |
10 | macro avg | 0.99 | 0.99 | 0.99 | 10000 |
11 | weighted avg | 0.99 | 0.99 | 0.99 | 10000 |
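To make the two summary rows concrete, the sketch below recomputes them from the per-class rows of the table. It assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over classes, weighted average as the support-weighted mean). Because the table values are already rounded to three decimals, the recomputed numbers only match up to rounding.

```python
# Per-class rows copied from the table: (f1, precision, recall, support).
rows = {
    "az": (0.998, 0.997, 1.0, 997),
    "be": (0.996, 0.998, 0.994, 1004),
    "de": (0.976, 0.966, 0.987, 979),
    "en": (0.976, 0.986, 0.967, 1020),
    "he": (1.0, 1.0, 0.999, 1001),
    "hy": (0.994, 0.991, 0.998, 993),
    "ka": (0.999, 0.999, 0.999, 1000),
    "kk": (0.996, 0.998, 0.993, 1005),
    "uk": (0.982, 0.997, 0.968, 1030),
    "ru": (0.982, 0.968, 0.997, 971),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(support for _, _, _, support in rows.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _, _, _ in rows.values()) / len(rows)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, _, _, support in rows.values()) / total_support

print(total_support)          # 10000
print(f"{macro_f1:.2f}")      # 0.99
print(f"{weighted_f1:.2f}")   # 0.99
```

Because the classes are nearly balanced (971 to 1030 samples each), the macro and weighted averages are almost identical here.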