nice readme
README.md
CHANGED
@@ -18,12 +18,30 @@ datasets:
 - oscar
 ---

+# RoBERTa for Single Language Classification
+## Training
+RoBERTa fine-tuned on small parts of the Open Subtitles, Oscar and Tatoeba datasets (~9k samples per language).

+| data source     | language       |
+|-----------------|----------------|
+| open_subtitles  | ka, he, en, de |
+| oscar           | be, kk, az, hu |
+| tatoeba         | ru, uk         |

+## Validation
+Metrics were computed on a separate, held-out part of the dataset (~1k samples per language).

+|index|class|f1-score|precision|recall|support|
+|---|---|---|---|---|---|
+|0|az|0.998|0.997|1.0|997|
+|1|be|0.996|0.998|0.994|1004|
+|2|de|0.976|0.966|0.987|979|
+|3|en|0.976|0.986|0.967|1020|
+|4|he|1.0|1.0|0.999|1001|
+|5|hy|0.994|0.991|0.998|993|
+|6|ka|0.999|0.999|0.999|1000|
+|7|kk|0.996|0.998|0.993|1005|
+|8|uk|0.982|0.997|0.968|1030|
+|9|ru|0.982|0.968|0.997|971|
+|10|macro_avg|0.99|0.99|0.99|10000|
+|11|weighted avg|0.99|0.99|0.99|10000|
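
As a sanity check on the validation table, the summary rows can be recomputed from the per-class rows. The sketch below is not part of the README. It assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean of per-class F1, weighted average as the support-weighted mean), and the names `rows` and `f1_from` are made up for illustration.

```python
# Per-class rows copied from the validation table:
# (class, f1, precision, recall, support)
rows = [
    ("az", 0.998, 0.997, 1.0, 997),
    ("be", 0.996, 0.998, 0.994, 1004),
    ("de", 0.976, 0.966, 0.987, 979),
    ("en", 0.976, 0.986, 0.967, 1020),
    ("he", 1.0, 1.0, 0.999, 1001),
    ("hy", 0.994, 0.991, 0.998, 993),
    ("ka", 0.999, 0.999, 0.999, 1000),
    ("kk", 0.996, 0.998, 0.993, 1005),
    ("uk", 0.982, 0.997, 0.968, 1030),
    ("ru", 0.982, 0.968, 0.997, 971),
]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(r[4] for r in rows)

# Macro average: every class counts equally.
macro_f1 = sum(r[1] for r in rows) / len(rows)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(r[1] * r[4] for r in rows) / total_support

print("support:", total_support)
print("macro f1:", round(macro_f1, 2))
print("weighted f1:", round(weighted_f1, 2))

# Recomputed F1 may differ from the reported value in the last digit,
# because the precision and recall in the table are already rounded.
for name, f1, p, r, _ in rows:
    print(name, f1, round(f1_from(p, r), 3))
```

With the values from the table, the supports sum to 10000 and both averages round to 0.99, matching the two summary rows.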