yeshpanovrustem/xlm-roberta-large-ner-kazakh

Token Classification

Named Entity Recognition


yeshpanovrustem committed on May 20, 2023

Commit ce2a5c9 · 1 Parent(s): faa072f

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -6,7 +6,7 @@ license: cc-by-4.0
 - The original repository for the paper can be found at *https://github.com/IS2AI/KazNERD*.
 ## Differences
 While the original dataset contained tokens denoting speech disfluencies and hesitations (parenthesised) and background noise [bracketed], this model was trained on a version of the dataset where such tokens were removed.
-Removing the tokens caused some changes in the number of sentences, tokens, and named entities (NEs).
 Dataset | Unit | Train | Valid | Test | Total |
 | :---: | :---: | :---: | :---: | :---: | :---: |

 - The original repository for the paper can be found at *https://github.com/IS2AI/KazNERD*.
 ## Differences
 While the original dataset contained tokens denoting speech disfluencies and hesitations (parenthesised) and background noise [bracketed], this model was trained on a version of the dataset where such tokens were removed.
+As a result, the number of sentences, tokens, and named entities (NEs) in the cleaned dataset changed. It is also likely that token numbers were calculated incorrectly in the original dataset and should have been given as 1,120,387 (Train), 136,983 (Valid), 134,540 (Test), and 1,391,910 (Total).
 Dataset | Unit | Train | Valid | Test | Total |
 | :---: | :---: | :---: | :---: | :---: | :---: |