| Feature | Description |
| --- | --- |
| Name | en_student_name_detector |
| Version | 0.0.1 |
| spaCy | >=3.4.1,<3.5.0 |
| Default Pipeline | transformer, ner |
| Components | transformer, ner |
| Sources | longformer |
| License | Apache 2.0 |
| Author | Langdon Holmes |
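The package declares compatibility with spaCy >=3.4.1,<3.5.0 only. Below is a minimal, standard-library-only sketch for checking an installed spaCy version against that range before loading the model. The helper names `parse_version` and `is_compatible` are hypothetical and are not part of spaCy or of this package. The sketch compares only the numeric MAJOR.MINOR.PATCH prefix and ignores pre-release tags, so `packaging.specifiers.SpecifierSet` is the more robust choice in real tooling.

```python
import re

# Bounds copied from the model card: spaCy >=3.4.1,<3.5.0
LOWER = (3, 4, 1)
UPPER = (3, 5, 0)


def parse_version(version: str) -> tuple:
    """Extract the leading MAJOR.MINOR.PATCH numbers as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(version: str) -> bool:
    """True if the version falls in the half-open range [3.4.1, 3.5.0)."""
    return LOWER <= parse_version(version) < UPPER
```

For example, `is_compatible(importlib.metadata.version("spacy"))` checks the spaCy release installed in the current environment.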
Label Scheme
| Component | Labels |
| --- | --- |
| ner | STUDENT |
Accuracy
| Type | Score |
| --- | --- |
| ENTS_F | 83.66 |
| ENTS_P | 83.12 |
| ENTS_R | 84.21 |
| TRANSFORMER_LOSS | 56255026.35 |
| NER_LOSS | 31154.89 |
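ENTS_P, ENTS_R and ENTS_F are entity-level precision, recall and F1, shown here as percentages. As we understand spaCy's scoring, a predicted entity counts as correct only when its boundaries and label both match a gold entity exactly. The following is a minimal pure-Python sketch of that exact-match scoring. The `EntityKey` tuple layout and both function names are our own illustration, not spaCy's API. The sketch also shows that F1 is the harmonic mean of precision and recall.

```python
from typing import Iterable, Tuple

# Hypothetical entity key: (doc_id, start_char, end_char, label)
EntityKey = Tuple[str, int, int, str]


def prf(gold: Iterable[EntityKey], pred: Iterable[EntityKey]) -> Tuple[float, float, float]:
    """Exact-match entity precision, recall and F1, returned as percentages."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)  # boundaries and label must all match
    p = true_pos / len(pred_set) if pred_set else 0.0
    r = true_pos / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return 100 * p, 100 * r, 100 * f


def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    return 2 * p * r / (p + r) if p + r else 0.0
```

As a consistency check, `round(f1_from_pr(83.12, 84.21), 2)` gives 83.66, which matches the reported ENTS_F.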
Training Data
A total of 6,293 student writing assignments were submitted as PDF files. All documents were reflection assignments written in response to the same prompt in the same online course. Student names were labeled by human raters (one rater per document). A preliminary model was then trained, and every disagreement between its predictions and the human annotations was adjudicated by two additional reviewers. The training dataset includes all 6,293 documents, 845 of which contain at least one student name. There are 1,155 student name annotations in total.
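The counts in the Training Data paragraph imply a strongly imbalanced corpus, and this is worth keeping in mind when assembling an in-domain evaluation set. The short calculation below uses only the three figures stated in that paragraph. The variable names are our own.

```python
# Counts taken from the Training Data description
TOTAL_DOCS = 6_293
DOCS_WITH_NAMES = 845
NAME_ANNOTATIONS = 1_155

# Fraction of documents containing at least one student name
share_with_names = DOCS_WITH_NAMES / TOTAL_DOCS
# Mean number of name annotations among documents that contain any
names_per_positive_doc = NAME_ANNOTATIONS / DOCS_WITH_NAMES
# Documents with no student name at all
docs_without_names = TOTAL_DOCS - DOCS_WITH_NAMES

print(f"{share_with_names:.1%} of documents contain a student name")
print(f"{names_per_positive_doc:.2f} names per document that has any")
print(f"{docs_without_names} documents contain no student name")
```

So roughly 13% of documents contain a name, and those documents average about 1.4 names each. The remaining 5,448 documents serve only as negative examples.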
To Use
This model has been packaged using spaCy and is available both as a Hugging Face model and as a pip package. Evaluate the model's performance on in-domain data before deploying it in production, particularly when confidential information is involved.
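The card does not include usage code. A common use for a STUDENT detector is redacting names before documents are shared. Below is a minimal sketch of that step, written against plain character offsets so it does not depend on spaCy. The `redact` function and the `[STUDENT]` placeholder are hypothetical and are not part of this package. Overlapping spans are skipped rather than merged, which is a deliberate simplification.

```python
from typing import List, Tuple


def redact(text: str, spans: List[Tuple[int, int, str]], placeholder: str = "[STUDENT]") -> str:
    """Replace each (start_char, end_char, label) span labeled STUDENT with a placeholder."""
    pieces = []
    cursor = 0  # end offset of the last span already replaced
    for start, end, label in sorted(spans):
        if label != "STUDENT" or start < cursor:
            continue  # ignore other labels and spans overlapping a previous one
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

The following assumes the pip package is installed under the pipeline name shown in the table. You would then load it with `nlp = spacy.load("en_student_name_detector")`, run `doc = nlp(text)`, and pass `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]` to `redact`.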