matthewleechen/patent_entities_ner

Token Classification


matthewleechen committed on Jan 11

Commit 4e6cb1a (verified) · 1 parent: f40382c

Update README.md

Files changed (1)

README.md +4 -0


@@ -128,6 +128,10 @@ The custom dataset of front page texts of patent specifications was assembled in
 Our custom dataset has accurate manual labels created jointly by an undergraduate student and an economics professor. The final dataset is split 60-20-20 (train-val-test). In the event that the front page text is too long, we restrict the text to the first 512 tokens.
+### Training Procedure
+We train with the standard token classification setup in the Hugging Face Trainer API, using cross-entropy loss.
 ### Evaluation
 Our evaluation metric is F1 at the full entity-level. That is, we aggregated adjacent-indexed entities into full entities and computed F1 scores requiring an exact match. These scores for the test set are below.
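
The entity-level metric described in the Evaluation section can be illustrated with a minimal sketch. It assumes BIO-style tags (`B-X`, `I-X`, `O`), which the README excerpt does not confirm, and the labels `PER` and `LOC` and the function names `extract_entities` and `entity_f1` are hypothetical. It is not the authors' evaluation code, only one way to aggregate adjacent tagged tokens into full entities and score exact matches.

```python
def extract_entities(tags):
    """Group adjacent tagged tokens into (label, start, end) spans; end is exclusive.

    Assumes BIO tags. A stray I- tag that does not continue the current
    span is treated leniently as the start of a new span.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return set(spans)


def entity_f1(gold_tags, pred_tags):
    """F1 over full entities; a prediction counts only if label and span match exactly."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the predicted PER entity is truncated, so it does not count.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]
print(entity_f1(gold, pred))  # 0.5
```

Under exact matching, a partially correct span earns no credit, so entity-level F1 is typically lower than a token-level score on the same predictions.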