matthewleechen/patent_titles_ner

Token Classification


matthewleechen committed on Jan 11

Commit b8dc6de · verified · 1 Parent(s): e7be769

Update README.md


README.md +6 -0


@@ -119,6 +119,12 @@ The custom dataset of front page texts of patent specifications was assembled in
 Our custom dataset was labeled manually by a graduate student. The final dataset is split 60-20-20 (train-val-test). If the front page text is too long, we truncate it to the first 512 tokens.
+### Training Procedure
+We follow the standard token classification setup with the Hugging Face Trainer API and train with cross-entropy loss.
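
As an illustration of the loss named above, here is a minimal pure-Python sketch of token-level cross-entropy. It is not the training code for this model. The function name `token_cross_entropy` and the example logits are hypothetical, and masking some tokens with the label `-100` (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) is an assumption about a typical setup, not something the model card states.

```python
import math

# Assumption: tokens labeled -100 are skipped, matching the default
# ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def token_cross_entropy(logits, labels):
    """Mean cross-entropy over tokens whose label is not IGNORE_INDEX.

    logits: list of per-token score lists, one score per class.
    labels: list of integer class ids, one per token.
    """
    losses = []
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked token contributes nothing to the loss
        # Numerically stable log-sum-exp over the class scores.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of the correct class.
        losses.append(log_norm - scores[label])
    # Returning 0.0 when every token is masked is a choice made for this sketch.
    return sum(losses) / len(losses) if losses else 0.0


# Two tokens, two classes; the second token is masked.
example_loss = token_cross_entropy([[0.0, 0.0], [5.0, 1.0]], [0, IGNORE_INDEX])
```

With equal scores for both classes on the only unmasked token, the loss is the log of the number of classes.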
 ### Evaluation
 Our evaluation metric is entity-level F1. That is, we aggregate adjacent-indexed entity tokens into full entities and compute F1 scores requiring an exact match. Scores on the test set are below.
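
The exact-match scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation script used for this model: the helper names `extract_entities` and `entity_f1` are hypothetical, the labels `TITLE` and `NAME` are placeholders, and tags are assumed to be plain per-token labels with `"O"` for non-entity tokens (no BIO prefixes).

```python
def extract_entities(tags):
    """Group adjacent tokens sharing a non-"O" label into (label, start, end) spans."""
    entities = set()
    start, current = None, "O"
    # A trailing "O" sentinel closes any span that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag != current:
            if current != "O":
                entities.add((current, start, i))  # end index is exclusive
            start, current = i, tag
    return entities


def entity_f1(gold_tags, pred_tags):
    """F1 over full entities; a prediction counts only if label and span both match."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = ["O", "TITLE", "TITLE", "TITLE", "O", "NAME", "NAME"]
pred = ["O", "TITLE", "TITLE", "O", "O", "NAME", "NAME"]
score = entity_f1(gold, pred)
```

Here the predicted `TITLE` span is one token short, so it earns no credit even though two of its three tokens are correct. Only the `NAME` span matches exactly, giving precision and recall of 0.5 each. The sketch scores a single sequence; a corpus-level score would pool entity counts across documents before computing precision and recall.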