matthewleechen committed b8dc6de (verified; 1 parent: e7be769)

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -119,6 +119,12 @@ The custom dataset of front page texts of patent specifications was assembled in
 
 Our custom dataset has accurate manual labels generated by a graduate student. The final dataset is split 60-20-20 (train-val-test). In the event that the front page text is too long, we restrict the text to the first 512 tokens.
 
+
+### Training Procedure
+
+We use the standard token classification protocols with the HuggingFace Trainer API. We use cross-entropy loss.
+
+
 ### Evaluation
 
 Our evaluation metric is F1 at the full entity-level. That is, we aggregated adjacent-indexed entities into full entities and computed F1 scores requiring an exact match. These scores for the test set are below.
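The exact-match, entity-level F1 described in the Evaluation section can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code: it assumes a flat per-token label scheme where `"O"` marks non-entity tokens and a run of adjacent tokens sharing a label forms one entity, and it micro-averages over all entity types. The helper names `aggregate_entities` and `entity_f1` and the labels `PER`/`LOC` are hypothetical.

```python
from typing import List, Set, Tuple

# A span is (start_index, end_index_inclusive, label).
Span = Tuple[int, int, str]


def aggregate_entities(labels: List[str], outside: str = "O") -> Set[Span]:
    """Merge runs of adjacent tokens sharing the same non-outside label into spans.

    Assumes a flat per-token label scheme (no B-/I- prefixes).
    """
    spans: Set[Span] = set()
    start = None
    # Append an `outside` sentinel so an entity at the end of the text is closed.
    for i, label in enumerate(labels + [outside]):
        if start is not None and label != labels[start]:
            spans.add((start, i - 1, labels[start]))
            start = None
        if start is None and label != outside:
            start = i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1: a predicted entity counts only if its start,
    end, and label all exactly match a gold entity."""
    tp = n_pred = n_gold = 0
    for g_labels, p_labels in zip(gold, pred):
        g_spans = aggregate_entities(g_labels)
        p_spans = aggregate_entities(p_labels)
        tp += len(g_spans & p_spans)  # exact-match true positives
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["O", "PER", "PER", "O", "LOC"]]
pred = [["O", "PER", "O", "O", "LOC"]]
print(entity_f1(gold, pred))  # 0.5
```

Here the truncated `PER` prediction earns no partial credit, so precision and recall are both 0.5. A per-label breakdown would filter the span sets by label before counting.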
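A 60-20-20 train/validation/test split like the one described for the custom dataset can be produced by shuffling once and slicing. This is only a sketch: the README does not say how the split was drawn, so the document-level shuffling, the fixed seed, and the function name `split_60_20_20` are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_20_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into 60% train, 20% val, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG so the split is reproducible
    n = len(shuffled)
    n_train = n * 6 // 10  # integer arithmetic avoids float rounding at the cut points
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # any rounding remainder goes to test
    return train, val, test


train, val, test = split_60_20_20(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```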
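The cross-entropy objective named in the Training Procedure section averages, over labelled tokens, the negative log-probability the model assigns to each token's gold label. The sketch below computes that in plain Python from per-token logits. It assumes the common HuggingFace token-classification convention of labelling positions that should not contribute (special tokens, continuation subwords, padding) with `-100`, which is also the default `ignore_index` of PyTorch's `CrossEntropyLoss`; the function name `token_cross_entropy` is hypothetical.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch CrossEntropyLoss default)


def token_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy over tokens whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue
        # Numerically stable log-softmax: -log p(label) = logsumexp(row) - row[label]
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0


logits = [[2.0, 0.0], [0.0, 0.0], [5.0, -5.0]]
labels = [0, 1, IGNORE_INDEX]  # the third token is excluded from the mean
print(token_cross_entropy(logits, labels))
```

Subtracting the row maximum before exponentiating keeps `math.exp` from overflowing on large logits without changing the result.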