matthewleechen committed
Commit 4e6cb1a · verified · 1 Parent(s): f40382c

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -128,6 +128,10 @@ The custom dataset of front page texts of patent specifications was assembled in
 
 Our custom dataset has accurate manual labels created jointly by an undergraduate student and an economics professor. The final dataset is split 60-20-20 (train-val-test). In the event that the front page text is too long, we restrict the text to the first 512 tokens.
 
+### Training Procedure
+
+We use the standard token classification protocols with the HuggingFace Trainer API. We use cross-entropy loss.
+
 ### Evaluation
 
 Our evaluation metric is F1 at the full entity-level. That is, we aggregated adjacent-indexed entities into full entities and computed F1 scores requiring an exact match. These scores for the test set are below.
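
For readers who want to see what the entity-level metric in the Evaluation section means in practice, here is a minimal sketch. It is not the authors' evaluation code. It assumes BIO-style tags (the README does not state the tagging scheme) and micro-averaging over all entities (the README does not say how scores are averaged); the names `extract_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Merge adjacent token tags into full entity spans.

    Assumes BIO tags such as "B-PER", "I-PER", "O" (an assumption,
    not confirmed by the README). A stray "I-" tag with no open
    entity of the same type is treated as the start of a new entity.
    """
    spans: Set[Span] = set()
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current
        if current is not None and not continues:
            spans.add((start, i, current))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, label
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact
    match of both the span boundaries and the entity type."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Under this definition a prediction that covers only part of an entity earns no credit: if the gold tags are `B-PER I-PER` and the model predicts `B-PER O`, the predicted span `(0, 1, "PER")` does not match the gold span `(0, 2, "PER")`.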