The custom dataset of front page texts of patent specifications was assembled in …
Our custom dataset has accurate manual labels produced by a graduate student. The final dataset is split 60/20/20 into training, validation, and test sets. Front page texts that are too long are truncated to their first 512 tokens.
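The split and truncation can be sketched as below. This is a minimal illustration, not the project's preprocessing code: the helper names `split_60_20_20` and `truncate_tokens` are hypothetical, and the seeded random shuffle is an assumption, since the README does not say how examples were assigned to splits.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
MAX_TOKENS = 512  # truncation limit stated in the README


def split_60_20_20(examples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle (assumed) and split into train/val/test at 60/20/20."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = len(items) * 60 // 100    # integer arithmetic avoids float rounding
    n_val = len(items) * 20 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to test
    return train, val, test


def truncate_tokens(token_ids: List[int], max_tokens: int = MAX_TOKENS) -> List[int]:
    """Hypothetical helper: keep only the first `max_tokens` tokens of a front page."""
    return token_ids[:max_tokens]
```

With a HuggingFace tokenizer, the same truncation is usually requested directly by calling the tokenizer with `truncation=True, max_length=512`, in which case any special tokens the tokenizer adds count toward the 512.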
### Training Procedure
We follow the standard token-classification protocol using the HuggingFace Trainer API and train with cross-entropy loss.
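For reference, the token-level cross-entropy loss can be sketched in plain Python as below. HuggingFace token-classification models typically compute this loss internally when labels are passed, using PyTorch's `CrossEntropyLoss`, whose default `ignore_index` is -100. Giving special or padding tokens the label -100 is a common convention in such setups but is an assumption here, as is the function name `token_cross_entropy`.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch default ignore_index)


def token_cross_entropy(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    ignore_index: int = IGNORE_INDEX,
) -> float:
    """Mean cross-entropy over tokens whose label is not `ignore_index`.

    `logits` holds one row of unnormalized class scores per token;
    `labels` holds the gold class index per token.
    """
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # e.g. special or padding tokens
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[label]  # -log softmax(row)[label]
        count += 1
    # This sketch returns 0.0 when every token is ignored (PyTorch would give nan).
    return total / count if count else 0.0
```

With uniform logits over four classes, each counted token contributes log 4, the loss of a model that has learned nothing.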
### Evaluation
Our evaluation metric is full entity-level F1: we aggregate adjacent-indexed token-level entities into full entities and compute F1 scores, counting an entity as correct only on an exact match. The test-set scores are given below.
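A minimal sketch of this metric follows. It is an illustration under stated assumptions, not the project's evaluation script. Each token prediction is assumed to be a dict with `entity` and `index` fields, shaped like the per-token output of a HuggingFace token-classification pipeline, containing only entity tokens. Labels are assumed to be plain entity types without B-/I- prefixes. Adjacent tokens are merged only when their labels match, and the F1 is assumed to be micro-averaged; the README specifies none of these. The function names are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

Token = Dict[str, Any]         # e.g. {"entity": "PER", "index": 3}
Entity = Tuple[str, int, int]  # (label, first token index, last token index)


def aggregate_entities(tokens: List[Token]) -> Set[Entity]:
    """Merge adjacent-indexed tokens with the same label into full entities."""
    spans: List[Entity] = []
    for tok in sorted(tokens, key=lambda t: t["index"]):
        label, idx = tok["entity"], tok["index"]
        if spans and spans[-1][0] == label and spans[-1][2] == idx - 1:
            spans[-1] = (label, spans[-1][1], idx)  # extend the current entity
        else:
            spans.append((label, idx, idx))         # start a new entity
    return set(spans)


def entity_f1(gold_docs: List[List[Token]], pred_docs: List[List[Token]]) -> float:
    """Micro-averaged F1; a predicted entity counts only if it exactly matches a gold one."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_docs, pred_docs):
        g, p = aggregate_entities(gold), aggregate_entities(pred)
        tp += len(g & p)  # exact match on label, start, and end
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because matching is exact, a prediction that covers only part of a gold entity (say, token 1 of a two-token name) earns no credit at all, which makes this stricter than token-level F1.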