Update README.md
README.md
CHANGED
@@ -128,6 +128,10 @@ The custom dataset of front page texts of patent specifications was assembled in
 
 Our custom dataset has accurate manual labels created jointly by an undergraduate student and an economics professor. The final dataset is split 60-20-20 (train-val-test). In the event that the front page text is too long, we restrict the text to the first 512 tokens.
 
+### Training Procedure
+
+We use the standard token classification protocols with the HuggingFace Trainer API. We use cross-entropy loss.
+
 ### Evaluation
 
 Our evaluation metric is F1 at the full entity-level. That is, we aggregated adjacent-indexed entities into full entities and computed F1 scores requiring an exact match. These scores for the test set are below.
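The entity-level metric described in the Evaluation text can be illustrated with a minimal sketch. It assumes BIO-style tags (`B-X` opens an entity, `I-X` continues it), which the README does not confirm; the tag names `PER` and `ORG` and the helper names `extract_spans` and `entity_f1` are hypothetical and chosen only for illustration. Adjacent tags are merged into full spans, and a predicted span counts as correct only if its start, end, and type all match a gold span exactly.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Merge adjacent B-/I- tags of the same type into full entity spans.

    Assumes a BIO tagging scheme (an assumption, not stated in the README).
    A stray I- tag with no open span of that type starts a new span.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match F1 over full entities for one tagged sequence."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second entity is predicted one token short,
# so it does not count as a match under the exact-match rule.
gold = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
score = entity_f1(gold, pred)
```

Here one of two gold entities is matched exactly, so precision and recall are both 0.5 and `score` is 0.5. A token-level metric would instead give partial credit for the truncated second entity. To score a whole test set, the span counts would typically be summed across sequences before computing precision and recall; this sketch handles a single sequence only.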