colette-exe
/

distilbert-finetuned-ner-for-articles

Token Classification

Generated from Trainer


colette-exe committed on Apr 30, 2024

Commit 55da8b9 (verified) · 1 Parent(s): c15ac55

Update README.md

Files changed (1)

README.md +31 -3

@@ -11,6 +11,9 @@ metrics:
 model-index:
 - name: distilbert-finetuned-ner-for-articles
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -28,11 +31,36 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
@@ -67,4 +95,4 @@ The following hyperparameters were used during training:
 - Transformers 4.40.1
 - Pytorch 2.2.1+cu121
 - Datasets 2.19.0
-- Tokenizers 0.19.1

 model-index:
 - name: distilbert-finetuned-ner-for-articles
   results: []
+language:
+- en
+library_name: transformers
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 ## Model description
+DistilBERT fine-tuned to detect crime, accident, and natural disaster occurrences.
+Tags (IOBES/BIOES tagging format):
+- O: not an entity
+- S-CRIME
+- S-CRIMINAL
+- S-VICTIM
+- S-SUSPECT
+- S-TIMEDATE: a date given as a month, day, or year (any one, two, or all three together)
+- S-TIMEWORD: words signifying time (last, weekend, earlier, week, today, etc.)
+- S-TIMEDAY: days of the week
+- S-TIMEDAYPART: morning, afternoon, evening, night
+- S-TIMENUM: 4:31, 6:30, etc.
+- S-TIMEMISC: New Year, Christmas, etc.
+- S-LOC: location word (mentioned alone)
+- B-LOC: beginning (part of a series of location names mentioned)
+- I-LOC: inside
+- E-LOC: end (the last location word specified)
+- S-LOCWORD: junction, island, street, etc.
+- S-LOCDIR: north, south, etc.
+- S-ACCIDENT
+- S-NATDISAS: type of natural disaster
+- S-OTHEROCC: other occurrences (rarely labeled in the dataset)
+The dataset has a size of 502. It was manually annotated with Doccano (a free NER annotation tool), starting from the dataset of the paper "MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification".
 ## Intended uses & limitations
+- The training dataset is small; a bigger dataset is needed.
+- Further training is highly recommended.
 ## Training and evaluation data
 - Transformers 4.40.1
 - Pytorch 2.2.1+cu121
 - Datasets 2.19.0
+- Tokenizers 0.19.1
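
The IOBES/BIOES tags listed in the model description mark single-token entities with `S-` and multi-token locations with `B-LOC`, `I-LOC`, `E-LOC`. A minimal sketch of how such per-token tags could be grouped into entity spans is shown below. The function name `decode_iobes` and the example sentence are hypothetical and not part of the model card. The sketch assumes the tokens are already whole words aligned one-to-one with their predicted tags, which the model's raw subword output may not be.

```python
from typing import List, Tuple


def decode_iobes(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level IOBES tags into (entity_text, entity_type) spans.

    Hypothetical helper: assumes one tag per word and tags of the form
    "O" or "<prefix>-<TYPE>" with prefix in {S, B, I, E}.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S" and label:
            # Single-token entity: emit it immediately.
            spans.append((token, label))
            current_tokens, current_type = [], None
        elif prefix == "B" and label:
            # Start of a multi-token entity.
            current_tokens, current_type = [token], label
        elif prefix in ("I", "E") and label and label == current_type:
            # Continue the open entity; "E" closes and emits it.
            current_tokens.append(token)
            if prefix == "E":
                spans.append((" ".join(current_tokens), label))
                current_tokens, current_type = [], None
        else:
            # "O" or an inconsistent sequence: drop any unfinished span.
            current_tokens, current_type = [], None

    return spans


# Illustrative input only; these tags are written by hand, not model output.
example_tokens = ["Robbery", "reported", "in", "Quezon", "City", "on", "Monday"]
example_tags = ["S-CRIME", "O", "O", "B-LOC", "E-LOC", "O", "S-TIMEDAY"]
print(decode_iobes(example_tokens, example_tags))
```

Dropping inconsistent sequences (for example an `I-LOC` with no preceding `B-LOC`) is one possible design choice; a more lenient decoder could instead treat a stray `I-` or `E-` tag as the start of a new span.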