---
license: apache-2.0
datasets:
- eriktks/conll2003
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- distilbert/distilbert-base-cased
---

# DistilBERT Base Cased Fine-Tuned on CoNLL2003 for English Named Entity Recognition (NER)

This model is a fine-tuned version of [DistilBERT-base-cased](https://huggingface.co/distilbert/distilbert-base-cased) on the [CoNLL2003](https://huggingface.co/datasets/eriktks/conll2003) dataset for Named Entity Recognition (NER) in English. The CoNLL2003 dataset contains four types of named entities: Person (PER), Location (LOC), Organization (ORG), and Miscellaneous (MISC).

## Model Details
- Model Architecture: DistilBERT (a distilled, lighter version of BERT)
- Pre-trained Base Model: distilbert/distilbert-base-cased
- Dataset: CoNLL2003 (NER task)
- Languages: English
- Fine-tuned for: Named Entity Recognition (NER)
- Entities recognized (the underlying tag set can be checked with the sketch after this list):
  - PER: Person
  - LOC: Location
  - ORG: Organization
  - MISC: Miscellaneous entities
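
The four entity types above are encoded as token-level labels. A quick way to confirm the exact tag set used by this checkpoint's classification head is to inspect the model configuration. This is a minimal sketch; the labels listed in the comment are the standard CoNLL2003 BIO scheme and are an assumption, not taken from this card.

```python
from transformers import AutoConfig

# Print the token-level label mapping of the fine-tuned classifier head.
# For CoNLL2003 this is typically the BIO scheme over the four entity types:
# O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC.
config = AutoConfig.from_pretrained(
    "MrRobson9/distilbert-base-cased-finetuned-conll2003-english-ner"
)
print(config.id2label)
```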

## Use Cases
This model is ideal for tasks that require identifying and classifying named entities within English text, such as:

- Information extraction from unstructured text
- Content classification and tagging
- Automated text summarization
- Question answering systems with a focus on entity recognition

## How to Use
To use this model in your code, you can load it via Hugging Face’s Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("MrRobson9/distilbert-base-cased-finetuned-conll2003-english-ner")
model = AutoModelForTokenClassification.from_pretrained("MrRobson9/distilbert-base-cased-finetuned-conll2003-english-ner")

# Build a NER pipeline and run it on a sample sentence.
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer)
result = nlp_ner("John lives in New York and works for the United Nations.")
print(result)
```
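
The raw `"ner"` pipeline returns one prediction per sub-word token, so an entity such as "New York" may come back split into pieces. If you prefer whole entity spans, recent Transformers releases accept an aggregation strategy; the snippet below is a sketch assuming such a release is installed.

```python
from transformers import pipeline

# Same model, but with sub-word tokens grouped into whole entity spans.
nlp_ner_grouped = pipeline(
    "ner",
    model="MrRobson9/distilbert-base-cased-finetuned-conll2003-english-ner",
    aggregation_strategy="simple",
)
print(nlp_ner_grouped("John lives in New York and works for the United Nations."))
# Each grouped prediction is a dict with keys such as
# 'entity_group', 'score', 'word', 'start', and 'end'.
```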

## Performance
| accuracy | precision | recall | f1-score |
|:--------:|:---------:|:------:|:--------:|
|  0.987   |   0.937   | 0.941  |  0.939   |
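
The card does not state which split or evaluation script produced these numbers. For CoNLL-style NER, entity-level precision, recall, and F1 are commonly computed with the `seqeval` package; the sketch below uses toy tag sequences purely to illustrate the metric calls, not to reproduce the scores above.

```python
# pip install seqeval
from seqeval.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy gold and predicted BIO tag sequences (one inner list per sentence).
y_true = [["B-PER", "O", "O", "B-LOC", "I-LOC", "O", "O", "B-ORG", "I-ORG", "O"]]
y_pred = [["B-PER", "O", "O", "B-LOC", "I-LOC", "O", "O", "B-ORG", "O", "O"]]

print("accuracy :", accuracy_score(y_true, y_pred))   # token-level accuracy
print("precision:", precision_score(y_true, y_pred))  # entity-level precision
print("recall   :", recall_score(y_true, y_pred))     # entity-level recall
print("f1       :", f1_score(y_true, y_pred))         # entity-level F1
```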

## License
This model is released under the Apache 2.0 license, in line with the DistilBERT-base-cased base model. Please also ensure compliance with the CoNLL2003 dataset's terms when using this model.