---
library_name: transformers
tags:
- bert
- ner
license: apache-2.0
datasets:
- eriktks/conll2003
base_model:
- google-bert/bert-base-uncased
pipeline_tag: token-classification
language:
- en
model-index:
- name: bert-named-entity-recognition
  results:
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: conll2003
      type: conll2003
      config: conll2003
      split: test
    metrics:
    - name: Precision
      type: precision
      value: 0.8992
      verified: true
    - name: Recall
      type: recall
      value: 0.9115
      verified: true
    - name: F1
      type: f1
      value: 0.9053
      verified: true
    - name: Loss
      type: loss
      value: 0.040937
      verified: true
---

# Model Card for Bert Named Entity Recognition

### Model Description

This is a fine-tuned version of `google-bert/bert-base-uncased`, designed to perform Named Entity Recognition (NER) on input text.

- **Developed by:** [Sartaj](https://huggingface.co/sartajbhuvaji)
- **Finetuned from model:** `google-bert/bert-base-uncased`
- **Language(s):** English
- **License:** apache-2.0
- **Framework:** Hugging Face Transformers

### Model Sources

- **Repository:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)
- **Paper:** [BERT paper](https://huggingface.co/papers/1810.04805)

## Uses

The model can be used to recognize named entities (persons, locations, organizations, and miscellaneous entities) in English text.
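The model's labels follow the BIO scheme listed under Training Details, so the per-token output of the `ner` pipeline (which omits `O` tokens) is often merged into whole-entity spans before use. A minimal post-processing sketch; the `merge_bio` helper and the sample predictions are illustrative, not part of the model or of the `transformers` API:

```python
def merge_bio(tokens):
    """Merge per-token NER pipeline entries (dicts with 'entity', 'start',
    'end') into whole-entity spans based on their B-/I- prefixes."""
    spans, current = [], None
    for t in tokens:
        # e.g. "B-PER" -> ("B", "-", "PER")
        tag, _, label = t["entity"].partition("-")
        if tag == "B" or current is None or current["label"] != label:
            if current:
                spans.append(current)
            current = {"label": label, "start": t["start"], "end": t["end"]}
        else:  # "I-" tag continuing the current entity
            current["end"] = t["end"]
    if current:
        spans.append(current)
    return spans

# Sample per-token predictions in the pipeline's output format
preds = [
    {"entity": "B-PER", "word": "wolfgang", "start": 11, "end": 19},
    {"entity": "B-LOC", "word": "berlin", "start": 34, "end": 40},
]
print(merge_bio(preds))
# -> [{'label': 'PER', 'start': 11, 'end': 19}, {'label': 'LOC', 'start': 34, 'end': 40}]
```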
## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("sartajbhuvaji/bert-named-entity-recognition")
model = AutoModelForTokenClassification.from_pretrained("sartajbhuvaji/bert-named-entity-recognition")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
```

```json
[
  {
    "end": 19,
    "entity": "B-PER",
    "index": 4,
    "score": 0.99633455,
    "start": 11,
    "word": "wolfgang"
  },
  {
    "end": 40,
    "entity": "B-LOC",
    "index": 9,
    "score": 0.9987465,
    "start": 34,
    "word": "berlin"
  }
]
```

## Training Details

- **Dataset:** [eriktks/conll2003](https://huggingface.co/datasets/eriktks/conll2003)

| Abbreviation | Description |
|---|---|
| O | Outside of a named entity |
| B-MISC | Beginning of a miscellaneous entity right after another miscellaneous entity |
| I-MISC | Miscellaneous entity |
| B-PER | Beginning of a person's name right after another person's name |
| I-PER | Person's name |
| B-ORG | Beginning of an organization right after another organization |
| I-ORG | Organization |
| B-LOC | Beginning of a location right after another location |
| I-LOC | Location |

### Training Procedure

- Full model fine-tune
- Epochs: 5

#### Training Loss Curves

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6354695712edd0ed5dc46b04/vVra4giLk3EPjXo48Sbax.png)

## Trainer

- global_step: 4390
- training_loss: 0.040937909830132485
- train_runtime: 206.3611
- train_samples_per_second: 340.205
- train_steps_per_second: 21.273
- total_flos: 1702317283240608.0
- train_loss: 0.040937909830132485
- epoch: 5.0

## Evaluation

- Precision: 0.8992
- Recall: 0.9115
- F1 Score: 0.9053

### Classification Report

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| LOC | 0.91 | 0.93 | 0.92 | 1668 |
| MISC | 0.76 | 0.81 | 0.78 | 702 |
| ORG | 0.87 | 0.88 | 0.88 | 1661 |
| PER | 0.98 | 0.97 | 0.97 | 1617 |
| **Micro Avg** | 0.90 | 0.91 | 0.91 | 5648 |
| **Macro Avg** | 0.88 | 0.90 | 0.89 | 5648 |
| **Weighted Avg** | 0.90 | 0.91 | 0.91 | 5648 |

- Evaluation Dataset: [eriktks/conll2003](https://huggingface.co/datasets/eriktks/conll2003)
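As a sanity check, the support-weighted averages in the classification report can be recomputed from the per-class rows (all values below are copied from the table above):

```python
# (precision, recall, f1, support) per class, from the classification report
report = {
    "LOC":  (0.91, 0.93, 0.92, 1668),
    "MISC": (0.76, 0.81, 0.78, 702),
    "ORG":  (0.87, 0.88, 0.88, 1661),
    "PER":  (0.98, 0.97, 0.97, 1617),
}

total = sum(s for *_, s in report.values())  # total support: 5648

# support-weighted average of precision, recall, and F1
weighted = [
    sum(row[i] * row[3] for row in report.values()) / total
    for i in range(3)
]
print([round(m, 2) for m in weighted])
# -> [0.9, 0.91, 0.91], matching the Weighted Avg row
```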