---
license: mit
datasets:
- numind/NuNER
library_name: gliner
language:
- en
pipeline_tag: token-classification
tags:
- entity recognition
- NER
- named entity recognition
- zero shot
- zero-shot
---
NuNerZero is a family of zero-shot entity recognition models inspired by [GLiNER](https://huggingface.co/papers/2311.08526) and built with insights we gathered throughout our work on [NuNER](https://huggingface.co/collections/numind/nuner-token-classification-and-ner-backbones-65e1f6e14639e2a465af823b).
The key differences between NuNerZero Token Long and GLiNER are:
* **4096-token context window** vs. the 512-token context in GLiNER. This allows processing a page at a time instead of a few sentences!
* The possibility to **detect entities that are longer than 12 tokens**, because NuNerZero operates at the token level rather than the span level.
* The NuZero family is trained on a **diverse dataset tailored for real-life use cases**: the NuNER v2.0 dataset.
<p align="center">
<img src="zero_shot_performance_unzero_token_long.png">
</p>
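The token-level versus span-level distinction in the list above can be illustrated with a small, library-free sketch. It is a simplification and not the actual GLiNER or NuNerZero implementation: `enumerate_spans` and `decode_token_tags` are hypothetical helper names, and the tagging scheme (one label per token, `"O"` for non-entities) is an assumption made for illustration. A span-level model only scores candidate spans up to a fixed width, so a longer entity is never a candidate, while a token-level decoder can group any number of consecutive tokens.

```python
from typing import List, Tuple


def enumerate_spans(num_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """Candidate (start, end) spans, end exclusive, capped at max_width tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start + 1, min(start + max_width, num_tokens) + 1)
    ]


def decode_token_tags(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group consecutive tokens that share a non-"O" label into entities."""
    entities = []
    start, current = None, "O"
    # A trailing "O" sentinel flushes an entity that ends on the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag != current:
            if current != "O":
                entities.append((start, i, current))
            start, current = i, tag
    return entities


# Hypothetical per-token predictions: a 20-token "title" entity and a 1-token "person".
tags = ["O"] + ["title"] * 20 + ["O", "person"]

spans = enumerate_spans(len(tags), max_width=12)
longest_candidate = max(end - start for start, end in spans)
token_level_entities = decode_token_tags(tags)
```

Here the span enumeration never proposes the 20-token entity, because its widest candidate is 12 tokens, whereas the token-level decoding recovers it as a single entity.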
## Installation & Usage
```
!pip install gliner
```
**NuZero requires labels to be lower-cased**
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuNerZero_long_contex")

# NuZero requires labels to be lower-cased!
labels = ["person", "award", "date", "competitions", "teams"]
labels = [l.lower() for l in labels]

# Put the text you want to analyze between the triple quotes.
text = """
"""

# Predict entities for the given labels and print each match with its label.
entities = model.predict_entities(text, labels)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
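Since predictions are made token by token, one entity may come back as several adjacent pieces with the same label. The sketch below shows one possible way to merge such pieces. It assumes each entity is a dict with `start`, `end`, `text` and `label` keys holding character offsets into `text`, as in the loop above; `merge_adjacent_entities` is a hypothetical helper, not part of the `gliner` API, and the sample entities are hand-written rather than model output.

```python
from typing import Dict, List


def merge_adjacent_entities(entities: List[Dict], text: str) -> List[Dict]:
    """Merge same-label entities separated only by whitespace in `text`."""
    if not entities:
        return []
    ordered = sorted(entities, key=lambda e: e["start"])
    merged = []
    current = dict(ordered[0])  # copy so the input dicts are not mutated
    for nxt in ordered[1:]:
        gap = text[current["end"]:nxt["start"]]
        touching = nxt["start"] >= current["end"] and gap.strip() == ""
        if touching and nxt["label"] == current["label"]:
            # Extend the current entity and rebuild its text from the offsets.
            current["end"] = nxt["end"]
            current["text"] = text[current["start"]:current["end"]]
        else:
            merged.append(current)
            current = dict(nxt)
    merged.append(current)
    return merged


# Hand-written example (assumed output shape, not real model predictions).
sample_text = "Lionel Messi won the Ballon d'Or"
sample_entities = [
    {"start": 0, "end": 6, "text": "Lionel", "label": "person"},
    {"start": 7, "end": 12, "text": "Messi", "label": "person"},
    {"start": 21, "end": 32, "text": "Ballon d'Or", "label": "award"},
]
merged_entities = merge_adjacent_entities(sample_entities, sample_text)
for entity in merged_entities:
    print(entity["text"], "=>", entity["label"])
```

In practice the helper would be applied to the result of `model.predict_entities(text, labels)`; whether merging is needed at all depends on the model and library version.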
## Fine-tuning
A fine-tuning script can be found [here](https://colab.research.google.com/drive/19WDnuD2U-B0h-FzX7I5FySNP6sHt4Cru?usp=sharing).
## Citation
### This work
```bibtex
@misc{bogdanov2024nuner,
title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data},
author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
year={2024},
eprint={2402.15343},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Previous work
```bibtex
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```