---
license: mit
datasets:
- numind/NuNER
library_name: gliner
language:
- en
pipeline_tag: token-classification
tags:
- entity recognition
- NER
- named entity recognition
- zero shot
- zero-shot
---

NuNerZero is a family of zero-shot entity recognition models inspired by [GLiNER](https://huggingface.co/papers/2311.08526) and built with insights we gathered throughout our work on [NuNER](https://huggingface.co/collections/numind/nuner-token-classification-and-ner-backbones-65e1f6e14639e2a465af823b).

The key differences between NuNerZero Token Long and GLiNER are:

* **A 4096-token context window!** (vs. the 512-token context in GLiNER), which allows processing a whole page at a time rather than a few sentences.
* The ability to **detect entities longer than 12 tokens**, since NuNerZero operates on the token level rather than on the span level (see the sketch below).
* The NuZero family is trained on a **diverse dataset tailored for real-life use cases**: the NuNER v2.0 dataset.
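To make the token-level point concrete, here is a schematic sketch (illustration only, not the model's actual decoding code) of why token-level tagging places no hard cap on entity length: each token receives a label, and adjacent tokens sharing a label are merged into one entity, however long it is, whereas span-level models enumerate candidate spans only up to a fixed maximum width.

```python
# Schematic sketch of token-level decoding (illustration only): per-token
# labels are merged into entities of arbitrary length, so no span-width
# limit applies. All names here are hypothetical.
def merge_token_labels(tokens, token_labels):
    """Merge runs of adjacent tokens sharing a label into (entity, label) pairs."""
    entities, current_tokens, current_label = [], [], None
    for token, label in zip(tokens, token_labels):
        if label is not None and label == current_label:
            current_tokens.append(token)  # extend the current entity
        else:
            if current_label is not None:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = ([token], label) if label else ([], None)
    if current_label is not None:
        entities.append((" ".join(current_tokens), current_label))
    return entities

tokens = ["The", "International", "Committee", "of", "the", "Red", "Cross", "said", "..."]
labels = [None, "organization", "organization", "organization", "organization",
          "organization", "organization", None, None]
print(merge_token_labels(tokens, labels))
# [('International Committee of the Red Cross', 'organization')]
```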
## Installation & Usage

```
!pip install gliner
```

**NuZero requires labels to be lower-cased!**

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuNerZero_long_contex")

# NuZero requires labels to be lower-cased!
labels = ["person", "award", "date", "competitions", "teams"]
labels = [l.lower() for l in labels]

# Illustrative placeholder text; substitute your own document
text = """
Lionel Messi won the Ballon d'Or on 30 October 2023 after leading Argentina
to victory at the 2022 FIFA World Cup; he currently plays for Inter Miami.
"""

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

## Fine-tuning

A fine-tuning script can be found [here](https://colab.research.google.com/drive/19WDnuD2U-B0h-FzX7I5FySNP6sHt4Cru?usp=sharing).

## Citation

### This work

```bibtex
@misc{bogdanov2024nuner,
      title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data},
      author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
      year={2024},
      eprint={2402.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

### Previous work

```bibtex
@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
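As a usage note on the long context window: in recent versions of the gliner library, `predict_entities` also accepts a confidence `threshold` (0.5 by default), and each prediction carries character offsets alongside its text and label. A minimal sketch, assuming the same setup as above and a hypothetical `page.txt` input file:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuNerZero_long_contex")

# Hypothetical input: one full page of text in a single file. With the
# 4096-token context window, the whole page fits in a single call,
# with no sentence-level chunking needed.
with open("page.txt") as f:  # "page.txt" is a placeholder file name
    page = f.read()

labels = ["person", "date", "competitions"]  # remember: lower-cased labels

# threshold filters low-confidence predictions (gliner API parameter)
entities = model.predict_entities(page, labels, threshold=0.5)

for entity in entities:
    print(entity["start"], entity["end"], entity["text"], "=>", entity["label"])
```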