nice, you can join the gliner discord server: https://discord.gg/Y2yVxpSQnG
Urchade Zaratiana
urchade's activity
Hi @meduri30
Thank you for your interest in GLiNER. I am looking forward to your domain-specific version 😀
I have started working on relation extraction (RE).
I have an initial (beta) version you can try in Colab. You can check this repo: https://github.com/urchade/GraphER
For now, the results are not robust, but I think it can work for some domains.
I am pleased to announce the release of gliner_multi_pii-v1, a model developed for recognizing a wide range of Personally Identifiable Information (PII). This model is the result of fine-tuning urchade/gliner_multi-v2.1 on a synthetic dataset (urchade/synthetic-pii-ner-mistral-v1).
**Model Features:**
- Capable of identifying multiple PII types including addresses, passport numbers, emails, social security numbers, and more.
- Designed to assist with data protection and compliance across various domains.
- Multilingual (English, French, Spanish, German, Italian, Portuguese)
Link: urchade/gliner_multi_pii-v1
from gliner import GLiNER
model = GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")
text = """
Harilala Rasoanaivo, un homme d'affaires local d'Antananarivo, a enregistré une nouvelle société nommée "Rasoanaivo Enterprises" au Lot II M 92 Antohomadinika. Son numéro est le +261 32 22 345 67, et son adresse électronique est [email protected]. Il a fourni son numéro de sécu 501-02-1234 pour l'enregistrement.
"""
labels = ["work", "booking number", "personally identifiable information", "driver licence", "person", "full address", "company", "email", "passport number", "Social Security Number", "phone number"]
entities = model.predict_entities(text, labels)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Harilala Rasoanaivo => person
Rasoanaivo Enterprises => company
Lot II M 92 Antohomadinika => full address
+261 32 22 345 67 => phone number
[email protected] => email
501-02-1234 => Social Security Number
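Since the model is meant to help with data protection, here is a minimal sketch of how the predictions could be used to mask PII in a text. It assumes each entity is a dict with "start", "end", and "label" keys (character offsets, as returned by predict_entities) and that spans do not overlap. The helper name `redact` and the sample entities are hypothetical, written by hand for illustration rather than taken from model output.

```python
def redact(text, entities):
    """Replace each entity span in `text` with a [LABEL] placeholder.

    Assumes entities are dicts with "start", "end" and "label" keys and
    that their spans do not overlap.
    """
    redacted = text
    # Go from the last span to the first so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + ent["label"].upper() + "]"
        redacted = redacted[: ent["start"]] + placeholder + redacted[ent["end"]:]
    return redacted


# Hand-written example entities (illustrative, not real model output).
sample_text = "Call Jane Doe at 555-0100."
sample_entities = [
    {"start": 5, "end": 13, "text": "Jane Doe", "label": "person"},
    {"start": 17, "end": 25, "text": "555-0100", "label": "phone number"},
]

print(redact(sample_text, sample_entities))
```

With a real model you would pass the list returned by model.predict_entities(text, labels) instead of the hand-written sample.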
You should be able to fine-tune your own version: https://github.com/urchade/ATG/issues/3
I am also working on a zero-shot, end-to-end relation extraction model that is as efficient as GLiNER. Stay tuned 🙏
Is there a commercial license available for the multi-lingual model?
🆕 A new commercially permissible multilingual version is available: urchade/gliner_multi-v2.1
🐛 A subtle bug that causes performance degradation on some models has been corrected. Thanks to @yyDing1 for raising the issue.
from gliner import GLiNER
# Initialize GLiNER
model = GLiNER.from_pretrained("urchade/gliner_multi-v2.1")
text = "This is a text about Bill Gates and Microsoft."
# Labels for entity prediction
labels = ["person", "organization", "email"]
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
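If you want to post-process the predictions, a small sketch like the one below may help. It groups entity texts by label and drops low-confidence ones, assuming each entity dict carries a "score" key alongside "text" and "label". The helper `group_by_label`, its `min_score` parameter, and the sample scores are hypothetical and only illustrate the idea. Filtering this way is similar in spirit to raising the threshold argument of predict_entities.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.5):
    """Group entity texts by label, keeping entities with score >= min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        # Entities without a score are kept (treated as fully confident).
        if ent.get("score", 1.0) >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hand-written example entities with made-up scores (not real model output).
sample = [
    {"text": "Bill Gates", "label": "person", "score": 0.98},
    {"text": "Microsoft", "label": "organization", "score": 0.95},
    {"text": "text", "label": "email", "score": 0.12},
]

print(group_by_label(sample))
```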
hi, you can raise an issue here: https://github.com/urchade/GLiNER.git
oh, ok. I forgot to update it, thanks!
Yes, the weights are hosted on Hugging Face. They should be visible on my profile :)
I'd like to share our project on open-type Named Entity Recognition (NER). Our model uses a transformer encoder (BERT-like), which keeps the computational overhead minimal compared to using LLMs. I've developed a demo that runs on CPU on Google Colab.
Colab Demo: https://colab.research.google.com/drive/1mhalKWzmfSTqMnR0wQBZvt9-ktTsATHB?usp=sharing
Code: https://github.com/urchade/GLiNER
Paper: https://arxiv.org/abs/2311.08526