---
datasets:
- priamai/AnnoCTR
base_model:
- urchade/gliner_small-v1
tags:
- Security
- NER
- CTI
language:
- en
---

# AITSecNER - Entity Recognition for Cybersecurity

This repository shows how to use the **AITSecNER** model, hosted on Hugging Face and built on the GLiNER library, to extract cybersecurity-related entities from text.

## Installation

Install GLiNER via pip:

```bash
pip install gliner
```

## Usage

### Import and Load Model

Load the pretrained AITSecNER model directly from Hugging Face:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("selfconstruct3d/AITSecNER", load_tokenizer=True)
```

### Predict Entities

Define the input text and the entity labels you want to extract:

```python
# Example input text
text = """
Upon opening Emotet maldocs, victims are greeted with fake Microsoft 365 prompt that states
“THIS DOCUMENT IS PROTECTED,” and instructs victims on how to enable macros.
"""

# Entity labels
labels = [
    'CLICommand/CodeSnippet', 'CON', 'DATE', 'GROUP', 'LOC',
    'MALWARE', 'ORG', 'SECTOR', 'TACTIC', 'TECHNIQUE', 'TOOL'
]

# Predict entities; spans scoring below the threshold are discarded
entities = model.predict_entities(text, labels, threshold=0.5)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
```
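
Each item returned by `predict_entities` is a dictionary; in current GLiNER releases it carries `text`, `label`, `start`, `end`, and `score` keys. The sketch below assumes that shape and shows one way to keep only higher-confidence predictions and group the unique entity strings by label. The `group_by_label` helper and the `sample` list are hypothetical: the list stands in for real model output and its scores are made up, and the 0.7 cut-off is an arbitrary illustration.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.7):
    """Return {label: sorted unique entity texts} for entities scoring >= min_score.

    Assumes each entity is a dict with 'text', 'label', and 'score' keys,
    like the dictionaries returned by GLiNER's predict_entities.
    """
    grouped = defaultdict(set)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["label"]].add(ent["text"])  # set removes duplicate mentions
    # Sort labels and texts so the result is stable and easy to read
    return {label: sorted(texts) for label, texts in sorted(grouped.items())}


# Hypothetical predictions shaped like predict_entities output (scores are invented)
sample = [
    {"text": "Emotet", "label": "MALWARE", "score": 0.93},
    {"text": "Microsoft", "label": "ORG", "score": 0.81},
    {"text": "macros", "label": "TECHNIQUE", "score": 0.52},
]

print(group_by_label(sample))
# {'MALWARE': ['Emotet'], 'ORG': ['Microsoft']}
```

With real output you would pass `entities` from the example above instead of `sample`; raising `min_score` above the `threshold` used at prediction time trades recall for precision.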

### Sample Output

```text
Emotet => MALWARE
Microsoft => ORG
```
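
For incident documentation it can help to see entities in context rather than as a list. The hypothetical `annotate` helper below (not part of GLiNER) uses the `start`/`end` character offsets that GLiNER reports for each entity to wrap every span in `[text](LABEL)` markup. It assumes `end` is exclusive, so that `text[start:end]` is the entity string, and that spans do not overlap; the sentence and offsets are illustrative only.

```python
def annotate(text, entities):
    """Wrap each entity span in [span](LABEL) markup.

    Assumes non-overlapping entities whose character offsets satisfy
    text[start:end] == entity text (end exclusive, as in Python slicing).
    """
    parts = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        parts.append(text[cursor:ent["start"]])  # plain text before this entity
        parts.append(f"[{text[ent['start']:ent['end']]}]({ent['label']})")
        cursor = ent["end"]
    parts.append(text[cursor:])  # remainder after the last entity
    return "".join(parts)


# Hypothetical example sentence with offsets computed via str.find
sentence = "Emotet maldocs impersonate Microsoft 365."
ms_start = sentence.find("Microsoft")
demo = [
    {"label": "MALWARE", "start": 0, "end": 6},
    {"label": "ORG", "start": ms_start, "end": ms_start + len("Microsoft")},
]

print(annotate(sentence, demo))
# [Emotet](MALWARE) maldocs impersonate [Microsoft](ORG) 365.
```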

## Model Details

The **AITSecNER** model was fine-tuned using the [urchade/gliner_small](https://huggingface.co/urchade/gliner_small) model from Hugging Face on the [priamai/AnnoCTR dataset](https://huggingface.co/datasets/priamai/AnnoCTR). For more details about the dataset, see the paper ["AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports"](https://arxiv.org/abs/2305.10472).

GLiNER is described in detail in the paper ["GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer"](https://arxiv.org/abs/2311.08526).
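
At a high level, GLiNER scores candidate spans against each requested label, keeps those whose score clears `threshold`, and, for flat (non-nested) NER, resolves overlaps greedily by accepting the most confident span first. The sketch below illustrates only that selection step on hand-written, hypothetical span scores; it is a simplified stand-in, not GLiNER's actual code, and the neural scoring model is omitted entirely. The `select_spans` name and the token list are assumptions for illustration.

```python
def select_spans(candidates, threshold=0.5):
    """Greedy flat-NER decoding over scored candidate spans.

    candidates: list of (start, end, label, score) tuples using inclusive
    token indices. Returns the kept spans sorted by start position.
    """
    # 1. Discard candidates whose score does not clear the threshold
    above = [c for c in candidates if c[3] > threshold]
    # 2. Visit the remaining spans from most to least confident
    above.sort(key=lambda c: c[3], reverse=True)
    kept = []
    for start, end, label, score in above:
        # 3. Accept a span only if it overlaps none of the spans already kept
        if all(end < s or start > e for s, e, _, _ in kept):
            kept.append((start, end, label, score))
    return sorted(kept, key=lambda c: c[0])


# Hypothetical tokens and scores; real scores come from the model
tokens = ["Emotet", "maldocs", "show", "a", "fake", "Microsoft", "365", "prompt"]
candidates = [
    (0, 0, "MALWARE", 0.95),  # "Emotet"
    (0, 1, "MALWARE", 0.60),  # "Emotet maldocs", overlaps the span above
    (5, 5, "ORG", 0.85),      # "Microsoft"
    (5, 6, "TOOL", 0.40),     # "Microsoft 365", below the threshold
]

for start, end, label, score in select_spans(candidates):
    print(" ".join(tokens[start:end + 1]), "=>", label)
# Emotet => MALWARE
# Microsoft => ORG
```

Lowering `threshold` lets more candidates into step 1, but overlapping spans are still pruned in step 3, so the output stays a flat, non-overlapping set.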

## About

**AITSecNER** uses GLiNER to extract cybersecurity-specific entities, such as malware, threat groups, tools, tactics, and techniques, from unstructured text, which makes it suitable for tasks such as:

- Cyber threat intelligence analysis
- Incident response documentation
- Automated cybersecurity reporting