---
datasets:
- priamai/AnnoCTR
base_model:
- urchade/gliner_small-v1
tags:
- Security
- NER
- CTI
language:
- en
---
# AITSecNER - Entity Recognition for Cybersecurity
This repository shows how to use the **AITSecNER** model, hosted on Hugging Face and built on the GLiNER library, to extract cybersecurity-related entities from text.
## Installation
Install GLiNER via pip:
```bash
pip install gliner
```
## Usage
### Import and Load Model
Load the pretrained AITSecNER model directly from Hugging Face:
```python
from gliner import GLiNER
model = GLiNER.from_pretrained("selfconstruct3d/AITSecNER", load_tokenizer=True)
```
### Predict Entities
Define the input text and entity labels you wish to extract:
```python
# Example input text
text = """
Upon opening Emotet maldocs, victims are greeted with fake Microsoft 365 prompt that states
“THIS DOCUMENT IS PROTECTED,” and instructs victims on how to enable macros.
"""
# Entity labels
labels = [
'CLICommand/CodeSnippet', 'CON', 'DATE', 'GROUP', 'LOC',
'MALWARE', 'ORG', 'SECTOR', 'TACTIC', 'TECHNIQUE', 'TOOL'
]
# Predict entities
entities = model.predict_entities(text, labels, threshold=0.5)
# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
```
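To process several reports with the same label set, the call above can be wrapped in a small helper. The following is a minimal sketch, not part of the GLiNER API: `extract_entities` and its parameter names are hypothetical, and it assumes only that the model object exposes `predict_entities(text, labels, threshold=...)` as used in the example above.

```python
from typing import Any, Dict, List, Sequence


def extract_entities(
    model: Any,
    texts: Sequence[str],
    labels: Sequence[str],
    threshold: float = 0.5,
) -> List[List[Dict[str, Any]]]:
    """Run the model on each text and return one list of entities per text.

    `model` is assumed to be any object exposing
    predict_entities(text, labels, threshold=...), as in the example above.
    """
    results: List[List[Dict[str, Any]]] = []
    for text in texts:
        # Blank inputs yield an empty result instead of a model call.
        if not text.strip():
            results.append([])
            continue
        results.append(
            model.predict_entities(text, list(labels), threshold=threshold)
        )
    return results
```

With the model loaded as shown earlier, `extract_entities(model, [text], labels)` would return a one-element list holding the same entities as the single call.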
### Sample Output
```text
Emotet => MALWARE
Microsoft => ORG
```
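For reporting, it is often handier to collect the predictions per label than to print them one by one. The sketch below assumes only the `text` and `label` keys used in the loop above; `group_by_label` is a hypothetical helper name, and the `sample` list simply mirrors the sample output rather than a fresh model run.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_label(entities: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each label to the distinct entity texts found, in first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        text, label = entity["text"], entity["label"]
        # Keep each surface form once per label.
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Entities mirroring the sample output above (only the keys used here).
sample = [
    {"text": "Emotet", "label": "MALWARE"},
    {"text": "Microsoft", "label": "ORG"},
]
print(group_by_label(sample))
# {'MALWARE': ['Emotet'], 'ORG': ['Microsoft']}
```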
## Model Details
The **AITSecNER** model was fine-tuned from the [urchade/gliner_small](https://huggingface.co/urchade/gliner_small) model on the [priamai/AnnoCTR dataset](https://huggingface.co/datasets/priamai/AnnoCTR). For more details about the dataset, see the paper ["AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports"](https://arxiv.org/abs/2305.10472).
GLiNER is described in detail in the paper ["GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer"](https://arxiv.org/abs/2311.08526).
## About
**AITSecNER** uses GLiNER to extract cybersecurity-specific entities from text, which makes it suitable for tasks such as:
- Cyber threat intelligence analysis
- Incident response documentation
- Automated cybersecurity reporting