---
datasets:
- priamai/AnnoCTR
base_model:
- urchade/gliner_small-v1
tags:
- Security
- NER
- CTI
language:
- en
---
# AITSecNER - Entity Recognition for Cybersecurity

This repository demonstrates how to use the **AITSecNER** model, hosted on Hugging Face and built on the GLiNER library, to extract cybersecurity-related entities from text.

## Installation

Install GLiNER via pip:

```bash
pip install gliner
```

## Usage

### Import and Load Model

Load the pretrained AITSecNER model directly from Hugging Face:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("selfconstruct3d/AITSecNER", load_tokenizer=True)
```

### Predict Entities

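Define the input text and the entity labels to extract, then call `predict_entities`. The `threshold` argument is the minimum confidence score a span needs in order to be returned: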

```python
# Example input text
text = """
Upon opening Emotet maldocs, victims are greeted with fake Microsoft 365 prompt that states 
“THIS DOCUMENT IS PROTECTED,” and instructs victims on how to enable macros.
"""

# Entity labels
labels = [
    'CLICommand/CodeSnippet', 'CON', 'DATE', 'GROUP', 'LOC', 
    'MALWARE', 'ORG', 'SECTOR', 'TACTIC', 'TECHNIQUE', 'TOOL'
]

# Predict entities
entities = model.predict_entities(text, labels, threshold=0.5)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
```
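`predict_entities` returns a list of dictionaries. In current GLiNER releases each one carries `text`, `label`, `start`, `end`, and `score` keys. The snippet below is a minimal sketch, assuming those keys, that groups the predictions by label, removes repeated mentions, and optionally applies a stricter score cut-off than the one passed to the model. The helper name `group_by_label` is hypothetical and not part of GLiNER.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group GLiNER-style entity dicts by label (hypothetical helper).

    Assumes each entity is a dict with 'text', 'label' and 'score' keys,
    as returned by GLiNER's predict_entities.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue  # drop spans below the stricter cut-off
        # Keep each surface form once per label, in first-seen order.
        if entity["text"] not in grouped[entity["label"]]:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)
```

For example, `group_by_label(entities, min_score=0.7)` keeps only spans the model scored at 0.7 or higher, which can be useful when a report mentions the same malware family many times.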

### Sample Output

```text
Emotet => MALWARE
Microsoft => ORG
```
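Real threat reports are usually much longer than this two-sentence example. GLiNER encoders work on a bounded input length, so text beyond that limit may be truncated; the exact limit depends on the model configuration and is an assumption to verify for AITSecNER. One workaround, sketched below, is to split a report on blank lines, run prediction on each paragraph, and shift the returned character offsets back to positions in the full document. `predict_long_text` is a hypothetical helper that accepts any callable with the same `(text, labels, threshold=...)` shape as `model.predict_entities`.

```python
def predict_long_text(predict, text, labels, threshold=0.5):
    """Run `predict` paragraph by paragraph and map offsets back to `text`.

    `predict` is any callable shaped like GLiNER's
    model.predict_entities(text, labels, threshold=...) that returns dicts
    with 'start' and 'end' character offsets relative to its own input.
    """
    results = []
    offset = 0  # character position of the current paragraph in `text`
    for paragraph in text.split("\n\n"):
        if paragraph.strip():
            for entity in predict(paragraph, labels, threshold=threshold):
                shifted = dict(entity)
                shifted["start"] += offset
                shifted["end"] += offset
                results.append(shifted)
        offset += len(paragraph) + 2  # skip the "\n\n" separator
    return results
```

Usage: `entities = predict_long_text(model.predict_entities, report_text, labels)`. Entities that straddle a paragraph break are missed, and a single very long paragraph can still exceed the model's limit, so finer splitting (for example by sentence) may be needed for some reports.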

## Model Details

The **AITSecNER** model was fine-tuned from the [urchade/gliner_small](https://huggingface.co/urchade/gliner_small) model on the [priamai/AnnoCTR dataset](https://huggingface.co/datasets/priamai/AnnoCTR). For more details about the dataset, see the paper ["AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports"](https://arxiv.org/abs/2305.10472).
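This card does not describe how AnnoCTR was converted for fine-tuning. For readers who want to fine-tune a GLiNER model on their own CTI annotations, GLiNER's training examples use records with a `tokenized_text` word list and an `ner` list of `[start_token, end_token, label]` triples with inclusive end indices; check this against the example data in the GLiNER repository before relying on it. The sketch below assumes character-offset annotations and a simple word-and-punctuation tokenizer; the name `to_gliner_record` is hypothetical and this is not necessarily how AITSecNER was trained.

```python
import re

# Words (allowing internal "-" or ".", e.g. "T1059.001") or single symbols.
TOKEN_RE = re.compile(r"\w+(?:[-.]\w+)*|\S")


def to_gliner_record(text, char_spans):
    """Convert character-offset annotations into a GLiNER-style record.

    char_spans: iterable of (start_char, end_char, label), end exclusive.
    Returns {"tokenized_text": [...], "ner": [[start_tok, end_tok, label]]}
    where end_tok is inclusive.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    ner = []
    for start_char, end_char, label in char_spans:
        # Indices of tokens that overlap the annotated character range.
        covered = [
            i for i, (_, s, e) in enumerate(tokens)
            if s < end_char and e > start_char
        ]
        if covered:
            ner.append([covered[0], covered[-1], label])
    return {"tokenized_text": [tok for tok, _, _ in tokens], "ner": ner}
```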

GLiNER is described in detail in the paper ["GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer"](https://arxiv.org/abs/2311.08526).

## About

**AITSecNER** uses GLiNER to extract cybersecurity-specific entities from unstructured text, which makes it suitable for tasks such as:

- Cyber threat intelligence analysis
- Incident response documentation
- Automated cybersecurity reporting