---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: distilbert-base-uncased-logline-v3
  results: []
---

# distilbert-base-uncased-logline-v3

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the AIT Log Data Set V2.0 [1], available at https://zenodo.org/records/5789064.
It achieves the following results on the evaluation set:

- Loss: 0.0022
- Accuracy: 0.9995
- F1: 0.9994

## Model description

This model classifies individual log lines for network intrusion detection. The Python package that runs this model is available at https://github.com/Isaacwilliam4/INSyT.

As described on the dataset's site, the training data covers the following log types: Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs.

## Labels

| Label | Label Name |
|-------|------------|
| 0 | attacker:dnsteal:dnsteal-dropped |
| 1 | attacker:dnsteal:dnsteal-received |
| 2 | attacker:dnsteal:exfiltration-service |
| 3 | attacker_change_user:escalate |
| 4 | attacker_change_user:escalate:escalated_command:escalated_sudo_command |
| 5 | attacker_http:dirb:foothold |
| 6 | attacker_http:foothold:service_scan |
| 7 | attacker_http:foothold:webshell_cmd |
| 8 | attacker_http:foothold:webshell_upload |
| 9 | attacker_http:foothold:wpscan |
| 10 | attacker_vpn:escalate |
| 11 | attacker_vpn:foothold |
| 12 | benign |
| 13 | crack_passwords:escalate |
| 14 | dirb:foothold |
| 15 | dns_scan:foothold |
| 16 | escalate:escalated_command:escalated_sudo_command |
| 17 | escalate:escalated_command:escalated_sudo_command:escalated_sudo_session |
| 18 | escalate:webshell_cmd |
| 19 | foothold:network_scan |
| 20 | foothold:service_scan |
| 21 | foothold:traceroute |
| 22 | foothold:wpscan |

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|
| 0.0435        | 1.0   | 6274  | 0.0120          | 0.9965   | 0.9965 |
| 0.0059        | 2.0   | 12548 | 0.0032          | 0.9993   | 0.9992 |
| 0.0023        | 3.0   | 18822 | 0.0022          | 0.9995   | 0.9994 |

## Test results

| Test Loss | Test Accuracy | Test F1 |
|:----------|:--------------|:--------|
| 0.0020    | 0.9994        | 0.9994  |

## Five Fold Cross Validation Mean Test Confusion Matrix

![Five Fold Cross Validation Mean Test Confusion Matrix](https://github.com/Isaacwilliam4/INSyT/blob/main/5_fold_cross_validation.png)

### Framework versions

- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.1

### Citations

[1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber, “AIT Log Data Set V2.0”. Zenodo, Feb. 24, 2022. doi: 10.5281/zenodo.5789064.
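
## Example: decoding predictions

The Labels table maps each class id to a label name. The following is a minimal sketch of turning a model output into one of those names, assuming the classifier emits one logit per class in the same order as the table (the card does not state this explicitly). `LABELS`, `softmax`, `decode`, and `is_attack` are illustrative names and are not taken from the INSyT package.

```python
import math

# Label names copied from the Labels table; list index == class id.
LABELS = [
    "attacker:dnsteal:dnsteal-dropped",
    "attacker:dnsteal:dnsteal-received",
    "attacker:dnsteal:exfiltration-service",
    "attacker_change_user:escalate",
    "attacker_change_user:escalate:escalated_command:escalated_sudo_command",
    "attacker_http:dirb:foothold",
    "attacker_http:foothold:service_scan",
    "attacker_http:foothold:webshell_cmd",
    "attacker_http:foothold:webshell_upload",
    "attacker_http:foothold:wpscan",
    "attacker_vpn:escalate",
    "attacker_vpn:foothold",
    "benign",
    "crack_passwords:escalate",
    "dirb:foothold",
    "dns_scan:foothold",
    "escalate:escalated_command:escalated_sudo_command",
    "escalate:escalated_command:escalated_sudo_command:escalated_sudo_session",
    "escalate:webshell_cmd",
    "foothold:network_scan",
    "foothold:service_scan",
    "foothold:traceroute",
    "foothold:wpscan",
]


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label name, probability) for the highest-scoring class.

    Assumes `logits` holds one score per class, ordered as in LABELS.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def is_attack(label):
    """Every label other than 'benign' marks attack-related activity."""
    return label != "benign"


if __name__ == "__main__":
    # Made-up logits for illustration only, not real model output.
    example = [0.0] * len(LABELS)
    example[7] = 4.0
    label, prob = decode(example)
    print(label, round(prob, 3), is_attack(label))
```

In practice the logits would come from the fine-tuned model's classification head for a single log line; the hand-written list here only stands in for that output.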