---
language: en
library_name: LogClassifier
tags:
- log-classification
- log feature
- log-similarity
- transformers
- AIOps
pipeline_tag: text-classification
---


# s2-log-classifier-BERT-v1
This model is a transformers classification model trained with `BertForSequenceClassification` and designed for network and device log mining tasks.
It was developed by [Selector AI](https://www.selector.ai/).

## Model Usage
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Step 1: Load the model and tokenizer from Hugging Face
model = BertForSequenceClassification.from_pretrained("SelectorAI/s2-log-classifier-BERT-v1")
tokenizer = BertTokenizer.from_pretrained("SelectorAI/s2-log-classifier-BERT-v1")

# Switch to evaluation mode so dropout is disabled during inference
model.eval()

# Step 2: Prepare the input data (Example log text)
log_text = "Error occurred while accessing the database."

# Tokenize the input data
inputs = tokenizer(log_text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Step 3: Make predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Step 4: Get the predicted class (the class with the highest score)
predicted_class = torch.argmax(logits, dim=1).item()

# Label mapping (class index -> event name), read here from the model config;
# the JSON file in the repo is an alternative source
label_mapping = model.config.id2label

# Step 5: Get the event name
predicted_event = label_mapping[predicted_class]
print(f"Predicted Event: {predicted_event}")
```
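
The argmax in Step 4 returns only the single best class. When a confidence score or a few runner-up events are useful, the logits can be turned into probabilities with a softmax. The following is a minimal sketch in plain Python; `top_k_events` is a hypothetical helper that is not part of the model repository, and the logits and label names in the usage lines are made-up placeholders.

```python
import math


def top_k_events(logits, id2label, k=3):
    """Return the k most probable (event, probability) pairs for one log line.

    `logits` is a flat list of floats, e.g. `outputs.logits[0].tolist()`.
    `id2label` maps class index -> event name, e.g. `model.config.id2label`.
    This helper is illustrative and not shipped with the model.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Placeholder values for illustration only; real label names come from the model config.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "event_a", 1: "event_b", 2: "event_c"}
for event, prob in top_k_events(example_logits, example_labels, k=2):
    print(f"{event}: {prob:.3f}")
```

With the model loaded as in the usage example, the call would be `top_k_events(logits[0].tolist(), model.config.id2label)`.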

## Background

The model focuses on structured and semi-structured log data and outputs around 60 different event categories. It is suited to
real-time log analysis, anomaly detection, and operational monitoring: by automatically classifying logs into
predefined categories, it helps organizations manage large-scale network data and diagnose network issues
faster and more accurately.

## Intended uses

Our model is intended to be used as a classifier. Given an input text (a log coming from a network, device, or router), it outputs the event most closely associated with that log.
The events that can be predicted are listed in [encoder-main.json](https://huggingface.co/rahulm-selector/log-classifier-BERT-v1/blob/main/encoder-main.json).


## Training Details

### Data

The model was trained on a variety of network events and system logs, focusing on monitoring and analyzing state changes,
protocol behaviors, and hardware interactions across infrastructure components. This included tracking routing issues, 
protocol neighbor state changes, link stability, and security events, ensuring that the model could recognize and 
classify critical patterns in device communications, network health, and configuration activities.

### Train/Test Split

- **Train Data Size**: `~80K Logs`
- **Test Data Size**: `~20K Logs` 
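
The sizes above correspond to roughly an 80/20 split. The card does not say how the split was produced, so the following is only a sketch of one common approach: a per-label (stratified) split, which keeps rare event categories represented on both sides. `stratified_split`, the seed, and the synthetic sample data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (log_text, label) pairs into train and test lists, label by label.

    Hypothetical helper: the actual split procedure is not documented.
    """
    # Group samples by their event label.
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Send about test_fraction of each label's logs to the test set.
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Synthetic placeholder data: 2 labels with 50 logs each.
samples = [(f"log line {i}", f"event_{i % 2}") for i in range(100)]
train, test = stratified_split(samples)
print(len(train), len(test))
```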

### Hyperparameters

The following hyperparameters were used during training to optimize the model's performance:

- **Batch Size**: `32`
- **Learning Rate**: `0.001`
- **Optimizer**: `Adam`
- **Epochs**: `10`
- **Dropout Rate**: N/A
- **LSTM Hidden Dimension**: `384`
- **Embedding Dimension**: `384`
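
As a rough worked example, the listed batch size and epoch count can be combined with the approximate training set size to estimate how many optimizer steps a run takes. The sketch below assumes one optimizer step per batch (no gradient accumulation) and treats "~80K" as exactly 80,000; `TrainingConfig` is a hypothetical container, not a class from the training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values listed in this model card.
    batch_size: int = 32
    learning_rate: float = 0.001
    optimizer: str = "Adam"
    epochs: int = 10
    # Assumption: "~80K Logs" taken as exactly 80,000 training samples.
    train_size: int = 80_000

    def steps_per_epoch(self) -> int:
        # One optimizer step per batch; a final partial batch still counts.
        return math.ceil(self.train_size / self.batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.epochs


config = TrainingConfig()
print(config.steps_per_epoch(), config.total_steps())
```

Under these assumptions a run is about 2,500 steps per epoch and 25,000 steps in total.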

## Credits

This project was developed by a collaborative team at [Selector AI](https://www.selector.ai/). Below are the key contributors:

### Authors
- **Rahul Muthuswamy**  
  Role: Project Lead and Model Developer  
  Email: [rahulm@selector.ai](mailto:rahulm@selector.ai)

- **Alex Lau**  
  Role: Mentor  
  Email: [alexlau@selector.ai](mailto:alexlau@selector.ai)

- **Sebastian Reyes**  
  Role: Mentor  
  Email: [seb@selector.ai](mailto:seb@selector.ai)

- **Surya Nimmagadda**  
  Role: Mentor  
  Email: [nscsekhar@selector.ai](mailto:nscsekhar@selector.ai)