---
language:
- en
base_model:
- CrabInHoney/urlbert-tiny-base-v3
pipeline_tag: text-classification
tags:
- url
- cybersecurity
- urls
- links
- classification
- phishing-detection
- tiny
- phishing
- malware
- defacement
- transformers
- urlbert
- bert
- malicious
license: apache-2.0
---
# URLBERT-Tiny-v3 Malicious URL Classifier

A lightweight BERT model, fine-tuned to classify URLs into four categories: benign, phishing, malware, and defacement.

## Model Details

- **Model size**: 3.69M parameters
- **Tensor type**: F32
- **Model weight size**: 14.8 MB
- **Base model**: [CrabInHoney/urlbert-tiny-base-v3](https://huggingface.co/CrabInHoney/urlbert-tiny-base-v3)
- **Dataset**: [Malicious URLs Dataset](https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset)
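
The listed weight size follows from the parameter count and tensor type, since F32 stores each parameter in 4 bytes. A quick check, assuming decimal megabytes (10^6 bytes) and ignoring any file-format overhead:

```python
# Parameter count and bytes per parameter, taken from the Model Details list
num_params = 3.69e6
bytes_per_param = 4  # F32 = 32-bit float = 4 bytes

# Approximate weight size in decimal megabytes (assumes 1 MB = 10**6 bytes)
size_mb = num_params * bytes_per_param / 10**6
print(f"Approximate weight size: {size_mb:.1f} MB")  # prints "Approximate weight size: 14.8 MB"
```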

## Model Evaluation Results

The model was evaluated on a test set, with the following per-class and averaged metrics:

| Class            | Precision | Recall   | F1-Score |
|------------------|-----------|----------|----------|
| Benign           | 0.987695  | 0.993717 | 0.990697 |
| Defacement       | 0.988510  | 0.998963 | 0.993709 |
| Malware          | 0.988291  | 0.960332 | 0.974111 |
| Phishing         | 0.958425  | 0.930826 | 0.944423 |
| **Accuracy**     | 0.983738  | 0.983738 | 0.983738 |
| **Macro Avg**    | 0.980730  | 0.970959 | 0.975735 |
| **Weighted Avg** | 0.983615  | 0.983738 | 0.983627 |
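
The macro-average row can be reproduced from the per-class rows. The sketch below assumes the macro average is the unweighted mean of the per-class values (the convention of scikit-learn's `classification_report`) and that each F1-score is the harmonic mean of precision and recall. The weighted average is not recomputed because the per-class support counts are not listed.

```python
# Per-class (precision, recall, F1) values copied from the table above
per_class = {
    "benign": (0.987695, 0.993717, 0.990697),
    "defacement": (0.988510, 0.998963, 0.993709),
    "malware": (0.988291, 0.960332, 0.974111),
    "phishing": (0.958425, 0.930826, 0.944423),
}


def f1(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values: list) -> float:
    """Macro average: unweighted mean over the classes."""
    return sum(values) / len(values)


macro_precision = macro_average([p for p, _, _ in per_class.values()])
macro_recall = macro_average([r for _, r, _ in per_class.values()])
macro_f1 = macro_average([f for _, _, f in per_class.values()])

print(f"Macro precision: {macro_precision:.6f}")
print(f"Macro recall:    {macro_recall:.6f}")
print(f"Macro F1:        {macro_f1:.6f}")

# Recomputing F1 from the rounded precision/recall agrees with the
# reported F1 to within about 1e-5
for name, (p, r, f) in per_class.items():
    print(f"{name}: reported F1 {f:.6f}, recomputed {f1(p, r):.6f}")
```

Note that the macro F1 is the mean of the per-class F1-scores, not the F1 of the macro precision and macro recall; the two differ slightly.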

## Usage Example

The example below classifies URLs with the Hugging Face `transformers` library:

```python
from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Select the device (GPU if available, otherwise CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Load the model and tokenizer
model_name = "CrabInHoney/urlbert-tiny-v3-malicious-url-classifier"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.to(device)

# Create a text-classification pipeline on the same device as the model
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

# Example URLs to classify
test_urls = [
    "wikiobits.com/Obits/TonyProudfoot",
    "http://www.824555.com/app/member/SportOption.php?uid=guest&langx=gb",
]

# Map the generic label IDs to readable class names
label_mapping = {
    "LABEL_0": "benign",
    "LABEL_1": "defacement",
    "LABEL_2": "malware",
    "LABEL_3": "phishing"
}

# Classify each URL
for url in test_urls:
    # top_k=None returns the scores of all four classes
    # (it replaces the deprecated return_all_scores=True)
    results = classifier(url, top_k=None)
    print(f"\nURL: {url}")
    # Scores come back sorted by score; re-sort by label for a fixed class order
    for result in sorted(results, key=lambda r: r['label']):
        label = result['label']
        score = result['score']
        friendly_label = label_mapping.get(label, label)
        print(f"Class: {friendly_label}, probability: {score:.4f}")
```

### Example Output

```
URL: wikiobits.com/Obits/TonyProudfoot
Class: benign, probability: 0.9953
Class: defacement, probability: 0.0000
Class: malware, probability: 0.0000
Class: phishing, probability: 0.0046

URL: http://www.824555.com/app/member/SportOption.php?uid=guest&langx=gb
Class: benign, probability: 0.0000
Class: defacement, probability: 0.0001
Class: malware, probability: 0.9998
Class: phishing, probability: 0.0001
```
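
The pipeline returns a score for every class. One simple way to reduce these to a single verdict is to take the highest-scoring class. The sketch below uses hypothetical helpers, `top_prediction` and `is_malicious`, which are not part of `transformers` or of this model; it assumes the input is the list of `{"label": ..., "score": ...}` dicts the pipeline returns for one URL when called with `top_k=None`, and it repeats the label mapping from the usage example.

```python
from typing import Any, Dict, List, Tuple

# Same label mapping as in the usage example
LABEL_MAPPING = {
    "LABEL_0": "benign",
    "LABEL_1": "defacement",
    "LABEL_2": "malware",
    "LABEL_3": "phishing",
}


def top_prediction(results: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Hypothetical helper: return (class name, score) of the top-scoring class."""
    best = max(results, key=lambda r: r["score"])
    return LABEL_MAPPING.get(best["label"], best["label"]), best["score"]


def is_malicious(results: List[Dict[str, Any]]) -> bool:
    """Hypothetical policy: flag the URL if its top class is not 'benign'."""
    label, _ = top_prediction(results)
    return label != "benign"


# Scores in the shape the pipeline returns (illustrative values, not model output)
example = [
    {"label": "LABEL_0", "score": 0.10},
    {"label": "LABEL_1", "score": 0.02},
    {"label": "LABEL_2", "score": 0.85},
    {"label": "LABEL_3", "score": 0.03},
]
print(top_prediction(example))  # ('malware', 0.85)
print(is_malicious(example))    # True
```

With the pipeline from the usage example, a call would look like `top_prediction(classifier(url, top_k=None))`. Taking the top class is only one policy; a stricter filter could instead flag any URL whose benign score falls below a chosen threshold.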