CrabInHoney committed
Commit 8567c54 · verified · 1 Parent(s): babca08

Upload README.md

  1. README.md +76 -3
README.md CHANGED
@@ -1,3 +1,76 @@
- ---
- license: mit
- ---
+ This is a very small BERT model designed to classify URLs as phishing or non-phishing.
+
+ Validation score: 0.9622
+
+ Model size
+
+ 3.7M params
+
+ Tensor type
+
+ F32
+
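As a rough sanity check on the figures above, the weight footprint can be estimated from the parameter count and tensor type. This is a minimal sketch, assuming exactly 3,700,000 parameters (the README only gives the rounded "3.7M") and 4 bytes per F32 value. The helper name `weight_bytes` is hypothetical, and the estimate covers weights only, not activations or framework overhead.

```python
def weight_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Estimate the size of the raw weights in bytes (F32 = 4 bytes per value)."""
    return num_params * bytes_per_param


# Assumption: "3.7M params" taken as exactly 3,700,000 for illustration.
approx_params = 3_700_000
size = weight_bytes(approx_params)
print(f"{size / 1e6:.1f} MB")  # 14.8 MB
```

So the weights should occupy on the order of 15 MB in memory, which is consistent with describing the model as very small.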
+ [Dataset](https://huggingface.co/datasets/ealvaradob/phishing-dataset "Dataset")
+ (urls.json only)
+
+ Example:
+
+ from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
+ import torch
+
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # prefer GPU when available
+ print(f"Using device: {device}")
+
+ model_path = "./urlbert-tiny-v2-phishing-classifier"  # local directory containing the model files
+
+ tokenizer = BertTokenizerFast.from_pretrained(model_path)
+
+ model = BertForSequenceClassification.from_pretrained(model_path)
+ model.to(device)
+
+ classifier = pipeline(
+     "text-classification",
+     model=model,
+     tokenizer=tokenizer,
+     device=0 if torch.cuda.is_available() else -1,  # 0 = first GPU, -1 = CPU
+     return_all_scores=True  # scores for every class; deprecated in newer transformers releases in favor of top_k=None
+ )
+
+ test_urls = [
+     "huggingface.co/",
+     "p64.hu991ngface.co.com.ru/"
+ ]
+
+ for url in test_urls:
+     results = classifier(url)
+     print(f"\nURL: {url}")
+     for result in results[0]:  # results[0] holds one dict per class
+         label = result['label']
+         score = result['score']
+         print(f"Class: {label}, probability: {score:.4f}")
+
+
+ Output:
+
+ Using device: cuda
+
+ URL: huggingface.co/
+
+ Class: good, probability: 0.8515
+
+ Class: phish, probability: 0.1485
+
+
+
+ URL: p64.hu991ngface.co.com.ru/
+
+ Class: good, probability: 0.0289
+
+ Class: phish, probability: 0.9711
+
+
+
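The per-class scores shown in the output can be reduced to a single yes/no decision. The following is a minimal sketch, not part of the README: it assumes the score lists have the shape shown above (one dict per class with `label` and `score` keys, labels `good` and `phish`), and the helper names `phishing_score` and `is_phishing` as well as the 0.5 threshold are hypothetical choices for illustration.

```python
from typing import Any, Dict, List


def phishing_score(scores: List[Dict[str, Any]], phish_label: str = "phish") -> float:
    """Return the score assigned to the phishing class."""
    for entry in scores:
        if entry["label"] == phish_label:
            return float(entry["score"])
    raise ValueError(f"label {phish_label!r} not found in scores")


def is_phishing(scores: List[Dict[str, Any]], threshold: float = 0.5) -> bool:
    """Flag a URL as phishing when its phishing score reaches the threshold.

    The 0.5 default is an assumption; tune it for your own false-positive budget.
    """
    return phishing_score(scores) >= threshold


# Scores copied from the example output above.
good_example = [{"label": "good", "score": 0.8515}, {"label": "phish", "score": 0.1485}]
phish_example = [{"label": "good", "score": 0.0289}, {"label": "phish", "score": 0.9711}]

print(is_phishing(good_example))   # False
print(is_phishing(phish_example))  # True
```

Raising the threshold trades missed phishing URLs for fewer false alarms on legitimate ones. The right value depends on how the classifier is used and is not stated in the README.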
+
+ ## License
+
+ [MIT](https://choosealicense.com/licenses/mit/)