---
license: apache-2.0
datasets:
- shawhin/phishing-site-classification
metrics:
- accuracy
- recall
- precision
- f1
base_model: distilbert/distilbert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# bert-phishing-classifier_student

This model is a modified version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased), trained via knowledge distillation from [shawhin/bert-phishing-classifier_teacher](https://huggingface.co/shawhin/bert-phishing-classifier_teacher) using the [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification) dataset.
It achieves the following results on the testing set:
- Loss (training): 0.0563
- Accuracy: 0.9022
- Precision: 0.9426
- Recall: 0.8603
- F1 Score: 0.8995
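
For illustration, here is a minimal inference sketch using the standard `transformers` pipeline API. The repository id `shawhin/bert-phishing-classifier_student` and the example URL are assumptions based on the model name above:

```python
from transformers import pipeline

# Load the distilled student as a text-classification pipeline.
# Repository id assumed from the model name above.
classifier = pipeline(
    "text-classification",
    model="shawhin/bert-phishing-classifier_student",
)

# Input is a URL string; the returned label names depend on the model config
# (they may be generic LABEL_0 / LABEL_1 rather than "safe" / "phishing").
print(classifier("http://secure-login.example-bank.verify-account.com"))
```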

## Model description

This is the student model from a knowledge distillation example.

[Video](https://youtu.be/FLkUOkeMd5M) | [Blog](https://towardsdatascience.com/compressing-large-language-models-llms-9f406eea5b5e) | [Example code](https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/model-compression)

## Intended uses & limitations

This model was created for educational purposes.

## Training and evaluation data

The training, validation, and test splits are available here: [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification).
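
As a sketch, the dataset can be pulled with the `datasets` library. The split and column names shown below are assumptions; check the dataset card for the exact configuration:

```python
from datasets import load_dataset

# Download the phishing-site dataset from the Hugging Face Hub.
dataset = load_dataset("shawhin/phishing-site-classification")

# Print the DatasetDict to see which splits are actually available.
print(dataset)

# Inspect one example (assumed "train" split name).
print(dataset["train"][0])
```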

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- num_epochs: 5
- temperature: 2.0
- adam optimizer alpha: 0.5
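
The `temperature` and `alpha` values above are the usual knowledge-distillation hyperparameters. As a hedged sketch (not necessarily the exact objective used for this checkpoint; see the linked example code), a common way to combine them is a temperature-scaled KL term on the teacher's soft targets plus a cross-entropy term on the hard labels, weighted by `alpha`:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(soft_student, soft_targets,
                       reduction="batchmean") * (temperature ** 2)

    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)

    # alpha weights the soft vs. hard terms (assumed interpretation of the
    # "alpha: 0.5" value listed above).
    return alpha * distill + (1 - alpha) * hard
```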