---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
base_model:
- google/bert_uncased_L-2_H-128_A-2
pipeline_tag: text-classification
library_name: transformers
metrics:
- f1
- precision
- recall
datasets:
- Mozilla/autofill-dataset
---

## BERT Miniatures

This is the tiniest of the 24 BERT miniature models released with [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).

This checkpoint was initialized from the original TinyBert uncased English checkpoint:
[google/bert_uncased_L-2_H-128_A-2](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2).

This model was fine-tuned on HTML tags and labels collected with [Fathom](https://mozilla.github.io/fathom/commands/label.html).

## How to use TinyBert in `transformers`

```python
from transformers import pipeline

# Classify the raw markup of a form element into an autofill field type.
classifier = pipeline(
    "text-classification",
    model="Mozilla/tinybert-uncased-autofill",
)

print(
    classifier('<input class="cc-number" placeholder="Enter credit card number..." />')
)
```
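
If you prefer the lower-level API, the same prediction can be made with `AutoTokenizer` and `AutoModelForSequenceClassification`. This is a minimal sketch, assuming the checkpoint ships a standard sequence-classification head and `id2label` metadata:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Mozilla/tinybert-uncased-autofill"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The classifier expects the raw markup of a form element as input text.
markup = '<input class="cc-number" placeholder="Enter credit card number..." />'
inputs = tokenizer(markup, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])  # the predicted field type
```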

## Model Training Info
```python
hyperparameters = {
    "learning_rate": 0.000082,
    "num_train_epochs": 59,
    "weight_decay": 0.1,
    "per_device_train_batch_size": 32,
}
```
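
For reference, here is a minimal sketch of how these values could be plugged into a standard Hugging Face `Trainer` setup. Dataset preparation is omitted and the placeholders are assumptions; the actual training code lives in the repository linked below.

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Assumption: the classifier head is initialized from the TinyBert base model
# with one output per field type in the performance report below (17 classes).
model = AutoModelForSequenceClassification.from_pretrained(
    "google/bert_uncased_L-2_H-128_A-2",
    num_labels=17,
)

training_args = TrainingArguments(
    output_dir="tinybert-uncased-autofill",
    learning_rate=0.000082,
    num_train_epochs=59,
    weight_decay=0.1,
    per_device_train_batch_size=32,
)

trainer = Trainer(
    model=model,
    args=training_args,
    # train_dataset=...,  # a tokenized split of Mozilla/autofill-dataset
    # eval_dataset=...,
)
# trainer.train()
```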

More information on how the model was trained can be found in the [smart_autofill](https://github.com/mozilla/smart_autofill) repository.

## Model Performance
```
Test Performance:
Precision: 0.96778
Recall: 0.96696
F1: 0.9668

                     precision    recall  f1-score   support

      CC Expiration      1.000     0.750     0.857        16
CC Expiration Month      0.972     0.972     0.972        36
 CC Expiration Year      0.946     0.946     0.946        37
            CC Name      0.882     0.968     0.923        31
          CC Number      0.942     0.980     0.961        50
    CC Payment Type      0.918     0.893     0.905        75
   CC Security Code      0.950     0.927     0.938        41
            CC Type      0.917     0.786     0.846        14
   Confirm Password      0.961     0.860     0.907        57
              Email      0.909     0.959     0.933        73
         First Name      0.800     0.800     0.800         5
               Form      0.974     0.974     0.974        39
          Last Name      0.714     1.000     0.833         5
       New Password      0.913     0.979     0.945        97
              Other      0.986     0.983     0.985      1235
              Phone      1.000     0.667     0.800         3
           Zip Code      0.912     0.969     0.939        32

           accuracy                          0.967      1846
          macro avg      0.923     0.907     0.910      1846
       weighted avg      0.968     0.967     0.967      1846
```

```bibtex
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}
```