---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
base_model:
  - google/bert_uncased_L-2_H-128_A-2
pipeline_tag: text-classification
library_name: transformers
metrics:
  - f1
  - precision
  - recall
datasets:
  - Mozilla/autofill-dataset
---

# BERT Miniatures

This is the "tiny" variant of the 24 BERT models released with *Well-Read Students Learn Better: On the Importance of Pre-training Compact Models* (English only, uncased, trained with WordPiece masking).

This checkpoint was fine-tuned from the original TinyBERT uncased English checkpoint (google/bert_uncased_L-2_H-128_A-2).

The model was fine-tuned on HTML tags and labels collected with Fathom.

## How to use TinyBert in transformers

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Mozilla/tinybert-uncased-autofill"
)

print(
    classifier('<input class="cc-number" placeholder="Enter credit card number..." />')
)
```
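A text-classification pipeline returns a list of `{"label": ..., "score": ...}` dictionaries. A minimal post-processing sketch (the prediction below is illustrative, not actual model output; the threshold and the `Other` fallback are assumptions):

```python
# Illustrative pipeline-style output; real label names come from the model's config.
example_preds = [{"label": "CC Number", "score": 0.98}]

def best_label(preds, threshold=0.5):
    """Return the top-scoring label, falling back to 'Other' below the threshold."""
    top = max(preds, key=lambda p: p["score"])
    return top["label"] if top["score"] >= threshold else "Other"

print(best_label(example_preds))  # CC Number
```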

## Model Training Info

Hyperparameters:

```python
{
    "learning_rate": 8.2e-5,
    "num_train_epochs": 59,
    "weight_decay": 0.1,
    "per_device_train_batch_size": 32,
}
```
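These hyperparameters map directly onto `transformers.TrainingArguments`. A minimal sketch (the output directory is a placeholder, and this is an illustration of the mapping, not the exact training script Mozilla used):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="tinybert-autofill",  # placeholder path
    learning_rate=8.2e-5,
    num_train_epochs=59,
    weight_decay=0.1,
    per_device_train_batch_size=32,
)
```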

More information on how the model was trained can be found here: https://github.com/mozilla/smart_autofill

## Model Performance

Test performance:

- Precision: 0.96778
- Recall: 0.96696
- F1: 0.9668

```
                     precision    recall  f1-score   support

      CC Expiration      1.000     0.750     0.857        16
CC Expiration Month      0.972     0.972     0.972        36
 CC Expiration Year      0.946     0.946     0.946        37
            CC Name      0.882     0.968     0.923        31
          CC Number      0.942     0.980     0.961        50
    CC Payment Type      0.918     0.893     0.905        75
   CC Security Code      0.950     0.927     0.938        41
            CC Type      0.917     0.786     0.846        14
   Confirm Password      0.961     0.860     0.907        57
              Email      0.909     0.959     0.933        73
         First Name      0.800     0.800     0.800         5
               Form      0.974     0.974     0.974        39
          Last Name      0.714     1.000     0.833         5
       New Password      0.913     0.979     0.945        97
              Other      0.986     0.983     0.985      1235
              Phone      1.000     0.667     0.800         3
           Zip Code      0.912     0.969     0.939        32

           accuracy                          0.967      1846
          macro avg      0.923     0.907     0.910      1846
       weighted avg      0.968     0.967     0.967      1846
```
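The macro and weighted averages in the report follow directly from the per-class scores. A quick check in plain Python, with the precision and support columns transcribed from the table above:

```python
# Per-class (precision, support) pairs transcribed from the report above.
rows = [
    (1.000, 16), (0.972, 36), (0.946, 37), (0.882, 31), (0.942, 50),
    (0.918, 75), (0.950, 41), (0.917, 14), (0.961, 57), (0.909, 73),
    (0.800, 5), (0.974, 39), (0.714, 5), (0.913, 97), (0.986, 1235),
    (1.000, 3), (0.912, 32),
]

total = sum(s for _, s in rows)                 # 1846 test samples
macro = sum(p for p, _ in rows) / len(rows)     # unweighted mean over classes
weighted = sum(p * s for p, s in rows) / total  # support-weighted mean

print(round(macro, 3), round(weighted, 3))  # 0.923 0.968
```

Note how the large `Other` class (1235 of 1846 samples) pulls the weighted average above the macro average.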
## Citation

```bibtex
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962},
  year={2019}
}
```