Patent Classification Model

Model Description

multilabel_patent_classifier is a fine-tuned XLM-RoBERTa-large model, trained on British patent class information covering 1855-1883 from the classification database of Hanlon (2016).

It has been trained to predict 146 patent classes outlined by the British Patent Office, which are documented in the same database.

We take the original xlm-roberta-large weights and fine-tune them on our custom dataset for 10 epochs with a learning rate of 2e-05 and a batch size of 64.
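For reference, this configuration corresponds roughly to the following Trainer setup (a minimal sketch; only the epochs, learning rate, and batch size are taken from the text above, and the output directory is a placeholder):

from transformers import TrainingArguments

# Hypothetical reconstruction of the fine-tuning configuration described
# above; arguments not stated in the text are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="multilabel_patent_classifier",  # placeholder path
    num_train_epochs=10,
    learning_rate=2e-5,
    per_device_train_batch_size=64,
)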

Usage

This model can be used with the Hugging Face Transformers pipeline API for text classification:

from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "matthewleechen/multilabel_patent_classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

pipe = pipeline(
  task="text-classification",
  model=model,
  tokenizer=tokenizer,
  device=0,  # GPU index; set to -1 (or omit) to run on CPU
  return_all_scores=True  # one score per class; newer transformers versions use top_k=None
)
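For example, scoring a single (made-up) patent title against all classes and keeping those above the 0.5 threshold used by this model:

# Illustrative call on a hypothetical Victorian-style patent title.
results = pipe("Improvements in apparatus for spinning cotton")
# Depending on the transformers version, a single-string input may return
# a list of dicts or a nested list; flatten the single-input case if needed.
scores = results[0] if isinstance(results[0], list) else results
predicted = [s["label"] for s in scores if s["score"] > 0.5]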

Training Data

Our training data consists of patent titles labelled with a binary (0/1) tag for each patent class. The labels were generated by the British Patent Office between 1855-1883, and the patent titles were extracted from the front pages of our specification texts using a patent title NER model.
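For illustration, a single training record can be thought of as a title paired with a 146-dimensional multi-hot label vector (the title and positive class index below are invented, not from the dataset):

# Hypothetical example of one training record: 1 = patent belongs to
# that class, 0 = it does not.
example = {
    "title": "Improvements in the manufacture of iron and steel",
    "labels": [0] * 146,
}
example["labels"][12] = 1  # an assumed positive class, for illustration only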

Training Procedure

We follow the standard multi-label classification protocol with the Hugging Face Trainer API, but replace the default BCEWithLogitsLoss with a focal loss (Lin et al., 2018) with α=1 and γ=2 to address class imbalance. Both during evaluation and at inference, we apply a sigmoid to each logit and use a 0.5 threshold to determine the positive labels for each class.
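A minimal sketch of such a loss and how it can be swapped into the Trainer (the class names FocalLoss and FocalLossTrainer are illustrative, not part of the released code):

import torch
from transformers import Trainer

class FocalLoss(torch.nn.Module):
    """Multi-label focal loss over sigmoid logits (Lin et al., 2018)."""
    def __init__(self, alpha=1.0, gamma=2.0):
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, logits, targets):
        # Unreduced per-class binary cross-entropy, so each term can be reweighted.
        bce = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, targets.float(), reduction="none"
        )
        p_t = torch.exp(-bce)  # model probability of the true label
        # Down-weight well-classified (easy) labels by (1 - p_t)^gamma.
        return (self.alpha * (1 - p_t) ** self.gamma * bce).mean()

class FocalLossTrainer(Trainer):
    # Override compute_loss so the focal loss replaces BCEWithLogitsLoss.
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss = FocalLoss(alpha=1.0, gamma=2.0)(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss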

Evaluation

We compute precision, recall, and F1 for each class (with a 0.5 sigmoid threshold), as well as the exact match percentage (the predicted set of classes is identical to the ground truth) and the any match percentage (the predicted set overlaps the ground truth in at least one class).
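As a sketch, these aggregates can be computed from a matrix of sigmoid probabilities and a binary label matrix as follows (function and variable names are illustrative):

import numpy as np

def evaluate(probs, labels, threshold=0.5):
    # probs:  (n_samples, n_classes) sigmoid probabilities
    # labels: (n_samples, n_classes) binary ground-truth matrix
    preds = (probs >= threshold).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Exact match: every class prediction for a sample is correct.
    exact = np.mean(np.all(preds == labels, axis=1))
    # Any match: at least one predicted class is in the ground truth.
    any_match = np.mean(np.any((preds == 1) & (labels == 1), axis=1))
    return {"precision": precision, "recall": recall, "f1": f1,
            "exact_match": exact, "any_match": any_match}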

Aggregate scores on the test set are reported below.

| Metric Type   | Precision | Recall | F1    | Exact Match | Any Match |
|---------------|-----------|--------|-------|-------------|-----------|
| Micro Average | 83.4%     | 60.3%  | 70.0% | 52.9%       | 90.8%     |

References

@misc{hanlon2016,
  title = {{British Patent Technology Classification Database: 1855–1882}},
  author = {Hanlon, Walker},
  year = {2016},
  url = {http://www.econ.ucla.edu/whanlon/}
}

@misc{lin2018focallossdenseobject,
  title = {Focal Loss for Dense Object Detection},
  author = {Tsung-Yi Lin and Priya Goyal and Ross Girshick and Kaiming He and Piotr Dollár},
  year = {2018},
  eprint = {1708.02002},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/1708.02002}
}

Citation

If you use our model in your research, please cite our accompanying paper as follows:

@article{bct2025,
  title = {300 Years of British Patents},
  author = {Enrico Berkes and Matthew Lee Chen and Matteo Tranchero},
  journal = {arXiv preprint arXiv:2401.12345},
  year = {2025},
  url = {https://arxiv.org/abs/2401.12345}
}