---
language:
- en
base_model:
- FacebookAI/xlm-roberta-large
pipeline_tag: text-classification
library_name: transformers
---

# Patent Classification Model

### Model Description

**multilabel_patent_classifier** is a fine-tuned [XLM-RoBERTa-large](https://huggingface.co/FacebookAI/xlm-roberta-large) model trained on patent class information covering 1855-1883, made available [here](http://walkerhanlon.com/data_resources/british_patent_classification_database.zip).

It has been trained to recognize the 146 patent classes defined by the British Patent Office. The class list is made available [here](https://huggingface.co/matthewleechen/multiclass-classifier-patents/edit/main/BPO_classes.csv).

We take the original xlm-roberta-large [weights](https://huggingface.co/FacebookAI/xlm-roberta-large/blob/main/pytorch_model.bin) and fine-tune on our custom dataset for 10 epochs with a learning rate of 2e-05 and a batch size of 64.

### Usage

This model can be used with the Hugging Face Transformers pipelines API for multi-label text classification:

```python
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "matthewleechen/multilabel_patent_classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

pipe = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0,  # set to -1 to run on CPU
    return_all_scores=True,  # return a score for every patent class
)
```

### Training Data

Our training data consists of patent titles labelled with binary (0/1) tags for each patent class. Labels were generated by the British Patent Office between 1855 and 1883, and the patent titles were extracted from the front pages of our specification texts using a patent title NER [model](https://huggingface.co/matthewleechen/patent_titles_ner).

### Training Procedure

We follow the standard multi-label classification setup with the Hugging Face Trainer API, but replace the default `BCEWithLogitsLoss` with a [focal loss](https://arxiv.org/pdf/1708.02002) function (α=1, γ=2) to address class imbalance; a minimal sketch of this loss is given at the end of this card. Both during evaluation and at inference, we apply a sigmoid to each logit and use a 0.5 threshold to determine the positive labels for each class.

### Evaluation

We compute precision, recall, and F1 for each class (with a 0.5 sigmoid threshold), as well as exact match (ground truth and predicted class sets are identical) and any match (ground truth and predicted class sets overlap) percentages. These scores are aggregated over the test set below.
| Metric Type   | Precision (Micro) | Recall (Micro) | F1 (Micro) | Exact Match | Any Match |
|---------------|-------------------|----------------|------------|-------------|-----------|
| Micro Average | 83.4%             | 60.3%          | 70.0%      | 52.9%       | 90.8%     |
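
The exact match and any match figures are simple set comparisons between predicted and ground-truth class sets. The sketch below shows one way to reproduce them from the pipeline above, assuming each output contains a sigmoid score for every class as described under Training Procedure; the helper names and the `gold_sets` argument are illustrative, not part of the released code.

```python
import numpy as np

def predict_label_sets(pipe, texts, threshold=0.5):
    """Return the set of predicted class names for each text.

    Assumes `pipe` returns, for every text, a list of {"label", "score"}
    dicts covering all classes, with scores already passed through a sigmoid.
    """
    outputs = pipe(texts)
    return [
        {entry["label"] for entry in scores if entry["score"] >= threshold}
        for scores in outputs
    ]

def match_rates(predicted_sets, gold_sets):
    """Compute exact match and any match percentages over a test set."""
    exact = np.mean([pred == gold for pred, gold in zip(predicted_sets, gold_sets)])
    any_match = np.mean([len(pred & gold) > 0 for pred, gold in zip(predicted_sets, gold_sets)])
    return 100 * exact, 100 * any_match
```

Here `gold_sets` would be the sets of British Patent Office class names attached to each test title.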
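
For reference, the focal loss used in place of `BCEWithLogitsLoss` during training (see Training Procedure) follows [Lin et al. (2017)](https://arxiv.org/pdf/1708.02002). The snippet below is a minimal sketch of a multi-label focal loss with α=1 and γ=2, not the exact training code used for this model.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, alpha=1.0, gamma=2.0):
    """Multi-label focal loss (Lin et al., 2017), applied per class.

    logits: (batch_size, num_classes) raw model outputs
    labels: (batch_size, num_classes) float tensor of 0/1 targets
    """
    # Per-class binary cross-entropy, kept unreduced so each term can be re-weighted.
    bce = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    # bce = -log(p_t), so exp(-bce) recovers p_t, the probability of the true label.
    p_t = torch.exp(-bce)
    # Down-weight easy examples by (1 - p_t)^gamma and scale by alpha.
    return (alpha * (1.0 - p_t) ** gamma * bce).mean()
```

In practice a loss like this would be dropped into a custom `Trainer` subclass by overriding `compute_loss`, with the sigmoid-plus-0.5-threshold rule above applied only at evaluation and inference time.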