# Finetuned DistilBERT

This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-uncased). It was introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation). This model is uncased: it does not make a difference between english and English.

## Model description

DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This model is further finetuned on the DBpedia dataset, which can be found [here](https://huggingface.co/datasets/DeveloperOats/DBPedia_Classes). The dataset consists of 342,782 Wikipedia articles that have been cleaned and classified into hierarchical classes. The classification system spans three levels, with 9 classes at the first level, 70 classes at the second level, and 219 classes at the third level.

## Intended uses & limitations

You can use the model to extract structured content and organize it into taxonomic categories. The model outputs one of nine labels (`LABEL_0` through `LABEL_8`), which can be mapped to a class name with the following lines:

```python
labelint = ['LABEL_0', 'LABEL_1', 'LABEL_2', 'LABEL_3', 'LABEL_4', 'LABEL_5', 'LABEL_6', 'LABEL_7', 'LABEL_8']
labeltxt = np.loadtxt("TASK2/label_vals/l1.txt", dtype="str")
```

Here `labeltxt` holds the nine first-level class names: Agent, Device, Event, Place, Species, SportsSeason, TopicalConcept, UnitOfWork, Work.

### How to use

You can use this model directly with a pipeline:

```python
from transformers import pipeline
import numpy as np

text = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."

# Load the finetuned classifier
classifier = pipeline("text-classification", model="carbonnnnn/T2L1DISTILBERT")

# Map the pipeline's LABEL_<i> outputs to the first-level class names
labeltxt = np.loadtxt("TASK2/label_vals/l1.txt", dtype="str")  # local file listing the 9 class names in label order
labelint = ['LABEL_0', 'LABEL_1', 'LABEL_2', 'LABEL_3', 'LABEL_4', 'LABEL_5', 'LABEL_6', 'LABEL_7', 'LABEL_8']

output = classifier(text)[0]['label']
for i in range(len(labelint)):
    if output == labelint[i]:
        print("Output is : " + str(labeltxt[i]))
```

### Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. It also inherits some of [the bias of its teacher model](https://huggingface.co/bert-base-uncased#limitations-and-bias).

## Evaluation results
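No evaluation figures are reported for this checkpoint. As a minimal sketch of how first-level accuracy could be measured, assuming the `DeveloperOats/DBPedia_Classes` dataset exposes a `test` split with `text` and `l1` columns (verify the layout before running; the label order is taken from the class-name list above):

```python
from datasets import load_dataset
from transformers import pipeline

# Minimal evaluation sketch, not a reported result.
# The split name and the `text`/`l1` column names are assumptions
# about the DeveloperOats/DBPedia_Classes layout.
dataset = load_dataset("DeveloperOats/DBPedia_Classes", split="test")
classifier = pipeline("text-classification", model="carbonnnnn/T2L1DISTILBERT")

# Hardcoded mapping from pipeline labels to first-level class names,
# in the same order as the l1.txt file used in the snippets above.
id2label = {f"LABEL_{i}": name for i, name in enumerate(
    ["Agent", "Device", "Event", "Place", "Species",
     "SportsSeason", "TopicalConcept", "UnitOfWork", "Work"])}

subset = dataset.select(range(500))  # small subset to keep runtime modest
correct = 0
for example in subset:
    pred = classifier(example["text"], truncation=True)[0]["label"]
    correct += int(id2label[pred] == example["l1"])
print(f"Accuracy on {len(subset)} test examples: {correct / len(subset):.3f}")
```

For a meaningful number, run over the full test split rather than the 500-example subset used here.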