# Finetuned DistilBERT

This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-uncased). It was introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation). This model is uncased: it does not make a difference between english and English.

## Model description

DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher. This model is further finetuned on the DBpedia dataset, which can be found [here](https://huggingface.co/datasets/DeveloperOats/DBPedia_Classes). The dataset consists of 342,782 Wikipedia articles that have been cleaned and classified into hierarchical classes. The classification system spans three levels, with 9 classes at the first level, 70 classes at the second level, and 219 classes at the third level.

## Intended uses & limitations

You can use the model to extract structured content and organize it into taxonomic categories. The model outputs one of nine labels (`LABEL_0` through `LABEL_8`), which can be mapped to a class name with the following lines:

```python
labelint = ['LABEL_0', 'LABEL_1', 'LABEL_2', 'LABEL_3', 'LABEL_4', 'LABEL_5', 'LABEL_6', 'LABEL_7', 'LABEL_8']
labeltxt = np.loadtxt("TASK2/label_vals/l1.txt", dtype="str")
```

Here `labeltxt` holds the nine first-level class names: Agent, Device, Event, Place, Species, SportsSeason, TopicalConcept, UnitOfWork, Work.

### How to use

You can use this model directly with a pipeline:

```python
from transformers import pipeline
import numpy as np

text = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."

# Load the finetuned classifier
classifier = pipeline("text-classification", model="carbonnnnn/T2L1DISTILBERT")

# Map the pipeline's LABEL_<i> outputs to the first-level class names
labeltxt = np.loadtxt("TASK2/label_vals/l1.txt", dtype="str")  # local file listing the 9 class names in label order
labelint = ['LABEL_0', 'LABEL_1', 'LABEL_2', 'LABEL_3', 'LABEL_4', 'LABEL_5', 'LABEL_6', 'LABEL_7', 'LABEL_8']

output = classifier(text)[0]['label']
for i in range(len(labelint)):
    if output == labelint[i]:
        print("Output is : " + str(labeltxt[i]))
```

### Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. It also inherits some of [the bias of its teacher model](https://huggingface.co/bert-base-uncased#limitations-and-bias).

## Evaluation results
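No evaluation figures are reported for this checkpoint. As a minimal sketch of how first-level accuracy could be measured, assuming the `DeveloperOats/DBPedia_Classes` dataset exposes a `test` split with `text` and `l1` columns (verify the layout before running; the label order is taken from the class-name list above):

```python
from datasets import load_dataset
from transformers import pipeline

# Minimal evaluation sketch, not a reported result.
# The split name and the `text`/`l1` column names are assumptions
# about the DeveloperOats/DBPedia_Classes layout.
dataset = load_dataset("DeveloperOats/DBPedia_Classes", split="test")
classifier = pipeline("text-classification", model="carbonnnnn/T2L1DISTILBERT")

# Hardcoded mapping from pipeline labels to first-level class names,
# in the same order as the l1.txt file used in the snippets above.
id2label = {f"LABEL_{i}": name for i, name in enumerate(
    ["Agent", "Device", "Event", "Place", "Species",
     "SportsSeason", "TopicalConcept", "UnitOfWork", "Work"])}

subset = dataset.select(range(500))  # small subset to keep runtime modest
correct = 0
for example in subset:
    pred = classifier(example["text"], truncation=True)[0]["label"]
    correct += int(id2label[pred] == example["l1"])
print(f"Accuracy on {len(subset)} test examples: {correct / len(subset):.3f}")
```

For a meaningful number, run over the full test split rather than the 500-example subset used here.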