<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DistilBERT

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=distilbert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-distilbert-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/distilbert-base-uncased">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
</div>

## Overview

The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5), and the paper [DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than
*bert-base-uncased* and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE
language understanding benchmark.

The abstract from the paper is the following:

*As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP),
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage
knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by
40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive
biases learned by larger models during pretraining, we introduce a triple loss combining language modeling,
distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
study.*

Tips:

- DistilBERT doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
  separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`), as shown in the example after
  these tips.
- DistilBERT doesn't have options to select the input positions (`position_ids` input). This could be added if
  necessary though; just let us know if you need this option.
- DistilBERT is the same as BERT but smaller. It is trained by distillation of the pretrained BERT model, meaning it's
  been trained to predict the same probabilities as the larger model. The actual objective is a combination of:

    * finding the same probabilities as the teacher model
    * predicting the masked tokens correctly (but no next-sentence objective)
    * a cosine similarity between the hidden states of the student and the teacher model

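As a minimal sketch of the first tip, the snippet below encodes a sentence pair with the `distilbert-base-uncased`
tokenizer: the two segments are joined with `[SEP]`, and the output contains no `token_type_ids`.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Encode two segments; the tokenizer inserts [SEP] between them and,
# unlike BERT, does not return token_type_ids.
encoding = tokenizer("This is the first segment.", "This is the second segment.")

print(encoding.keys())  # dict_keys(['input_ids', 'attention_mask'])
print(tokenizer.decode(encoding["input_ids"]))
# [CLS] this is the first segment. [SEP] this is the second segment. [SEP]
```
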
This model was contributed by [victorsanh](https://huggingface.co/victorsanh). The JAX version of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).

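For a quick end-to-end check, the distilled checkpoint can be loaded directly in a `fill-mask` pipeline. This is a
minimal sketch using the `distilbert-base-uncased` weights:

```python
from transformers import pipeline

# Load the pretrained DistilBERT checkpoint in a masked-language-modeling pipeline.
unmasker = pipeline("fill-mask", model="distilbert-base-uncased")

# The model fills in the [MASK] token with its top predictions.
print(unmasker("Hello, I'm a [MASK] model."))
```
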
## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DistilBERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="text-classification"/> | |
- A blog post on [Getting Started with Sentiment Analysis using Python](https://huggingface.co/blog/sentiment-analysis-python) with DistilBERT. | |
- A blog post on how to [train DistilBERT with Blurr for sequence classification](https://huggingface.co/blog/fastai). | |
- A blog post on how to use [Ray to tune DistilBERT hyperparameters](https://huggingface.co/blog/ray-tune). | |
- A blog post on how to [train DistilBERT with Hugging Face and Amazon SageMaker](https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face). | |
- A notebook on how to [finetune DistilBERT for multi-label classification](https://colab.research.google.com/github/DhavalTaunk08/Transformers_scripts/blob/master/Transformers_multilabel_distilbert.ipynb). π | |
- A notebook on how to [finetune DistilBERT for multiclass classification with PyTorch](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_multiclass_classification.ipynb). π | |
- A notebook on how to [finetune DistilBERT for text classification in TensorFlow](https://colab.research.google.com/github/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb). π | |
- [`DistilBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb). | |
- [`TFDistilBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb). | |
- [`FlaxDistilBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb). | |
- [Text classification task guide](../tasks/sequence_classification) | |
<PipelineTag pipeline="token-classification"/> | |
- [`DistilBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb). | |
- [`TFDistilBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb). | |
- [`FlaxDistilBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification). | |
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the π€ Hugging Face Course. | |
- [Token classification task guide](../tasks/token_classification) | |
<PipelineTag pipeline="fill-mask"/> | |
- [`DistilBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb). | |
- [`TFDistilBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb). | |
- [`FlaxDistilBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb). | |
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the π€ Hugging Face Course. | |
- [Masked language modeling task guide](../tasks/masked_language_modeling) | |
<PipelineTag pipeline="question-answering"/> | |
- [`DistilBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb). | |
- [`TFDistilBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb). | |
- [`FlaxDistilBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering). | |
- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the π€ Hugging Face Course. | |
- [Question answering task guide](../tasks/question_answering) | |
**Multiple choice**

- [`DistilBertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
- [`TFDistilBertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
- [Multiple choice task guide](../tasks/multiple_choice)

⚗️ Optimization

- A blog post on how to [quantize DistilBERT with 🤗 Optimum and Intel](https://huggingface.co/blog/intel).
- A blog post on [Optimizing Transformers for GPUs with 🤗 Optimum](https://www.philschmid.de/optimizing-transformers-with-optimum-gpu).
- A blog post on [Optimizing Transformers with Hugging Face Optimum](https://www.philschmid.de/optimizing-transformers-with-optimum).

⚡️ Inference

- A blog post on how to [Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia](https://huggingface.co/blog/bert-inferentia-sagemaker) with DistilBERT.
- A blog post on [Serverless Inference with Hugging Face's Transformers, DistilBERT and Amazon SageMaker](https://www.philschmid.de/sagemaker-serverless-huggingface-distilbert).

🚀 Deploy

- A blog post on how to [deploy DistilBERT on Google Cloud](https://huggingface.co/blog/how-to-deploy-a-pipeline-to-google-clouds).
- A blog post on how to [deploy DistilBERT with Amazon SageMaker](https://huggingface.co/blog/deploy-hugging-face-models-easily-with-amazon-sagemaker).
- A blog post on how to [Deploy BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module](https://www.philschmid.de/terraform-huggingface-amazon-sagemaker).

## DistilBertConfig

[[autodoc]] DistilBertConfig

## DistilBertTokenizer

[[autodoc]] DistilBertTokenizer

## DistilBertTokenizerFast

[[autodoc]] DistilBertTokenizerFast

## DistilBertModel

[[autodoc]] DistilBertModel
    - forward

## DistilBertForMaskedLM

[[autodoc]] DistilBertForMaskedLM
    - forward

## DistilBertForSequenceClassification

[[autodoc]] DistilBertForSequenceClassification
    - forward

## DistilBertForMultipleChoice

[[autodoc]] DistilBertForMultipleChoice
    - forward

## DistilBertForTokenClassification

[[autodoc]] DistilBertForTokenClassification
    - forward

## DistilBertForQuestionAnswering

[[autodoc]] DistilBertForQuestionAnswering
    - forward

## TFDistilBertModel

[[autodoc]] TFDistilBertModel
    - call

## TFDistilBertForMaskedLM

[[autodoc]] TFDistilBertForMaskedLM
    - call

## TFDistilBertForSequenceClassification

[[autodoc]] TFDistilBertForSequenceClassification
    - call

## TFDistilBertForMultipleChoice

[[autodoc]] TFDistilBertForMultipleChoice
    - call

## TFDistilBertForTokenClassification

[[autodoc]] TFDistilBertForTokenClassification
    - call

## TFDistilBertForQuestionAnswering

[[autodoc]] TFDistilBertForQuestionAnswering
    - call

## FlaxDistilBertModel

[[autodoc]] FlaxDistilBertModel
    - __call__

## FlaxDistilBertForMaskedLM

[[autodoc]] FlaxDistilBertForMaskedLM
    - __call__

## FlaxDistilBertForSequenceClassification

[[autodoc]] FlaxDistilBertForSequenceClassification
    - __call__

## FlaxDistilBertForMultipleChoice

[[autodoc]] FlaxDistilBertForMultipleChoice
    - __call__

## FlaxDistilBertForTokenClassification

[[autodoc]] FlaxDistilBertForTokenClassification
    - __call__

## FlaxDistilBertForQuestionAnswering

[[autodoc]] FlaxDistilBertForQuestionAnswering
    - __call__