metadata
language: en
datasets:
  - squad_v2
license: cc-by-4.0

roberta-base distilled into tinyroberta

Overview

Language model: roberta-base
Language: English
Training data: The PILE
Infrastructure: 4x Tesla V100

Hyperparameters

batch_size = 96
n_epochs = 4
max_seq_len = 384
learning_rate = 1e-4
lr_schedule = LinearWarmup
warmup_proportion = 0.2
teacher = "deepset/roberta-base"
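To make the interaction between learning_rate, lr_schedule, and warmup_proportion concrete, here is a minimal sketch of a linear warmup schedule in plain Python. The function name `linear_warmup_lr` and the step count are hypothetical, and the linear decay to zero after warmup is an assumption: the settings above only name the schedule and do not describe its decay phase.

```python
def linear_warmup_lr(step, total_steps, base_lr=1e-4, warmup_proportion=0.2):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumption: after warmup the rate decays linearly to zero. The model
    card only names the schedule ("LinearWarmup"), not its decay shape.
    """
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


# Hypothetical run length, chosen only for illustration.
total_steps = 1000
for step in (0, 100, 200, 600, 1000):
    print(step, linear_warmup_lr(step, total_steps))
```

With warmup_proportion = 0.2, the first fifth of training is spent ramping up to the peak rate of 1e-4, and the remaining steps ramp back down.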

Distillation

This model was distilled using the approach described in the TinyBERT paper and implemented in Haystack. We performed intermediate layer distillation with roberta-base as the teacher, which resulted in deepset/tinyroberta-6l-768d. This model has not been distilled for any specific task. If you want to use distillation to improve its performance on a downstream task, you can use Haystack's distillation functionality. You can also check out deepset/tinyroberta-squad2 for a model that has already been distilled on an extractive QA downstream task.
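The idea behind intermediate layer distillation can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Haystack implementation: the helper names are hypothetical, the evenly spaced layer mapping follows the uniform strategy as we understand it from TinyBERT, and only the hidden-state term is shown (TinyBERT also distills attention matrices and embeddings). Because the student and the teacher both use 768-dimensional hidden states, the sketch compares them directly without a projection.

```python
def uniform_layer_map(n_student_layers, n_teacher_layers):
    """Map each student layer (1-indexed) to an evenly spaced teacher layer."""
    if n_teacher_layers % n_student_layers != 0:
        raise ValueError("teacher depth must be a multiple of student depth")
    stride = n_teacher_layers // n_student_layers
    return {m: m * stride for m in range(1, n_student_layers + 1)}


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hidden_state_loss(student_states, teacher_states, layer_map):
    """Sum the MSE over every mapped (student layer, teacher layer) pair.

    Both arguments are dicts from layer index to a hidden-state vector.
    """
    return sum(
        mse(student_states[m], teacher_states[g]) for m, g in layer_map.items()
    )


# 6 student layers against the 12 layers of roberta-base.
layer_map = uniform_layer_map(6, 12)
print(layer_map)

# Toy 4-dimensional hidden states stand in for the real 768-dimensional ones.
teacher_states = {g: [0.1 * g] * 4 for g in range(1, 13)}
student_states = {m: [0.1 * g] * 4 for m, g in layer_map.items()}
print(hidden_state_loss(student_states, teacher_states, layer_map))
```

In real training the hidden states are tensors produced by both models on the same batch, and this loss is minimized with respect to the student's parameters only.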

About us

deepset is the company behind the production-ready open-source AI framework Haystack.

Get in touch and join the Haystack community

For more info on Haystack, visit our GitHub repo and Documentation.

We also have a Discord community open to everyone!

Twitter | LinkedIn | Discord | GitHub Discussions | Website | YouTube

By the way: we're hiring!