arxiv:2205.11342

ScholarBERT: Bigger is Not Always Better

Published on May 23, 2022

Abstract

Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have shown impressive performance on various downstream tasks. Increasingly, researchers are "finetuning" these models to improve performance on domain-specific tasks. Here, we report a broad study in which we applied 14 transformer-based models to 11 scientific tasks in order to evaluate how downstream performance is affected by changes along various dimensions (e.g., training data, model size, pretraining time, finetuning length). In this process, we created the largest and most diverse scientific language model to date, ScholarBERT, by training a 770M-parameter BERT model on a 221B-token scientific literature dataset spanning many disciplines. Counterintuitively, our evaluation of the 14 BERT-based models (seven versions of ScholarBERT, five science-specific large language models from the literature, BERT-Base, and BERT-Large) reveals little difference in performance across the 11 science-focused tasks, despite major differences in model size and training data. We argue that our results establish an upper bound for the performance achievable with BERT-based architectures on tasks from the scientific domain.
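The finetuning setup the abstract describes can be reproduced in outline with the Hugging Face Transformers library. The sketch below shows a generic BERT-family checkpoint being finetuned for sequence classification; the checkpoint name, dataset, and hyperparameters are illustrative placeholders, not the paper's actual configuration.

```python
# Minimal sketch of finetuning a BERT-family model on a downstream
# classification task. The checkpoint, dataset, and hyperparameters
# below are assumptions for illustration only.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "bert-base-uncased"  # swap in a domain-specific checkpoint to compare
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Any text-classification dataset with "text" and "label" columns works here.
dataset = load_dataset("imdb")

def tokenize(batch):
    # Truncate to the 512-token context window of BERT-style models.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finetuned-bert",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```

Repeating this loop with different checkpoints and tasks, as the paper does across 14 models and 11 tasks, is what allows downstream performance to be compared along dimensions such as model size and pretraining corpus.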

