arxiv:1912.09582

BERTje: A Dutch BERT Model

Published on Dec 19, 2019
Authors:
Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim

Abstract

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis). Our pre-trained Dutch BERT model is made available at https://github.com/wietsedv/bertje.
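The abstract points to the released checkpoint on GitHub. Below is a minimal, hedged sketch of how one might load and query such a checkpoint with the Hugging Face transformers library, assuming the weights are also published on the Hugging Face Hub under an id like GroNLP/bert-base-dutch-cased (an assumption here; verify the exact identifier in the linked repository).

# Minimal sketch: masked-token prediction with a Dutch BERT checkpoint.
# Assumption: the Hub id "GroNLP/bert-base-dutch-cased" points to BERTje;
# check https://github.com/wietsedv/bertje for the authoritative location.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "GroNLP/bert-base-dutch-cased"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Dutch sentence with one masked token ("Amsterdam is the capital of [MASK].")
text = f"Amsterdam is de hoofdstad van {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and print the five most likely fillers.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_index].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))

The same tokenizer/model pair could in principle be fine-tuned for the downstream tasks mentioned in the abstract (POS tagging, NER, semantic role labeling, sentiment analysis) using the standard token- and sequence-classification heads in transformers.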


Models citing this paper: 1

Datasets citing this paper: 0

Spaces citing this paper: 5

Collections including this paper: 1