This model continues pre-training RoBERTa-base on discharge summaries from the MIMIC-III dataset.
Details can be found in the following paper:
Xiang Dai, Ilias Chalkidis, Sune Darkner, and Desmond Elliott. 2022. Revisiting Transformer-based Models for Long Document Classification. (https://arxiv.org/abs/2204.06683)
- Important hyper-parameters

| Hyper-parameter | Value |
|---|---|
| Max sequence length | 4096 |
| Batch size | 8 |
| Learning rate | 5e-5 |
| Training epochs | 6 |
| Training time | 130 GPU-hours |
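
The checkpoint can be loaded with the Hugging Face `transformers` library. Below is a minimal sketch; the repository identifier `your-namespace/mimic-roberta-base-4096` and the example clinical sentence are placeholders, so substitute this model's actual Hugging Face ID before use.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Placeholder repository name; replace with this model's actual Hugging Face ID.
model_id = "your-namespace/mimic-roberta-base-4096"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encode a synthetic discharge-summary snippet; the model was pre-trained
# with a maximum sequence length of 4096 tokens.
text = "The patient was admitted with shortness of breath and treated with diuretics."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```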