This model continues pre-training RoBERTa-base on discharge summaries from the MIMIC-III dataset.
Details can be found in the following paper:
Xiang Dai, Ilias Chalkidis, Sune Darkner, and Desmond Elliott. 2022. Revisiting Transformer-based Models for Long Document Classification. (https://arxiv.org/abs/2204.06683)
- Important hyper-parameters

| Hyper-parameter | Value |
|---|---|
| Max sequence length | 4096 |
| Batch size | 8 |
| Learning rate | 5e-5 |
| Training epochs | 6 |
| Training time | 130 GPU-hours |
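
The checkpoint can be loaded with the Hugging Face `transformers` library. Below is a minimal sketch; the repository identifier `your-namespace/mimic-roberta-base-4096` and the example clinical sentence are placeholders, so substitute this model's actual Hugging Face ID before use.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Placeholder repository name; replace with this model's actual Hugging Face ID.
model_id = "your-namespace/mimic-roberta-base-4096"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encode a synthetic discharge-summary snippet; the model was pre-trained
# with a maximum sequence length of 4096 tokens.
text = "The patient was admitted with shortness of breath and treated with diuretics."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```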