---
language:
- en
datasets:
- pubmed
- chemical patent
- cooking recipe
---
# ProcBERT
ProcBERT is a pre-trained language model for procedural text. It was pre-trained on a large-scale procedural corpus (PubMed articles, chemical patents, and cooking recipes) containing over 12B tokens, and it achieves strong performance on downstream procedural-text tasks. More details can be found in the following paper:
```bibtex
@article{Bai2021PretrainOA,
  title={Pre-train or Annotate? Domain Adaptation with a Constrained Budget},
  author={Fan Bai and Alan Ritter and Wei Xu},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.04711}
}
```
## Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the ProcBERT tokenizer and the encoder with a token-classification head.
tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")
```
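For reference, here is a minimal end-to-end inference sketch. Note that the token-classification head loaded this way is newly (randomly) initialized, so the predicted labels are placeholders (`LABEL_0`/`LABEL_1`) until the model is fine-tuned on a labeled procedural dataset; the input sentence is a made-up example.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")

# A hypothetical procedural sentence (wet-lab style).
text = "Centrifuge the sample at 5000 rpm for 10 minutes."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, seq_len, num_labels]

# NOTE: the head is untrained here, so these labels are only illustrative.
predictions = logits.argmax(dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred])
```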
More usage details can be found in the paper's accompanying code repository.