---
language:
- en
datasets:
- pubmed
- chemical patent
- cooking recipe
---
# ProcBERT
ProcBERT is a pre-trained language model for procedural text. It was pre-trained on a large-scale procedural corpus (PubMed articles, chemical patents, and cooking recipes) containing over 12B tokens, and it achieves strong performance on downstream procedural-text tasks. More details can be found in the following paper:
```bibtex
@article{Bai2021PretrainOA,
  title={Pre-train or Annotate? Domain Adaptation with a Constrained Budget},
  author={Fan Bai and Alan Ritter and Wei Xu},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.04711}
}
```
## Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the ProcBERT tokenizer and the encoder with a token-classification head.
tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")
```
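For reference, here is a minimal end-to-end inference sketch. Note that the token-classification head loaded this way is newly (randomly) initialized, so the predicted labels are placeholders (`LABEL_0`/`LABEL_1`) until the model is fine-tuned on a labeled procedural dataset; the input sentence is a made-up example.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")

# A hypothetical procedural sentence (wet-lab style).
text = "Centrifuge the sample at 5000 rpm for 10 minutes."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, seq_len, num_labels]

# NOTE: the head is untrained here, so these labels are only illustrative.
predictions = logits.argmax(dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred])
```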
More usage details can be found in the paper's accompanying code repository.