---
language:
- en
datasets:
- pubmed
- chemical patent
- cooking recipe
---
## ProcBERT

ProcBERT is a pre-trained language model designed specifically for procedural text. It was pre-trained on a large-scale procedural corpus of PubMed articles, chemical patents, and cooking recipes containing over 12B tokens, and it achieves strong performance on downstream procedural-text tasks. More details can be found in the following [paper](https://arxiv.org/abs/2109.04711):
```
@inproceedings{bai-etal-2021-pre,
    title = "Pre-train or Annotate? Domain Adaptation with a Constrained Budget",
    author = "Bai, Fan and
      Ritter, Alan and
      Xu, Wei",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
}
```
## Usage

```
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the ProcBERT tokenizer and a model with a token-classification head.
tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")
```
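As a minimal sketch, the loaded model can then be applied to a procedural sentence as shown below. The example sentence and label usage are illustrative only: the token-classification head is randomly initialized until the model is fine-tuned on a labeled procedural dataset, so the predicted labels are not meaningful out of the box.

```
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")
model.eval()

# Illustrative procedural sentence (hypothetical example).
sentence = "Centrifuge the sample at 5000 rpm for 10 minutes."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    # logits shape: (batch_size, sequence_length, num_labels)
    logits = model(**inputs).logits

# Map each sub-word token to its highest-scoring label id.
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, label_id in zip(tokens, pred_ids):
    print(token, model.config.id2label[label_id])
```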
More usage details can be found [here](https://github.com/bflashcp3f/ProcBERT).