---
language:
- en
tags:
- part-of-speech
- finetuned
license: cc-by-nc-3.0
---

# BERT-base-multilingual-cased fine-tuned for Part-of-Speech tagging

This is a multilingual BERT model fine-tuned for English part-of-speech tagging. It was trained on the Penn Treebank (Marcus et al., 1993) and achieves an F1-score of 96.69.

## Usage

A *transformers* pipeline can be used to run the model:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_name = "QCRI/bert-base-multilingual-cased-pos-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build a token-classification pipeline and tag a sample sentence
pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
outputs = pipeline("A test example")
print(outputs)
```
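The pipeline returns one dictionary per WordPiece token, so a word that the tokenizer splits into several pieces appears as several entries. The sketch below shows one possible way to merge those pieces back into `(word, tag)` pairs. It assumes the default (non-aggregated) output format, where each dictionary has a `word` and an `entity` key and continuation pieces start with `##`. The helper `merge_wordpieces` and the `sample` list are hypothetical and written by hand for illustration; the tags and the subword split shown are not actual output of this model.

```python
from typing import Dict, List, Tuple


def merge_wordpieces(outputs: List[Dict]) -> List[Tuple[str, str]]:
    """Join '##' continuation pieces onto the preceding token.

    Each merged word keeps the tag predicted for its first piece
    (an assumption of this sketch, not a rule stated by the model card).
    """
    words: List[Tuple[str, str]] = []
    for item in outputs:
        piece, tag = item["word"], item["entity"]
        if piece.startswith("##") and words:
            # Continuation piece: append its text to the previous word.
            prev_word, prev_tag = words[-1]
            words[-1] = (prev_word + piece[2:], prev_tag)
        else:
            # First piece of a new word.
            words.append((piece, tag))
    return words


# Hand-written sample in the shape the pipeline returns; NOT real model output.
sample = [
    {"word": "A", "entity": "DT"},
    {"word": "test", "entity": "NN"},
    {"word": "exam", "entity": "NN"},
    {"word": "##ple", "entity": "NN"},
]

print(merge_wordpieces(sample))
# [('A', 'DT'), ('test', 'NN'), ('example', 'NN')]
```

With the real model, the same helper could be applied to the `outputs` list from the snippet above, for example `merge_wordpieces(outputs)`.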
## Citation

This model was used for all part-of-speech tagging results in *Analyzing Encoded Concepts in Transformer Language Models*, published at NAACL'22. If you find this model useful in your own work, please use the following citation:

```bib
@inproceedings{sajjad-NAACL,
  title={Analyzing Encoded Concepts in Transformer Language Models},
  author={Hassan Sajjad and Nadir Durrani and Fahim Dalvi and Firoj Alam and Abdul Rafae Khan and Jia Xu},
  booktitle={North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
  series={NAACL~'22},
  year={2022},
  address={Seattle}
}
```