---
language: en
tags:
- transformers
- feature-extraction
- materials
license: other
---

# PolymerNER

PolymerNER is a version of MaterialsBERT fine-tuned on a dataset of 638 abstracts. It adds a linear layer on top of MaterialsBERT to predict the entity type of each token. The model predicts the following entity types: POLYMER, POLYMER\_FAMILY, ORGANIC, INORGANIC, MONOMER, PROP\_NAME, PROP\_VALUE, and MATERIAL\_AMOUNT.

This named entity recognition (NER) model was introduced in [this paper](https://www.nature.com/articles/s41524-023-01003-w). Refer to the paper for a more detailed description of the entity types and the performance metrics of the model. Because MaterialsBERT is uncased, the NER model is also uncased.

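To make the role of the linear layer concrete, the sketch below applies a single linear map to each token's encoder output and picks the highest-scoring label. It is a toy illustration in plain Python: the two-dimensional hidden states, the weights, and the three-label subset are all made up for the example and are not taken from the model. The actual classification head is the one loaded by `AutoModelForTokenClassification`.

```python
# Toy sketch of a token-classification head: one linear layer applied to each
# token's encoder output, followed by an argmax over entity types.
# All numbers below are made up; they are NOT MaterialsBERT weights or outputs.

LABELS = ["POLYMER", "PROP_NAME", "PROP_VALUE"]  # illustrative subset of the entity types


def linear_head(hidden_state, weights, bias):
    """Return one logit per label: logits[k] = weights[k] . hidden_state + bias[k]."""
    return [
        sum(w * h for w, h in zip(row, hidden_state)) + b
        for row, b in zip(weights, bias)
    ]


def predict_labels(hidden_states, weights, bias, labels=LABELS):
    """Pick the highest-scoring label independently for every token."""
    predictions = []
    for hidden_state in hidden_states:
        logits = linear_head(hidden_state, weights, bias)
        best = max(range(len(logits)), key=logits.__getitem__)
        predictions.append(labels[best])
    return predictions


# Hypothetical 2-dimensional "hidden states" for three tokens.
toy_hidden_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_weights = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]  # shape: (num_labels, hidden_size)
toy_bias = [0.0, 0.0, 0.0]

toy_predictions = predict_labels(toy_hidden_states, toy_weights, toy_bias)
print(toy_predictions)
```

In the real model the hidden states come from MaterialsBERT and the weights are learned during fine-tuning; only the shape of the computation is the same.
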
## Intended uses & limitations

You can use the model for sequence labeling (entity tagging) tasks on materials science text. The training, validation, and test data for the model consisted of abstracts related to polymers. The entity types, however, are general, so the model can be applied to any materials science text to tag the entity types defined in its ontology.

## How to use

Here is how to use the model to tag entities in a given text:

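```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the fine-tuned token classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained('pranav-s/PolymerNER', model_max_length=512)
model = AutoModelForTokenClassification.from_pretrained('pranav-s/PolymerNER')

# aggregation_strategy="simple" groups adjacent tokens with the same label into one entity span.
# device=-1 runs on CPU (an integer also works with older Transformers releases); use 0 for the first GPU.
ner_pipeline = pipeline(task="ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple", device=-1)

text = "Polyethylene has a glass transition temperature of -100 °C"
ner_output = ner_pipeline(text)
print(ner_output)
```
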
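With an aggregation strategy set, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. A minimal sketch of one way to post-process that list is shown below. The helper `group_entities_by_type` and its `min_score` threshold are hypothetical, and the sample list is hand-written for illustration rather than actual PolymerNER output.

```python
from collections import defaultdict


def group_entities_by_type(ner_output, min_score=0.0):
    """Map each entity type to the list of text spans tagged with it."""
    grouped = defaultdict(list)
    for entity in ner_output:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for `ner_output`; the labels and scores are illustrative,
# not actual predictions from PolymerNER.
sample_output = [
    {"entity_group": "POLYMER", "score": 0.99, "word": "polyethylene", "start": 0, "end": 12},
    {"entity_group": "PROP_NAME", "score": 0.98, "word": "glass transition temperature", "start": 19, "end": 47},
    {"entity_group": "PROP_VALUE", "score": 0.45, "word": "- 100 °c", "start": 51, "end": 58},
]

# Keep only spans the (hypothetical) scores mark as reasonably confident.
entities = group_entities_by_type(sample_output, min_score=0.5)
print(entities)
```

Passing the real `ner_output` in place of `sample_output` works the same way. The `start` and `end` character offsets can be used to recover the original, cased text from the input string if needed.
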
## Training data

The model was trained on a data set of 638 polymer abstracts. The data set is available [here](https://github.com/Ramprasad-Group/polymer_information_extraction).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5

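As a rough illustration of what a linear scheduler does with these values, the sketch below computes the learning rate at a few steps. It rests on assumptions not stated above: one training example per abstract, a single device, no gradient accumulation, and no warmup. The function `linear_lr` is a hypothetical helper written for this example, not a library call.

```python
import math


def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumptions: 638 training examples (one per abstract), batch size 8 on a
# single device, no gradient accumulation, no warmup.
steps_per_epoch = math.ceil(638 / 8)
total_steps = steps_per_epoch * 5  # num_epochs = 5

for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the rate starts at 5e-05, is halved midway through training, and reaches zero at the final step.
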
### Framework versions

- Transformers 4.17.0
- PyTorch 1.10.2
- Datasets 1.18.3
- Tokenizers 0.11.0

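If results differ from what you expect, it can help to compare your environment against the versions listed above. The following is a minimal sketch using the standard library; the helper names are hypothetical, and the PyPI distribution names (`torch` for PyTorch in particular) are assumptions about how the packages were installed.

```python
from importlib import metadata

# Versions listed in this model card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.17.0",
    "torch": "1.10.2",
    "datasets": "1.18.3",
    "tokenizers": "0.11.0",
}


def installed_versions(packages):
    """Return the installed version of each package, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_differences(expected, installed):
    """List (package, expected, installed) triples where the versions differ."""
    return [
        (name, wanted, installed.get(name))
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    ]


for name, wanted, found in version_differences(CARD_VERSIONS, installed_versions(CARD_VERSIONS)):
    print(f"{name}: card lists {wanted}, environment has {found}")
```

A reported difference is not necessarily a problem. The comparison is an exact string match, so a build suffix such as a CUDA tag on the PyTorch version will also be flagged.
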
## Citation

If you find PolymerNER useful in your research, please cite the following paper:

```bibtex
@article{materialsbert,
  title={A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing},
  author={Shetty, Pranav and Rajan, Arunkumar Chitteth and Kuenneth, Chris and Gupta, Sonakshi and Panchumarti, Lakshmi Prerana and Holm, Lauren and Zhang, Chao and Ramprasad, Rampi},
  journal={npj Computational Materials},
  volume={9},
  number={1},
  pages={52},
  year={2023},
  publisher={Nature Publishing Group UK London}
}
```

<a href="https://huggingface.co/exbert/?model=pranav-s/PolymerNER">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>