	---
	language: en
	tags:
	- transformers
	- feature-extraction
	- materials
	license: other
	---

	# PolymerNER

	This model fine-tunes MaterialsBERT on a dataset of 638 abstracts. It adds a linear layer on top of MaterialsBERT to predict the entity type of each token. The predicted entity types are POLYMER, POLYMER\_FAMILY, ORGANIC, INORGANIC, MONOMER, PROP\_NAME, PROP\_VALUE and MATERIAL\_AMOUNT.
	This named entity recognition (NER) model was introduced in [Shetty et al. (2023)](https://www.nature.com/articles/s41524-023-01003-w). Refer to the paper for a more detailed description of the entity types and the performance metrics of the model. As MaterialsBERT is uncased, the NER model is also uncased.
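
The "linear layer on top of the encoder" idea can be illustrated with a toy, dependency-free sketch. This is not the model's actual code: the hidden size, the random weights and the extra `OTHER` label for untagged tokens are all assumptions made for illustration, and `classify_token` is a hypothetical helper. In the real model the per-token vectors come from MaterialsBERT and the weights are learned during fine-tuning.

```python
import random

# Entity types listed in this model card
ENTITY_TYPES = [
    "POLYMER", "POLYMER_FAMILY", "ORGANIC", "INORGANIC",
    "MONOMER", "PROP_NAME", "PROP_VALUE", "MATERIAL_AMOUNT",
]
# Assumption: a catch-all label for tokens that belong to no entity
LABELS = ENTITY_TYPES + ["OTHER"]
HIDDEN_SIZE = 4  # toy value; real encoder vectors are much larger

rng = random.Random(0)
# One weight row and one bias per label (random here, learned in practice)
weights = [[rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)] for _ in LABELS]
bias = [0.0] * len(LABELS)


def classify_token(hidden):
    """Apply the linear layer to one token vector and return the top label."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    best = max(range(len(LABELS)), key=logits.__getitem__)
    return LABELS[best]


# Stand-ins for the encoder output of a three-token sentence
token_vectors = [[rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)] for _ in range(3)]
predicted = [classify_token(vec) for vec in token_vectors]
print(predicted)
```

Each token gets exactly one label, which is why the pipeline later needs an aggregation step to merge word pieces into whole entity spans.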

	## Intended uses & limitations

	You can use the model for sequence labeling and entity tagging tasks on materials science text. The training, validation and test data for the model consisted of abstracts related to polymers. The entity types are nevertheless general, so the model can be applied to any materials science text to tag the entity types defined in its ontology.

	## How to Use

	Here is how to use the model to tag entities given some text:

	```python
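	from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

	# Load the tokenizer and the fine-tuned token-classification model from the Hub
	tokenizer = AutoTokenizer.from_pretrained('pranav-s/PolymerNER', model_max_length=512)
	model = AutoModelForTokenClassification.from_pretrained('pranav-s/PolymerNER')
	# aggregation_strategy="simple" merges word pieces into entity spans; device=-1 selects the CPU
	ner_pipeline = pipeline(task="ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple", device=-1)
	text = "Polyethylene has a glass transition temperature of -100 °C"
	# Returns a list of dicts with keys entity_group, score, word, start and end
	ner_output = ner_pipeline(text)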
	```
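
Once the pipeline has run, its output often needs a little post-processing. The sketch below groups spans by entity type and, because the model is uncased, recovers the original-cased text through the `start` and `end` character offsets instead of the lowercased `word` field. The `sample_output` list is hand-written for illustration and is not real model output; `group_entities` and `make_span` are hypothetical helper names.

```python
from collections import defaultdict


def group_entities(text, ner_output, min_score=0.0):
    """Group aggregated NER spans by entity type, keeping original casing."""
    grouped = defaultdict(list)
    for ent in ner_output:
        if ent["score"] < min_score:
            continue  # drop low-confidence spans
        # Slice the input text so the span keeps its original casing
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "Polyethylene has a glass transition temperature of -100 °C"


def make_span(substring, label, score):
    """Build a dict shaped like one aggregated pipeline result (illustrative)."""
    start = text.index(substring)
    return {
        "entity_group": label,
        "score": score,
        "word": substring.lower(),
        "start": start,
        "end": start + len(substring),
    }


# Hand-written stand-in for ner_pipeline(text); labels and scores are assumed
sample_output = [
    make_span("Polyethylene", "POLYMER", 0.99),
    make_span("glass transition temperature", "PROP_NAME", 0.98),
    make_span("-100 °C", "PROP_VALUE", 0.97),
]

entities = group_entities(text, sample_output, min_score=0.5)
print(entities)
```

To use it with the real model, pass `ner_output` in place of `sample_output`. The `min_score` threshold is a judgment call and may need tuning for a given corpus.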

	## Training data

	A data set of 638 polymer abstracts was used for training. The data set is available in the [polymer_information_extraction repository](https://github.com/Ramprasad-Group/polymer_information_extraction).

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 5
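
To make the linear scheduler concrete, here is a minimal sketch of how the learning rate would decay under the hyperparameters above. It assumes no warmup steps (none are listed) and a hypothetical training-set size of 500 examples, since this card does not state how the 638 abstracts were split. The name `linear_lr` is illustrative.

```python
import math

LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 5
NUM_TRAIN_EXAMPLES = 500  # hypothetical; not stated in this card

# One optimizer step per batch; the last batch of an epoch may be partial
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step, total=total_steps, base_lr=LEARNING_RATE):
    """Learning rate after `step` updates, decaying linearly to zero (no warmup)."""
    remaining = max(0, total - step)
    return base_lr * remaining / total


for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 5e-05 and reaches zero at the final step of the fifth epoch.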


	### Framework versions

	- Transformers 4.17.0
	- PyTorch 1.10.2
	- Datasets 1.18.3
	- Tokenizers 0.11.0


	## Citation

	If you find PolymerNER useful in your research, please cite the following paper:

	```bibtex
	@article{materialsbert,
	title={A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing},
	author={Shetty, Pranav and Rajan, Arunkumar Chitteth and Kuenneth, Chris and Gupta, Sonakshi and Panchumarti, Lakshmi Prerana and Holm, Lauren and Zhang, Chao and Ramprasad, Rampi},
	journal={npj Computational Materials},
	volume={9},
	number={1},
	pages={52},
	year={2023},
	publisher={Nature Publishing Group UK London}
	}
	```

	<a href="https://huggingface.co/exbert/?model=pranav-s/PolymerNER">
	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
	</a>