SMILES_BERT

A BERT model trained on 50,000 SMILES strings with a masked language modeling (MLM) objective.

Example:

Acetaminophen

CC(=O)NC1=CC=C(C=C1)O
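
To illustrate what an MLM objective means for a string like this, here is a minimal sketch that splits a SMILES string into tokens and masks some of them. The regex tokenizer, the 15% masking rate, and the function names are assumptions made for illustration; this card does not state how the model's actual tokenizer splits SMILES, and BERT's full masking scheme (which sometimes keeps or randomizes the selected token) is simplified here to always inserting `[MASK]`.

```python
import random
import re

# Atom-level SMILES pattern (an assumption; the model's real tokenizer may differ).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize every character of {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace each token with mask_token with probability mask_prob.

    Returns (masked_tokens, labels); labels holds the original token at
    masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


acetaminophen = "CC(=O)NC1=CC=C(C=C1)O"
tokens = tokenize_smiles(acetaminophen)
masked, labels = mask_tokens(tokens, seed=1)
print(tokens)
print("".join(masked))
```

During MLM training, the model sees the masked sequence and is scored only on how well it recovers the tokens stored in `labels`.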

Model description

This is a BERT model trained on 50,000 SMILES strings sourced from BindingDB; each compound binds to one or more proteins with some affinity. The goal was to provide a pretrained model with a general representation of SMILES that can be fine-tuned for downstream tasks where SMILES data is useful, e.g. binding affinity prediction or classification.
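
A quick way to probe what the pretrained model has learned is to ask it to fill in a masked position. The sketch below uses the Transformers `fill-mask` pipeline. The repository ID is a placeholder, since this card does not state one, and the `[MASK]` token is the BERT default, assumed here to apply to this model's tokenizer.

```python
# Placeholder repository ID (hypothetical); replace with the real one.
MODEL_ID = "your-namespace/SMILES_BERT"

# Acetaminophen with its final oxygen masked; "[MASK]" is assumed to be
# the tokenizer's mask token (the BERT default).
MASKED_EXAMPLE = "CC(=O)NC1=CC=C(C=C1)[MASK]"


def top_predictions(masked_smiles, model_id=MODEL_ID, top_k=5):
    """Return (token, score) pairs for the single masked position."""
    # Imported lazily so this file loads even without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    return [(p["token_str"], p["score"]) for p in fill(masked_smiles, top_k=top_k)]


if __name__ == "__main__":
    # Downloads the model on first use; requires a valid MODEL_ID.
    for token, score in top_predictions(MASKED_EXAMPLE):
        print(f"{token}\t{score:.3f}")
```

The predicted tokens depend on how the model's tokenizer splits SMILES, so they may be sub-word pieces rather than single atoms.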

Intended uses & limitations

This model is intended as a pretrained starting point for fine-tuning on tasks that use SMILES data, such as predicting physical properties, chemical activity, or biological activity.

Training results

Training loss: 0.9446
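
As a rough aid to reading this number, the sketch below converts the loss to a perplexity. It assumes the reported value is the mean MLM cross-entropy per masked token in nats, which this card does not confirm, and because it is a training loss it says nothing about performance on held-out data.

```python
import math

# Reported training loss, assumed to be mean cross-entropy in nats.
training_loss = 0.9446

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(training_loss)
print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the perplexity is about 2.6, meaning that on the training data the model was, on average, about as uncertain as a uniform choice among two or three candidate tokens at each masked position.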

Further evaluation is needed.

Framework versions

  • Transformers 4.37.0.dev0
  • PyTorch 2.1.0+cu121
  • Tokenizers 0.15.0

Model size

126M params (Safetensors, tensor type F32)