Model documentation & parameters
Algorithm Version: Which model version to use.
Maximal sequence length: The maximal number of SMILES tokens in the generated molecule.
Number of samples: How many samples should be generated (between 1 and 50).
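The three parameters above can be sketched as a small request object, purely for illustration. Everything in this block is an assumption rather than the demo's actual code: the `GenerationRequest` class, its field names and default values, the `fits_length` helper, and the regular expression used to split a SMILES string into tokens (a commonly used pattern; the model's own tokenizer may count tokens differently).

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: a commonly used regex for splitting SMILES into tokens.
# The tokenizer actually used by the model may differ.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens (bracket atoms, Br/Cl, bonds, ring digits)."""
    return SMILES_TOKEN_PATTERN.findall(smiles)


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container mirroring the three documented parameters."""

    algorithm_version: str = "v0"   # placeholder default, not confirmed by the source
    max_sequence_length: int = 100  # placeholder default, counted in SMILES tokens
    number_of_samples: int = 10     # must lie between 1 and 50, as documented

    def __post_init__(self) -> None:
        if self.max_sequence_length < 1:
            raise ValueError("max_sequence_length must be at least 1")
        if not 1 <= self.number_of_samples <= 50:
            raise ValueError("number_of_samples must be between 1 and 50")


def fits_length(smiles: str, request: GenerationRequest) -> bool:
    """Check whether a molecule stays within the requested token budget."""
    return len(tokenize_smiles(smiles)) <= request.max_sequence_length


if __name__ == "__main__":
    request = GenerationRequest(max_sequence_length=20, number_of_samples=5)
    lactone = "C1CCOC(=O)1"  # illustrative cyclic ester
    print(tokenize_smiles(lactone), fits_length(lactone, request))
```

Counting tokens rather than characters matters because multi-character symbols such as `Br`, `Cl`, or bracket atoms each count as a single token under this scheme.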
Model card -- PolymerBlocks
Model Details: PolymerBlocks is a sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers). The model relies on a Variational Autoencoder architecture as described in Born et al. (2021; iScience).
Developers: Matteo Manica and colleagues from IBM Research.
Distributors: Original authors' code integrated into GT4SD.
Model date: Not yet published.
Model version: Initial version only. The model was pre-trained on 500K compounds from PubChem and then fine-tuned on SMILES strings representing the monomers and catalysts collected in the database presented in Park et al. (2022).
Model type: A sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers).
Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: The sequence-based model is a standard GRU-based VAE trained to reconstruct the SMILES representations of molecules. Given the nature of the pre-training and fine-tuning data, the model is biased towards molecules that resemble the catalysts and monomers employed in ring-opening polymerization.
Paper or other resource for more information: Details on the model used and code can be found in Born et al. (2021; iScience).
License: MIT
Where to send questions or comments about the model: Open an issue on the GT4SD repository.
Intended Use. Use cases that were envisioned during development: Chemical research, in particular the discovery of catalysts for polymerization.
Primary intended uses/users: Researchers and computational chemists using the model for model comparison or research exploration purposes.
Out-of-scope use cases: Production-level inference, producing molecules with harmful properties.
Metrics: N.A.
Datasets: See the description under Model version.
Ethical Considerations: Unclear; please consult the original authors with any questions.
Caveats and Recommendations: Unclear; please consult the original authors with any questions.
Model card prototype inspired by Mitchell et al. (2019)
Citation
@article{manica2022gt4sd,
title={GT4SD: Generative Toolkit for Scientific Discovery},
author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
journal={arXiv preprint arXiv:2207.03928},
year={2022}
}