# Model documentation & parameters
**Algorithm Version**: Which model version to use.
**Property goals**: One or multiple properties that will be optimized.
**Protein target**: The amino acid sequence (AAS) of a protein target used for conditioning. Leave blank unless you use `affinity` as a `property goal`.
**Decoding temperature**: The temperature parameter in the SMILES/SELFIES decoder. Higher values lead to more explorative choices; lower values make decoding more deterministic and can culminate in mode collapse.
**Maximal sequence length**: The maximal number of SMILES tokens in the generated molecule.
**Number of samples**: How many samples should be generated (between 1 and 50).
**Limit**: Hypercube limits in the latent space.
**Number of steps**: Number of steps for a GP optimization round. More steps make the round slower. Has to be at least `Number of initial points`.
**Number of initial points**: Number of initial points evaluated. More points make the round slower.
**Number of optimization rounds**: Maximum number of optimization rounds.
**Sampling variance**: Variance of the Gaussian noise applied during sampling from the optimal point.
**Samples for evaluation**: Number of samples averaged for each minimization function evaluation.
**Max. sampling steps**: Maximum number of sampling steps in an optimization round.
**Seed**: The random seed used for initialization.
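
To make two of the parameters above concrete, here is a minimal sketch of what `Decoding temperature` and `Sampling variance` typically mean. It is an illustration under stated assumptions, not the PaccMann<sup>GP</sup> or GT4SD implementation: the function names `sample_token` and `perturb_latent` are hypothetical, the logits are made up, and the latent point is a plain Python list.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Sample one token index from temperature-scaled logits (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by the temperature flattens (T > 1) or sharpens (T < 1) the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def perturb_latent(optimum, variance, rng):
    """Add zero-mean Gaussian noise with the given variance to a latent point
    (hypothetical helper; assumes independent noise per dimension)."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    std = math.sqrt(variance)  # random.gauss expects a standard deviation
    return [z + rng.gauss(0.0, std) for z in optimum]


rng = random.Random(42)  # plays the role of the `Seed` parameter
toy_logits = [1.0, 3.0, 2.0]  # made-up decoder scores for three tokens

cold = [sample_token(toy_logits, 0.01, rng) for _ in range(100)]
hot = [sample_token(toy_logits, 100.0, rng) for _ in range(200)]
noisy_point = perturb_latent([0.5, -0.5], 0.1, rng)
```

At a very low temperature nearly all draws land on the highest-scoring token, which is the collapse described above; at a high temperature the draws spread across all tokens. A larger sampling variance spreads the sampled latent points further from the optimum.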
# Model card -- PaccMannGP
**Model Details**: [PaccMann<sup>GP</sup>](https://github.com/PaccMann/paccmann_gp) is a language-based Variational Autoencoder (VAE) coupled with a Gaussian Process (GP) for controlled sampling. The model systematically explores the latent space of a trained molecular VAE.
**Developers**: Jannis Born, Matteo Manica and colleagues from IBM Research.
**Distributors**: Original authors' code wrapped and distributed by GT4SD Team (2023) from IBM Research.
**Model date**: Published in 2022.
**Model version**: A molecular VAE trained on 1.5M molecules from ChEMBL.
**Model type**: A language-based molecular generative model that can be explored with Gaussian Processes to generate molecules with desired properties.
**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**:
Described in the [original paper](https://pubs.acs.org/doi/10.1021/acs.jcim.1c00889).
**Paper or other resource for more information**:
[Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model (2022; *Journal of Chemical Information & Modeling*)](https://pubs.acs.org/doi/10.1021/acs.jcim.1c00889).
**License**: MIT
**Where to send questions or comments about the model**: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).
**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.
**Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.
**Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.
**Factors**: Not applicable.
**Metrics**: High reward on generating molecules with desired properties.
**Datasets**: ChEMBL.
**Ethical Considerations**: Unclear, please consult with original authors in case of questions.
**Caveats and Recommendations**: Unclear, please consult with original authors in case of questions.
Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=XD4eHiE2cRUAAAAA:NL11gMa1hGPOUKTAbtXnbVQBDBbjxwcjGECF_i-WC_3g1aBgU1Hbz_f2b4kI_m1in-w__1ztGeHnwHs)
## Citation
```bib
@article{born2022active,
author = {Born, Jannis and Huynh, Tien and Stroobants, Astrid and Cornell, Wendy D. and Manica, Matteo},
title = {Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model},
journal = {Journal of Chemical Information and Modeling},
volume = {62},
number = {2},
pages = {240-257},
year = {2022},
doi = {10.1021/acs.jcim.1c00889},
note ={PMID: 34905358},
URL = {https://doi.org/10.1021/acs.jcim.1c00889}
}
```