# Model documentation & parameters
**Algorithm Version**: Which model version (either protein-target-driven or gene-expression-profile-driven) to use and which checkpoint to rely on.
**Inference type**: Whether the model should be conditioned on the target (default) or whether the model is used in an `Unbiased` manner.
**Protein target**: The amino acid sequence (AAS) of a protein target used for conditioning. Only use if `Inference type` is `Conditional` and the `Algorithm version` is a Protein model.
**Gene expression target**: A list of 2128 floats representing the embedding of the gene expression profile used for conditioning. Only use if `Inference type` is `Conditional` and the `Algorithm version` is an Omic model.
**Decoding temperature**: The temperature parameter of the SMILES/SELFIES decoder. Higher values lead to more explorative choices; lower values push the decoder toward mode collapse.
**Maximal sequence length**: The maximal number of SMILES tokens in the generated molecule.
**Number of samples**: How many samples to generate (between 1 and 50).
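
The constraints listed above, and the effect of the decoding temperature, can be illustrated with a small sketch. This is not the GT4SD API: `GenerationRequest`, its field names, its default values, and `softmax_with_temperature` are hypothetical names introduced here for illustration only. The version labels, the 2128-float length, and the 1 to 50 sample range are taken from this page. Rejecting a target in `Unbiased` mode is an assumption of the sketch; the actual app may simply ignore it.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

GENE_EXPRESSION_DIM = 2128  # length stated in the parameter description
MAX_SAMPLES = 50            # upper bound stated in the parameter description


@dataclass
class GenerationRequest:
    """Hypothetical parameter container; defaults are illustrative only."""

    algorithm_version: str                 # "Protein_v0" or "Omic_v0"
    inference_type: str = "Conditional"    # or "Unbiased"
    protein_target: Optional[str] = None
    gene_expression_target: Optional[Sequence[float]] = None
    temperature: float = 1.0
    max_length: int = 100
    num_samples: int = 10

    def validate(self) -> None:
        """Raise ValueError if the parameters contradict the documented rules."""
        if self.algorithm_version not in ("Protein_v0", "Omic_v0"):
            raise ValueError("unknown algorithm version")
        if self.inference_type not in ("Conditional", "Unbiased"):
            raise ValueError("inference type must be Conditional or Unbiased")
        if not 1 <= self.num_samples <= MAX_SAMPLES:
            raise ValueError("number of samples must be between 1 and 50")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if self.max_length < 1:
            raise ValueError("maximal sequence length must be at least 1")

        has_target = (
            self.protein_target is not None
            or self.gene_expression_target is not None
        )
        if self.inference_type == "Unbiased":
            # Assumption of this sketch: targets are rejected, not ignored.
            if has_target:
                raise ValueError("targets only apply to Conditional inference")
            return
        if self.algorithm_version == "Protein_v0":
            if not self.protein_target:
                raise ValueError("a Protein model needs a protein target")
        else:
            target = self.gene_expression_target
            if target is None or len(target) != GENE_EXPRESSION_DIM:
                raise ValueError("an Omic model needs 2128 floats")


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Turn decoder logits into token probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Dividing the logits by a temperature below 1 concentrates probability on the most likely token, which is why very low temperatures tend to produce the same molecule repeatedly; temperatures above 1 flatten the distribution and make sampling more explorative.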
# Model card -- PaccMannRL
**Model Details**: PaccMannRL is a language model for conditional molecular design. It consists of a domain-specific encoder (for protein targets or gene expression profiles) and a generic molecular decoder. Both components are finetuned together using RL to convert the context representation into a molecule with high affinity toward the context (i.e., binding affinity to the protein or high inhibitory effect for the cell profile).
**Developers**: Jannis Born, Matteo Manica and colleagues from IBM Research.
**Distributors**: Original authors' code wrapped and distributed by GT4SD Team (2023) from IBM Research.
**Model date**: Published in 2021.
**Model version**: Models trained and distributed by the original authors.
- **Protein_v0**: Molecular decoder pretrained on 1.5M molecules from ChEMBL. Protein encoder pretrained on 404k proteins from UniProt. Encoder and decoder finetuned on 41 SARS-CoV-2-related protein targets with a binding affinity predictor trained on BindingDB.
- **Omic_v0**: Molecular decoder pretrained on 1.5M molecules from ChEMBL. Gene expression encoder pretrained on 12k gene expression profiles from TCGA. Encoder and decoder finetuned on a few hundred cancer cell profiles from GDSC with an IC50 predictor trained on GDSC.
**Model type**: A language-based molecular generative model that can be optimized with RL to generate molecules with high affinity toward a context.
**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**:
- **Protein**: Parameters as provided on [(GitHub repo)](https://github.com/PaccMann/paccmann_sarscov2).
- **Omics**: Parameters as provided on [(GitHub repo)](https://github.com/PaccMann/paccmann_rl).
**Paper or other resource for more information**:
- **Protein**: [Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2 (2021; *Machine Learning: Science and Technology*)](https://iopscience.iop.org/article/10.1088/2632-2153/abe808/meta).
- **Omics**: [PaccMannRL: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning (2021; *iScience*)](https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6).
**License**: MIT
**Where to send questions or comments about the model**: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).
**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.
**Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.
**Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.
**Factors**: Not applicable.
**Metrics**: High reward on generating molecules with high affinity toward context.
**Datasets**: ChEMBL, UniProt, TCGA, GDSC and BindingDB (see above).
**Ethical Considerations**: Unclear, please consult with original authors in case of questions.
**Caveats and Recommendations**: Unclear, please consult with original authors in case of questions.
Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=XD4eHiE2cRUAAAAA:NL11gMa1hGPOUKTAbtXnbVQBDBbjxwcjGECF_i-WC_3g1aBgU1Hbz_f2b4kI_m1in-w__1ztGeHnwHs)
## Citation
**Omics**:
```bib
@article{born2021paccmannrl,
title = {PaccMann\textsuperscript{RL}: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning},
journal = {iScience},
volume = {24},
number = {4},
pages = {102269},
year = {2021},
issn = {2589-0042},
doi = {10.1016/j.isci.2021.102269},
url = {https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6},
author = {Born, Jannis and Manica, Matteo and Oskooei, Ali and Cadow, Joris and Markert, Greta and {Rodr{\'{i}}guez Mart{\'{i}}nez}, Mar{\'{i}}a}
}
```
**Proteins**:
```bib
@article{born2021datadriven,
author = {Born, Jannis and Manica, Matteo and Cadow, Joris and Markert, Greta and Mill, Nil Adell and Filipavicius, Modestas and Janakarajan, Nikita and Cardinale, Antonio and Laino, Teodoro and {Rodr{\'{i}}guez Mart{\'{i}}nez}, Mar{\'{i}}a},
doi = {10.1088/2632-2153/abe808},
issn = {2632-2153},
journal = {Machine Learning: Science and Technology},
number = {2},
pages = {025024},
title = {{Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2}},
url = {https://iopscience.iop.org/article/10.1088/2632-2153/abe808},
volume = {2},
year = {2021}
}
```