jannisborn committed
Commit: 45d9693
Parent(s): 6e90dd6

update

Files changed:
- app.py (+3, -0)
- model_cards/regression_transformer_article.md (+3, -2)
app.py
CHANGED
@@ -7,6 +7,7 @@ from gt4sd.algorithms.conditional_generation.regression_transformer import (
     RegressionTransformer,
 )
 from gt4sd.algorithms.registry import ApplicationsRegistry
+from terminator.tokenization import PolymerGraphTokenizer
 from utils import (
     draw_grid_generate,
     draw_grid_predict,
@@ -95,6 +96,8 @@ def regression_transformer(
         ]
     )
     samples = correct_samples
+    # if isinstance(config.generator.tokenizer.text_tokenizer, PolymerGraphTokenizer):
+    #     pass
     if task == "Predict":
         return draw_grid_predict(samples[0], target, domain=algorithm.split(":")[0])
     else:
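The commented-out lines added to `app.py` hint at type-based dispatch on the tokenizer, so that block-copolymer outputs could be post-processed differently from SELFIES/SMILES ones. A minimal standalone sketch of that pattern follows; the stub classes and the `postprocess` helper are hypothetical stand-ins, not the actual `terminator`/`gt4sd` API, and the cleanup rule is invented for illustration:

```python
class BaseTokenizer:
    """Stand-in for a generic text tokenizer."""


class PolymerGraphTokenizer(BaseTokenizer):
    """Stand-in for terminator.tokenization.PolymerGraphTokenizer."""


def postprocess(samples, tokenizer):
    """Dispatch on the tokenizer type before rendering the result grid."""
    if isinstance(tokenizer, PolymerGraphTokenizer):
        # Hypothetical copolymer-specific cleanup of block separators.
        return [s.replace("|", ".") for s in samples]
    # Default branch: SELFIES/SMILES samples pass through unchanged.
    return samples


print(postprocess(["A|B"], PolymerGraphTokenizer()))  # copolymer branch
print(postprocess(["CCO"], BaseTokenizer()))          # default branch
```

Since `isinstance` also matches subclasses, a `PolymerGraphTokenizer` would still take the default branch if the check were written against a sibling class; checking the most specific type first is the usual way to layer such dispatch.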
model_cards/regression_transformer_article.md
CHANGED
@@ -60,12 +60,13 @@ Optionally specifies a list of substructures that should definitely be present i
 **Algorithm version**: Models trained and distributed by the original authors.
 - **Molecules: QED**: Model trained on 1.6M molecules (SELFIES) from ChEMBL and their QED scores.
 - **Molecules: Solubility**: QED model finetuned on the ESOL dataset from [Delaney et al (2004), *J. Chem. Inf. Comput. Sci.*](https://pubs.acs.org/doi/10.1021/ci034243x) to predict water solubility. Model trained on augmented SELFIES.
-- **Molecules: USPTO**: Model trained on 2.8M [chemical reactions](https://figshare.com/articles/dataset/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873) from the US patent office. The model used SELFIES and a synthetic property (total molecular weight of all precursors).
-- **Molecules: Polymer**: Model finetuned on 600 ROPs (ring-opening polymerizations) with monomer-catalyst pairs. Model used three properties: conversion (`<conv>`), PDI (`<pdi>`) and Molecular Weight (`<molwt>`). Model trained with augmented SELFIES, optimized only to generate catalysts, given a monomer and the property constraints. See the example for details.
 - **Molecules: Cosmo_acdl**: Model finetuned on 56k molecules with two properties (*pKa_ACDL* and *pKa_COSMO*). Model used augmented SELFIES.
 - **Molecules: Pfas**: Model finetuned on ~1k PFAS (Perfluoroalkyl and Polyfluoroalkyl Substances) molecules with 9 properties including some experimentally measured ones (biodegradability, LD50 etc) and some synthetic ones (SCScore, molecular weight). Model trained on augmented SELFIES.
 - **Molecules: Logp_and_synthesizability**: Model trained on 2.9M molecules (SELFIES) from PubChem with **two** synthetic properties, the logP (partition coefficient) and the [SCScore by Coley et al. (2018); *J. Chem. Inf. Model.*](https://pubs.acs.org/doi/full/10.1021/acs.jcim.7b00622?casa_token=JZzOrdWlQ_QAAAAA%3A3_ynCfBJRJN7wmP2gyAR0EWXY-pNW_l-SGwSSU2SGfl5v5SxcvqhoaPNDhxq4THberPoyyYqTZELD4Ck)
 - **Molecules: Crippen_logp**: Model trained on 2.9M molecules (SMILES) from PubChem, but *only* on logP (partition coefficient).
+- **Molecules: Reactions: USPTO**: Model trained on 2.8M [chemical reactions](https://figshare.com/articles/dataset/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873) from the US patent office. The model used SELFIES and a synthetic property (total molecular weight of all precursors).
+- **Molecules: Polymers: ROP Catalyst**: Model finetuned on 600 ROPs (ring-opening polymerizations) with monomer-catalyst pairs. Model used three properties: conversion (`<conv>`), PDI (`<pdi>`) and Molecular Weight (`<molwt>`). Model trained with augmented SELFIES, optimized only to generate catalysts, given a monomer and the property constraints. Try the above UI example and see [Park et al. (2022, ChemRxiv)](https://chemrxiv.org/engage/chemrxiv/article-details/62b60865e84dd185e60214af) for details.
+- **Molecules: Polymers: Block copolymer**: Model finetuned on ~1k block copolymers with a novel string representation developed for polymers. Model used two properties: dispersity (`<Dispersity>`) and MnGPC (`<MnGPC>`). This is the first generative model for block copolymers. Try the above UI example and see [Park et al. (2022, ChemRxiv)](https://chemrxiv.org/engage/chemrxiv/article-details/62b60865e84dd185e60214af) for details.
 - **Proteins: Stability**: Model pretrained on 2.6M peptides from UniProt with the Boman index as property. Finetuned on the [**Stability**](https://www.science.org/doi/full/10.1126/science.aan0693) dataset from the [TAPE benchmark](https://proceedings.neurips.cc/paper/2019/hash/37f65c068b7723cd7809ee2d31d7861c-Abstract.html) which has ~65k samples.
 
 **Model type**: A Transformer-based language model trained on alphanumeric sequences to simultaneously perform sequence regression or conditional sequence generation.
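The model-card entries describe models conditioned on property tokens such as `<conv>`, `<pdi>`, and `<molwt>`: the Regression Transformer concatenates property values with a (partially masked) sequence and fills in the masks. A small helper sketching how such a query string might be assembled follows; the property tokens come from the model card, but the exact concatenation format the model expects is an assumption here, and `build_query` is a hypothetical name:

```python
def build_query(properties: dict, sequence: str, n_masks: int) -> str:
    """Assemble a property-conditioned, partially masked query string.

    Assumed format (illustrative only): property tokens with their target
    values, a separator, then the sequence followed by mask tokens that
    the model is asked to fill in.
    """
    prop_part = "".join(f"{tok}{val}" for tok, val in properties.items())
    return f"{prop_part}|{sequence}{'[MASK]' * n_masks}"


# Hypothetical ROP-catalyst-style query: constrain conversion and PDI,
# give a monomer in SELFIES, and leave masks for the catalyst to generate.
q = build_query({"<conv>": 0.95, "<pdi>": 1.2}, "[C][C][O]", 3)
print(q)
```

For prediction rather than generation, the same scheme would be inverted: mask the property value and supply the full sequence, letting the model regress the number behind the property token.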