o-schilter committed on
Commit
f746564
·
1 Parent(s): 343ba2f

Updated Information

model_cards/article.md CHANGED
@@ -2,11 +2,11 @@
2
 
3
  **Algorithm Version**: Which model version to use.
4
 
5
- **Target binding energy**: The desired binding energy.
6
 
7
- **Primer SMILES**: A SMILES string used to prime the generation.
8
 
9
- **Maximal sequence length**: The maximal number of SMILES tokens in the generated molecule.
10
 
11
  **Number of points**: Number of points to sample with the Gaussian Process.
12
 
@@ -24,31 +24,31 @@
24
 
25
  **Distributors**: Original authors' code integrated into GT4SD.
26
 
27
- **Model date**: Not yet published.
28
 
29
- **Model version**: Different types of models trained on NCCR data using SMILES or SELFIES, potentially also with augmentation.
30
 
31
  **Model type**: A sequence-based molecular generator tuned to generate catalysts. The model relies on a recurrent Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.
32
 
33
  **Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**:
34
  N.A.
35
 
36
- **Paper or other resource for more information**:
37
- TBD
38
 
39
  **License**: MIT
40
 
41
  **Where to send questions or comments about the model**: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).
42
 
43
- **Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.
44
 
45
- **Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.
46
 
47
  **Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.
48
 
49
  **Metrics**: N.A.
50
 
51
- **Datasets**: Data provided through NCCR.
52
 
53
  **Ethical Considerations**: Unclear, please consult with original authors in case of questions.
54
 
@@ -60,9 +60,9 @@ Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi
60
  TBD, temporarily please cite:
61
  ```bib
62
  @article{manica2022gt4sd,
63
- title={GT4SD: Generative Toolkit for Scientific Discovery},
64
- author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
65
- journal={arXiv preprint arXiv:2207.03928},
66
- year={2022}
67
  }
68
  ```
 
2
 
3
  **Algorithm Version**: Which model version to use.
4
 
5
+ **Target binding energy**: The desired binding energy. The optimal range reported in the [literature](https://doi.org/10.1039/C8SC01949E) is between -31.1 and -23.0 kcal/mol.
6
 
7
+ **Primer SMILES**: A SMILES string is used to prime the generation.
8
 
9
+ **Maximal sequence length**: The maximal number of tokens in the generated molecule.
10
 
11
  **Number of points**: Number of points to sample with the Gaussian Process.
12
 
 
24
 
25
  **Distributors**: Original authors' code integrated into GT4SD.
26
 
27
+ **Model date**: Not yet published. Manuscript accepted.
28
 
29
+ **Model version**: Different types of models trained on 7054 data points, represented either as SMILES or SELFIES. Augmentation was used to broaden the scope.
30
 
31
  **Model type**: A sequence-based molecular generator tuned to generate catalysts. The model relies on a recurrent Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.
32
 
33
  **Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**:
34
  N.A.
35
 
36
+ **Paper or other resources for more information**:
37
+
38
 
39
  **License**: MIT
40
 
41
  **Where to send questions or comments about the model**: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).
42
 
43
+ **Intended Use. Use cases that were envisioned during development**: Chemical research, in particular the discovery of new Suzuki cross-coupling catalysts.
44
 
45
+ **Primary intended uses/users**: Researchers and computational chemists using the model for research exploration purposes.
46
 
47
  **Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.
48
 
49
  **Metrics**: N.A.
50
 
51
+ **Datasets**: Data used for training was provided through the NCCR and can be found [here](https://doi.org/10.24435/materialscloud:2018.0014/v1) and [here](https://doi.org/10.24435/materialscloud:2019.0007/v3).
52
 
53
  **Ethical Considerations**: Unclear, please consult with original authors in case of questions.
54
 
 
60
  TBD, temporarily please cite:
61
  ```bib
62
  @article{manica2022gt4sd,
63
+ title={GT4SD: Generative Toolkit for Scientific Discovery},
64
+ author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
65
+ journal={arXiv preprint arXiv:2207.03928},
66
+ year={2022}
67
  }
68
  ```
model_cards/description.md CHANGED
@@ -1,6 +1,8 @@
1
  <img align="right" src="https://raw.githubusercontent.com/GT4SD/gt4sd-core/main/docs/_static/gt4sd_logo.png" alt="logo" width="120" >
2
 
3
- *AdvancedManufacturing* is a sequence-based molecular generator tuned to generate catalysts. The model relies on a Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.
 
4
 
5
  For **examples** and **documentation** of the model parameters, please see below.
6
  Moreover, we provide a **model card** ([Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=XD4eHiE2cRUAAAAA:NL11gMa1hGPOUKTAbtXnbVQBDBbjxwcjGECF_i-WC_3g1aBgU1Hbz_f2b4kI_m1in-w__1ztGeHnwHs)) at the bottom of this page.
 
 
1
  <img align="right" src="https://raw.githubusercontent.com/GT4SD/gt4sd-core/main/docs/_static/gt4sd_logo.png" alt="logo" width="120" >
2
 
3
+ *AdvancedManufacturing* is a sequence-based molecular generator tuned to generate catalysts for the Suzuki cross-coupling. The model relies on a Variational Autoencoder with a binding-energy predictor trained on the latent space. The framework uses Gaussian Processes for generating targeted molecules. The model was trained on 7054 catalysts provided by
4
+ [Meyer et al.](https://doi.org/10.1039/C8SC01949E).
5
 
6
  For **examples** and **documentation** of the model parameters, please see below.
7
  Moreover, we provide a **model card** ([Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=XD4eHiE2cRUAAAAA:NL11gMa1hGPOUKTAbtXnbVQBDBbjxwcjGECF_i-WC_3g1aBgU1Hbz_f2b4kI_m1in-w__1ztGeHnwHs)) at the bottom of this page.
8
+
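
The parameters documented in the updated model card (target binding energy, primer SMILES, maximal sequence length, number of points) can be illustrated with a minimal Python sketch. It is not part of this commit and does not use the GT4SD API: the `GenerationRequest` class, its field names, and the example values are hypothetical. Only the range of -31.1 to -23.0 kcal/mol comes from the card text, and the positivity checks are assumptions about what sensible inputs would look like.

```python
from dataclasses import dataclass

# Optimal binding-energy window quoted in the model card (kcal/mol).
OPTIMAL_MIN_KCAL_MOL = -31.1
OPTIMAL_MAX_KCAL_MOL = -23.0


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container mirroring the parameters listed in the model card."""

    target_binding_energy: float  # desired binding energy, in kcal/mol
    primer_smiles: str            # SMILES string used to prime the generation
    max_sequence_length: int      # maximal number of tokens in the generated molecule
    number_of_points: int         # points to sample with the Gaussian Process

    def validate(self) -> None:
        """Reject non-positive counts (an assumption, not a documented rule)."""
        if self.max_sequence_length <= 0:
            raise ValueError("max_sequence_length must be positive")
        if self.number_of_points <= 0:
            raise ValueError("number_of_points must be positive")

    def in_optimal_range(self) -> bool:
        """Return True if the target lies in the window quoted in the card."""
        return (
            OPTIMAL_MIN_KCAL_MOL
            <= self.target_binding_energy
            <= OPTIMAL_MAX_KCAL_MOL
        )


if __name__ == "__main__":
    # Example values are illustrative only.
    request = GenerationRequest(
        target_binding_energy=-27.0,
        primer_smiles="c1ccccc1",
        max_sequence_length=100,
        number_of_points=32,
    )
    request.validate()
    print(request.in_optimal_range())
```

A target outside the quoted window is not necessarily invalid; the check only flags whether the request falls in the range the card describes as optimal.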