vshirasuna committed · Commit 7d1fe42 · verified · 1 Parent(s): bb009fd

Update README.md

---
license: apache-2.0
---

# 3D Electron Density Grids-based VQGAN (3DGrid-VQGAN)

This repository provides the PyTorch source code associated with our publication, "A Foundation Model for Simulation-Grade Molecular Electron Densities".

**Paper:** [Arxiv Link]()

**HuggingFace:** coming soon...

For more information contact: [email protected] or [email protected].

![3dgrid-vqgan](images/3dgridvqgan_architecture.png)

## Introduction

We present 3DGrid-VQGAN, an encoder-decoder chemical foundation model for representing 3D electron density grids, pre-trained on a dataset of approximately 855K molecules from the PubChem database. 3DGrid-VQGAN efficiently encodes high-dimensional grid data into compact latent representations, enabling downstream tasks such as molecular property prediction with enhanced accuracy. This approach could significantly reduce reliance on computationally intensive quantum chemical simulations by offering simulation-grade data derived directly from learned representations.
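To give a feel for the quantization at the heart of a VQGAN, here is a minimal, illustrative sketch (not code from this repository): the encoder output is snapped to its nearest codebook entry, and the decoder reconstructs from that discrete entry. The codebook values and dimensions below are made up for demonstration.

```python
def quantize(z, codebook):
    """Return (index, entry) of the codebook vector nearest to z.

    This is the vector-quantization step of a VQGAN in miniature:
    a continuous latent z is replaced by its closest discrete code.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))
    return best, codebook[best]

# Toy 2-D codebook with 3 entries (real codebooks are far larger).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
idx, entry = quantize((0.9, 1.2), codebook)
print(idx, entry)  # nearest entry is index 1, (1.0, 1.0)
```

The decoder then only ever sees codebook entries, which is what makes the latent representation compact and discrete.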

## Table of Contents

1. [Getting Started](#getting-started)
   1. [Pretrained Models and Training Logs](#pretrained-models-and-training-logs)
   2. [Replicating Conda Environment](#replicating-conda-environment)
2. [Pretraining](#pretraining)
3. [Finetuning](#finetuning)
4. [Feature Extraction](#feature-extraction)
5. [Citations](#citations)

## Getting Started

**This code and environment have been tested on Nvidia V100s and Nvidia A100s.**

### Pretrained Models and Training Logs

Add the 3DGrid-VQGAN pre-trained weights file (`VQGAN_43.pt`) to the `data/checkpoints/pretrained` directory. The directory structure should look like the following:

```
data/
└── checkpoints/
    └── pretrained/
        └── VQGAN_43.pt
```
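A quick, stdlib-only way to confirm the weights landed in the right place is to build the path programmatically and check it; this helper is our own sketch, not part of the repository:

```python
from pathlib import Path

def pretrained_checkpoint(root: str = "data") -> Path:
    """Expected location of the pre-trained weights, per the layout above."""
    return Path(root) / "checkpoints" / "pretrained" / "VQGAN_43.pt"

ckpt = pretrained_checkpoint()
if not ckpt.is_file():
    # Fail early with a clear message rather than deep inside training.
    print(f"Missing weights: place VQGAN_43.pt at {ckpt}")
```

Once the file is in place, the checkpoint can be loaded in the usual PyTorch way, e.g. `torch.load(ckpt, map_location="cpu")`.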

### Replicating Conda Environment

Follow these steps to replicate our Conda environment and install the necessary libraries:

#### Create and Activate Conda Environment

```
conda create --name 3dvqgan-env python=3.10
conda activate 3dvqgan-env
```

#### Install Packages with Conda

```
conda install pytorch=2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c conda-forge mpi4py=4.0.0 openmpi=5.0.5
```

#### Install Packages with Pip

```
pip install -r requirements.txt
```

## Pretraining

3DGrid-VQGAN is pre-trained on approximately 855K 3D electron density grids from PubChem, comprising approximately 7TB of data.

The pretraining code provides examples of data processing and model training on a smaller dataset.

To pre-train the 3DGrid-VQGAN model, run:

```
bash training/run_mpi_training.sh
```

## Finetuning

The finetuning datasets and environment can be found in the [finetune](finetune/) directory. After setting up the environment, you can run a finetuning task with:

```
bash finetune/run_finetune_qm9_alpha.sh
```

Finetuning checkpoints and training logs will be saved in directories named `data/checkpoints/finetuned/<dataset_name>/<measure_name>`.

## Feature Extraction

To extract embeddings from the 3DGrid-VQGAN model, run:

```
bash inference/run_extract_embeddings.sh
```
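As noted in the introduction, the extracted embeddings can serve as fixed feature vectors for downstream property prediction. A toy, stdlib-only sketch of a 1-nearest-neighbour predictor follows; the embedding vectors and property values are invented for illustration, and real embeddings would come from the script above:

```python
def predict_1nn(query, embeddings, labels):
    """Predict the label of the training embedding closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(range(len(embeddings)),
                  key=lambda i: sq_dist(query, embeddings[i]))
    return labels[nearest]

# Made-up 2-D "embeddings" with one property value per molecule.
train_emb = [(0.1, 0.2), (0.8, 0.9), (0.5, 0.1)]
train_y = [1.5, 3.2, 2.1]
print(predict_1nn((0.7, 0.8), train_emb, train_y))  # nearest is (0.8, 0.9)
```

In practice one would train a regressor (or finetune the model, as in the section above) rather than use nearest-neighbour lookup, but the flow from embedding to predicted property is the same.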

## Citations