Metadata-Version: 2.1
Name: mhg-gnn
Version: 0.0
Summary: Package for mhg-gnn
Author: team
License: TBD
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
Requires-Dist: networkx>=2.8
Requires-Dist: numpy<2.0.0,>=1.23.5
Requires-Dist: pandas>=1.5.3
Requires-Dist: rdkit-pypi<2023.9.6,>=2022.9.4
Requires-Dist: torch>=2.0.0
Requires-Dist: torchinfo>=1.8.0
Requires-Dist: torch-geometric>=2.3.1

# mhg-gnn

This repository provides the PyTorch source code associated with our publication, "MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network".

**Paper:** [arXiv Link](https://arxiv.org/pdf/2309.16374)

For more information, contact: SEIJITKD@jp.ibm.com

![mhg-gnn](images/mhg_example1.png)

## Introduction

We present MHG-GNN, an autoencoder architecture that has an encoder based on a GNN and a decoder based on a sequential model with MHG.
Since the encoder is a GNN variant, MHG-GNN can accept any molecule as input and demonstrates high predictive performance on molecular graph data.
In addition, the decoder inherits the theoretical guarantee of MHG that it always generates a structurally valid molecule as output.

## Table of Contents

1. [Getting Started](#getting-started)
    1. [Pretrained Models and Training Logs](#pretrained-models-and-training-logs)
    2. [Replicating Conda Environment](#replicating-conda-environment)
2. [Feature Extraction](#feature-extraction)

## Getting Started

**This code and environment have been tested on Intel E5-2667 CPUs at 3.30GHz and NVIDIA A100 Tensor Core GPUs.**

### Pretrained Models and Training Logs

We provide checkpoints of the MHG-GNN model pre-trained on a dataset of ~1.34M molecules curated from PubChem. (later)

For model weights: [HuggingFace Link]()

Add the MHG-GNN `pre-trained weights.pt` to the `models/` directory according to your needs.
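Before loading the model, it can help to confirm that a checkpoint actually sits in `models/`. The snippet below is a minimal sketch using only the standard library. The helper name `find_checkpoints` is hypothetical and not part of this package, and the `.pt` suffix is assumed from the file name mentioned above.

```python
from pathlib import Path


def find_checkpoints(models_dir="models"):
    """Return the sorted .pt files found directly inside models_dir.

    Hypothetical helper for illustration; it is not part of mhg-gnn.
    """
    root = Path(models_dir)
    if not root.is_dir():
        # A missing directory simply means no checkpoints are available yet.
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == ".pt")


# Report what is available relative to the current working directory.
checkpoints = find_checkpoints()
if checkpoints:
    for path in checkpoints:
        print(f"found checkpoint: {path}")
else:
    print("no .pt files found in models/")
```

Run it from the repository root so that the relative `models/` path resolves as expected.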

### Replicating Conda Environment

Follow these steps to replicate our Conda environment and install the necessary libraries:

```
conda create --name mhg-gnn-env python=3.8.18
conda activate mhg-gnn-env
```

#### Install Packages with Conda

```
conda install -c conda-forge networkx=2.8
conda install numpy=1.23.5
# conda install -c conda-forge rdkit=2022.9.4
conda install pytorch=2.0.0 torchvision torchaudio -c pytorch
conda install -c conda-forge torchinfo=1.8.0
conda install pyg -c pyg
```

#### Install Packages with pip

```
pip install rdkit torch-nl==0.3 torch-scatter torch-sparse
```

## Feature Extraction

The example notebook [mhg-gnn_encoder_decoder_example.ipynb](notebooks/mhg-gnn_encoder_decoder_example.ipynb) contains code to load checkpoint files and use the pre-trained model for encoder and decoder tasks.

To load mhg-gnn, use:

```python
import torch
import load

# Build the model and load the pre-trained weights.
model = load.load()
```

To encode SMILES strings into embeddings, use:

```python
# Gradients are not needed for inference.
with torch.no_grad():
    repr = model.encode(["CCO", "O=C=O", "OC(=O)c1ccccc1C(=O)O"])
```

To decode embeddings back into SMILES strings, use:

```python
# Reconstruct SMILES strings from the embeddings computed above.
orig = model.decode(repr)
```
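
As a quick sanity check on the encode/decode pair, the inputs can be compared with their reconstructions. The following is a minimal sketch, not part of this package: `round_trip` and `exact_match_rate` are hypothetical helpers, and it assumes that the decoder returns one SMILES string per input, in the same order. The stand-in functions at the bottom only make the example runnable without the model.

```python
def round_trip(encode, decode, smiles_list):
    """Encode then decode, pairing each input SMILES with its reconstruction.

    Returns a list of (input, output, exact_match) tuples.
    Assumes decode() yields one string per input, in input order.
    """
    embeddings = encode(smiles_list)
    decoded = decode(embeddings)
    return [(src, out, src == out) for src, out in zip(smiles_list, decoded)]


def exact_match_rate(rows):
    """Fraction of rows whose reconstruction equals the input string."""
    if not rows:
        return 0.0
    return sum(1 for _, _, same in rows if same) / len(rows)


# Stand-in callables so the sketch runs without the model: "encoding"
# reverses each string and "decoding" reverses it back.
def fake_encode(smiles_list):
    return [s[::-1] for s in smiles_list]


def fake_decode(embeddings):
    return [e[::-1] for e in embeddings]


rows = round_trip(fake_encode, fake_decode, ["CCO", "O=C=O"])
for src, out, same in rows:
    print(f"{src} -> {out} (exact match: {same})")
print(f"exact match rate: {exact_match_rate(rows):.2f}")
```

With the real model, pass `model.encode` and `model.decode` instead of the stand-ins, inside a `torch.no_grad()` block. Plain string equality is a strict test, since different SMILES strings can describe the same molecule; canonicalizing both sides (for example with RDKit) before comparing gives a fairer measure.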