---
extra_gated_fields:
  Name: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
extra_gated_prompt: "MOG-DFM License: https://drive.google.com/file/d/1LJuGrsRZMoqsrZa1gSfsCihiih5MPVRA/view?usp=sharing"
extra_gated_heading: Acknowledge license to access the repository
extra_gated_button_content: Acknowledge license
---

# Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design

arXiv Paper: <https://arxiv.org/abs/2505.07086>

Designing biological sequences that satisfy multiple, often conflicting, functional and biophysical criteria remains a central challenge in biomolecule engineering. While discrete flow matching models have recently shown promise for efficient sampling in high-dimensional sequence spaces, existing approaches address only single objectives or require continuous embeddings that can distort discrete distributions. We present Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), a general framework to steer any pretrained discrete-time flow matching generator toward Pareto-efficient trade-offs across multiple scalar objectives. At each sampling step, MOG-DFM computes a hybrid rank-directional score for candidate transitions and applies an adaptive hypercone filter to enforce consistent multi-objective progression. We also trained two unconditional discrete flow matching models, PepDFM for diverse peptide generation and EnhancerDFM for functional enhancer DNA generation, as base generation models for MOG-DFM. We demonstrate MOG-DFM's effectiveness in generating peptide binders optimized across five properties (hemolysis, non-fouling, solubility, half-life, and binding affinity), and in designing DNA sequences with specific enhancer classes and DNA shapes. In total, MOG-DFM proves to be a powerful tool for multi-property-guided biomolecule sequence design.



## Usage

### 0. Conda Environment

```
conda create -n mog-dfm python=3.9
conda activate mog-dfm
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install fair-esm transformers xgboost datasets torchdiffeq
```

To use Deep DNAshape, create a separate conda environment named `deepDNAshape` by following [the installation instructions in its repository](https://github.com/JinsenLi/deepDNAshape?tab=readme-ov-file#installation).

### 1. PepDFM and EnhancerDFM Training and Evaluation

The pretrained weights for PepDFM and EnhancerDFM are available in the `ckpt` directory.

The data for PepDFM and EnhancerDFM training are available in the `dataset` directory.

We also provide the complete training and evaluation code for both models.

### 2. Multi-Objective Guided Generation

#### 2.0 Score Models

The pretrained weights for the score models (hemolysis, non-fouling, solubility, half-life, binding affinity, and enhancer class) are available in the `classifier_ckpt` directory.

Prediction scripts for each score model are provided in the `classifier_code` directory.

#### 2.1 Peptide Generation Task

Example command for peptide generation guided by multiple objectives (hemolysis, non-fouling, solubility, half-life, and binding affinity):

```
python PepDFM_multi_objective_generation.py --is_peptide True --T 100 --n_samples 5 --n_batches 10 --length 10 --target_protein GSHMIEPNVISVRLFKRKVGGLGFLVKERVSKPPVIISDLIRGGAAEQSGLIQAGDIILAVNDRPLVDLSYDSALEVLRGIASETHVVLILRGPEGFTTHLETTFTGDGTPKTIRVTQPLGPPTKAV
```
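
To run the command above for several peptide lengths, one option is to assemble the argument list in Python. The sketch below is only a convenience wrapper: the helper `build_peptide_command` and the chosen lengths are hypothetical, while the script name and flags are copied from the example command. It assumes the script is run from the repository root inside the `mog-dfm` environment.

```python
import subprocess
import sys
from typing import List

# Target protein copied from the example command above.
TARGET = (
    "GSHMIEPNVISVRLFKRKVGGLGFLVKERVSKPPVIISDLIRGGAAEQSGLIQAGDIILAVNDRPLVDLSYDS"
    "ALEVLRGIASETHVVLILRGPEGFTTHLETTFTGDGTPKTIRVTQPLGPPTKAV"
)


def build_peptide_command(target_protein: str, length: int = 10, steps: int = 100,
                          n_samples: int = 5, n_batches: int = 10) -> List[str]:
    """Build the argument list for one peptide generation run (hypothetical helper)."""
    return [
        sys.executable, "PepDFM_multi_objective_generation.py",
        "--is_peptide", "True",
        "--T", str(steps),
        "--n_samples", str(n_samples),
        "--n_batches", str(n_batches),
        "--length", str(length),
        "--target_protein", target_protein,
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Illustrative sweep over three peptide lengths; call run_all(commands) to execute it.
commands = [build_peptide_command(TARGET, length=n) for n in (8, 10, 12)]
```

Passing the arguments as a list avoids shell quoting issues with the long target sequence.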

Note that the hemolysis model outputs one minus the actual hemolysis score (so higher values indicate lower predicted hemolysis), and the half-life model outputs the base-10 logarithm of the half-life in hours.
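
The two conventions in the note above can be inverted with simple arithmetic. The following is a minimal sketch; the function names are hypothetical and are not part of the repository.

```python
def hemolysis_from_output(model_output: float) -> float:
    """Recover the hemolysis score from a model output of (1 - hemolysis)."""
    return 1.0 - model_output


def half_life_hours_from_output(model_output: float) -> float:
    """Recover the half-life in hours from a model output of log10(half-life in hours)."""
    return 10.0 ** model_output


# A hemolysis-model output of 0.9 corresponds to a hemolysis score of about 0.1,
# and a half-life-model output of 2.0 corresponds to 100 hours.
hemolysis = hemolysis_from_output(0.9)
half_life_hours = half_life_hours_from_output(2.0)
```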

The guidance settings and their importance weights can be found and modified in `PepDFM_multi_objective_generation.py`.

#### 2.2 Enhancer DNA Generation Task

Example command for enhancer DNA generation guided by the enhancer class and DNA shape:

```
python EnhancerDFM_multi_objective_generation.py --is_peptide False --T 800 --n_samples 5 --n_batches 10 --length 100 --target_enhancer_class 0 --target_DNA_shape HelT
```

The guidance settings and their importance weights can be found and modified in `EnhancerDFM_multi_objective_generation.py`.
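
To give an intuition for how importance weights can interact with the hypercone filter described in the abstract, here is a simplified sketch of a cone test around a weight vector. It is an illustration under stated assumptions, not the repository's implementation: the function names, the fixed `max_angle_deg` (the paper's filter is adaptive), and the example vectors are all hypothetical.

```python
import math
from typing import Sequence


def angle_to_weights(improvement: Sequence[float], weights: Sequence[float]) -> float:
    """Angle in radians between an objective-improvement vector and the weight vector."""
    dot = sum(d * w for d, w in zip(improvement, weights))
    norm_d = math.sqrt(sum(d * d for d in improvement))
    norm_w = math.sqrt(sum(w * w for w in weights))
    if norm_d == 0.0 or norm_w == 0.0:
        # Assumption: a zero vector has no direction, so report the largest angle.
        return math.pi
    cosine = max(-1.0, min(1.0, dot / (norm_d * norm_w)))
    return math.acos(cosine)


def in_hypercone(improvement: Sequence[float], weights: Sequence[float],
                 max_angle_deg: float = 45.0) -> bool:
    """Keep a candidate only if its improvement points within the cone around the weights."""
    return angle_to_weights(improvement, weights) <= math.radians(max_angle_deg)


# Hypothetical two-objective example with equal importance weights.
weights = (1.0, 1.0)
balanced_step = in_hypercone((0.2, 0.1), weights)   # improves both objectives
lopsided_step = in_hypercone((0.3, -0.3), weights)  # trades one objective for the other
```

In this toy setting the balanced step falls inside the cone and the lopsided step does not, which is the kind of consistency across objectives that such a filter is meant to enforce.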

By using this repository, you agree to abide by the [MOG-DFM License](https://drive.google.com/file/d/1LJuGrsRZMoqsrZa1gSfsCihiih5MPVRA/view?usp=sharing).