---
extra_gated_fields:
  Name: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
extra_gated_prompt: "MOG-DFM License: https://drive.google.com/file/d/1LJuGrsRZMoqsrZa1gSfsCihiih5MPVRA/view?usp=sharing"
extra_gated_heading: Acknowledge license to access the repository
extra_gated_button_content: Acknowledge license
---
# Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design
arXiv Paper: <https://arxiv.org/abs/2505.07086>
Designing biological sequences that satisfy multiple, often conflicting, functional and biophysical criteria remains a central challenge in biomolecule engineering. While discrete flow matching models have recently shown promise for efficient sampling in high-dimensional sequence spaces, existing approaches address only single objectives or require continuous embeddings that can distort discrete distributions. We present Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), a general framework to steer any pretrained discrete-time flow matching generator toward Pareto-efficient trade-offs across multiple scalar objectives. At each sampling step, MOG-DFM computes a hybrid rank-directional score for candidate transitions and applies an adaptive hypercone filter to enforce consistent multi-objective progression. We also train two unconditional discrete flow matching models as base generators for MOG-DFM: PepDFM for diverse peptide generation and EnhancerDFM for functional enhancer DNA generation. We demonstrate MOG-DFM's effectiveness in generating peptide binders optimized across five properties (hemolysis, non-fouling, solubility, half-life, and binding affinity), and in designing DNA sequences with specific enhancer classes and DNA shapes. Overall, MOG-DFM proves to be a powerful tool for multi-property-guided biomolecule sequence design.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/v4Rr0mhuclD1LN-bWgg2D.png)
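To make the hypercone idea above concrete, here is a minimal sketch, not the MOG-DFM implementation. It assumes each candidate transition comes with a vector of per-objective score changes and that the importance weights define a preferred direction in objective space; a candidate passes the filter only if its improvement vector lies within an angle `max_angle` of that direction. The helper names (`cosine`, `hypercone_filter`, `adapt_angle`) and the simple widen/narrow rule for adapting the angle are hypothetical, and the hybrid rank-directional score is not reproduced here.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def hypercone_filter(deltas: List[List[float]], direction: Sequence[float],
                     max_angle: float) -> List[int]:
    """Hypothetical helper: indices of candidates whose objective-improvement
    vector lies within `max_angle` radians of `direction`."""
    kept = []
    for i, d in enumerate(deltas):
        c = max(-1.0, min(1.0, cosine(d, direction)))  # clamp for acos
        if math.acos(c) <= max_angle:
            kept.append(i)
    return kept


def adapt_angle(max_angle: float, accept_rate: float, target_rate: float = 0.5,
                step: float = 0.05, lo: float = 0.05, hi: float = math.pi / 2) -> float:
    """Assumed update rule: widen the cone when too few candidates pass,
    narrow it otherwise, and keep the angle within [lo, hi]."""
    if accept_rate < target_rate:
        max_angle += step
    else:
        max_angle -= step
    return min(hi, max(lo, max_angle))


if __name__ == "__main__":
    # Three candidates scored on two objectives, equal importance weights.
    deltas = [[0.2, 0.1], [-0.3, 0.4], [0.05, 0.05]]
    print(hypercone_filter(deltas, [0.5, 0.5], math.pi / 4))  # → [0, 2]
```

The second candidate improves one objective at a large cost to the other, so its improvement vector points about 82 degrees away from the weight direction and is rejected by a 45-degree cone.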
## Usage
### 0. Conda Environment
```
conda create -n mog-dfm python=3.9
conda activate mog-dfm
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install fair-esm transformers xgboost datasets torchdiffeq
```
To use Deep DNAshape, create a separate conda environment called `deepDNAshape` by following [the installation instructions in its repository](https://github.com/JinsenLi/deepDNAshape?tab=readme-ov-file#installation).
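As an optional sanity check, a short script like the one below can be run inside the `mog-dfm` environment to report which of the packages installed above cannot be imported. It is a sketch, not part of the repository; the module list is our own mapping from the install commands to import names (for example, `fair-esm` is imported as `esm`).

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages installed above (fair-esm is imported as `esm`).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "esm",
    "transformers", "xgboost", "datasets", "torchdiffeq",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing modules:", ", ".join(missing) if missing else "none")
    if "torch" not in missing:
        import torch
        print("CUDA available:", torch.cuda.is_available())
```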
### 1. PepDFM and EnhancerDFM training and evaluation
The pretrained weights for PepDFM and EnhancerDFM are available in the `ckpt` directory.
The training data for PepDFM and EnhancerDFM are available in the `dataset` directory.
We also provide the complete training and evaluation code for both models.
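For readers new to discrete flow matching, the sketch below shows how a training example is typically corrupted along a factorized mixture probability path: each token of a clean sequence `x1` is kept with probability `kappa(t)` and otherwise replaced by a draw from a source distribution, and the model is then trained (for example with cross-entropy) to recover `x1` from the corrupted `x_t`. The linear scheduler `kappa(t) = t`, the uniform source distribution, and the function name `sample_xt` are assumptions for illustration; the probability path and source used by PepDFM and EnhancerDFM are defined in the provided training code and may differ.

```python
import random
from typing import List, Sequence


def sample_xt(x1: Sequence[int], t: float, vocab_size: int,
              rng: random.Random) -> List[int]:
    """Sample x_t on an assumed mixture path with kappa(t) = t.

    Each position keeps its data token x1[i] with probability t and is
    otherwise resampled uniformly from the vocabulary (assumed source p_0).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [tok if rng.random() < t else rng.randrange(vocab_size) for tok in x1]


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [3, 1, 4, 1, 5, 9, 2, 6]    # a toy tokenized sequence
    t = rng.random()                 # training time drawn uniformly from [0, 1)
    print(t, sample_xt(x1, t, vocab_size=24, rng=rng))
```

At `t = 1` the sample equals the data sequence, and at `t = 0` every position comes from the source distribution.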
### 2. Multi-Objective Guided Generation
#### 2.0 Score Models
The pretrained weights for the score models (hemolysis, non-fouling, solubility, half-life, binding affinity, and enhancer class) are available in the `classifier_ckpt` directory.
Prediction scripts for each score model are provided in the `classifier_code` directory.
#### 2.1 Peptide Generation Task
Example command for peptide generation guided by multiple objectives (hemolysis, non-fouling, solubility, half-life, and binding affinity):
```
python PepDFM_multi_objective_generation.py --is_peptide True --T 100 --n_samples 5 --n_batches 10 --length 10 --target_protein GSHMIEPNVISVRLFKRKVGGLGFLVKERVSKPPVIISDLIRGGAAEQSGLIQAGDIILAVNDRPLVDLSYDSALEVLRGIASETHVVLILRGPEGFTTHLETTFTGDGTPKTIRVTQPLGPPTKAV
```
Note that the hemolysis model outputs one minus the actual hemolysis score, and the half-life model outputs the base-10 logarithm of the half-life in hours.
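Given the output conventions above, the raw quantities can be recovered with two one-line conversions. The function names below are ours, not the repository's.

```python
def hemolysis_from_output(model_output: float) -> float:
    """The hemolysis model reports 1 - hemolysis, so invert it."""
    return 1.0 - model_output


def half_life_hours_from_output(model_output: float) -> float:
    """The half-life model reports log10(half-life in hours), so exponentiate."""
    return 10.0 ** model_output


if __name__ == "__main__":
    print(hemolysis_from_output(0.9))        # approximately 0.1
    print(half_life_hours_from_output(1.5))  # approximately 31.6 hours
```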
The guidance settings and their importance weights are defined in `PepDFM_multi_objective_generation.py` and can be modified there.
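The sketch below illustrates one common way importance weights can be used: rescale them to sum to 1 and combine per-objective score changes into a single weighted improvement. The objective names and weight values in `IMPORTANCE_WEIGHTS` are hypothetical placeholders, and the weighted sum is an illustration rather than MOG-DFM's rank-directional scoring. It also assumes that a larger score is better for every objective, which is why the hemolysis model's inverted output is convenient.

```python
from typing import Dict

# Hypothetical weights; the real names and values are set in
# PepDFM_multi_objective_generation.py and may differ.
IMPORTANCE_WEIGHTS = {
    "hemolysis": 1.0,
    "nonfouling": 1.0,
    "solubility": 1.0,
    "half_life": 1.0,
    "affinity": 2.0,
}


def normalize_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative weights so that they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def weighted_improvement(deltas: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-objective score changes for one candidate transition
    (objectives missing from `deltas` count as no change)."""
    w = normalize_weights(weights)
    return sum(w[name] * deltas.get(name, 0.0) for name in w)
```

Because the scores live on different scales (probabilities versus log10 hours, for example), a plain weighted sum like this can be dominated by one objective; rank-based scoring is one way to reduce that sensitivity.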
#### 2.2 Enhancer DNA Generation Task
Example command for enhancer DNA generation guided by the enhancer class and DNA shape:
```
python EnhancerDFM_multi_objective_generation.py --is_peptide False --T 800 --n_samples 5 --n_batches 10 --length 100 --target_enhancer_class 0 --target_DNA_shape HelT
```
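With `--target_enhancer_class 0`, a natural class objective is the enhancer classifier's predicted probability of class 0. The sketch below computes that probability from raw logits with a numerically stable softmax. It assumes the classifier returns one logit per enhancer class; the actual objective used in `EnhancerDFM_multi_objective_generation.py` may be defined differently, and `class_probability` is a hypothetical helper.

```python
import math
from typing import Sequence


def class_probability(logits: Sequence[float], target_class: int) -> float:
    """Softmax probability of `target_class` given one raw logit per class."""
    if not 0 <= target_class < len(logits):
        raise IndexError("target_class is out of range")
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[target_class] / sum(exps)


if __name__ == "__main__":
    print(class_probability([2.0, 0.5, -1.0], target_class=0))
```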
The guidance settings and their importance weights are defined in `EnhancerDFM_multi_objective_generation.py` and can be modified there.
To use this repository, you agree to abide by the [MOG-DFM License](https://drive.google.com/file/d/1LJuGrsRZMoqsrZa1gSfsCihiih5MPVRA/view?usp=sharing).