---
license: cc-by-nc-nd-4.0
extra_gated_fields:
  Name: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to share generated sequences and associated data with authors before publishing: checkbox
  I agree not to file patents on any sequences generated by this model: checkbox
  I agree to use this model for non-commercial use ONLY: checkbox
base_model:
- facebook/esm2_t30_150M_UR50D
pipeline_tag: fill-mask
---
# MeMDLM: De Novo Membrane Protein Design with Masked Diffusion Language Models
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65bbea9a26c639b000501321/uWW6xnJZwQFWDS1QZNQTm.png)
Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al. ([arXiv:2406.07524](https://arxiv.org/abs/2406.07524)), endow BERT-style models with strong generative capabilities. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs and to unconditionally generate realistic, high-quality membrane protein sequences.
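To build intuition for the masked-diffusion objective, here is a toy sketch (not the authors' training or sampling code): the forward process masks each residue independently with a probability tied to the noise level, and generation reverses this by iteratively unmasking positions. The `#` mask symbol and the uniform residue sampling are placeholders for the model's learned predictions.

```python
import random

MASK = "#"  # hypothetical mask symbol for this toy example
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mask_sequence(seq, t, rng):
    """Forward process: mask each residue independently with probability t."""
    return "".join(MASK if rng.random() < t else aa for aa in seq)

def iterative_unmask(seq, steps, rng):
    """Reverse process sketch: unmask a fraction of positions per step.
    A real MDLM predicts residues from context; here we sample uniformly
    just to show the control flow."""
    chars = list(seq)
    for _ in range(steps):
        masked = [i for i, c in enumerate(chars) if c == MASK]
        if not masked:
            break
        for i in rng.sample(masked, max(1, len(masked) // steps)):
            chars[i] = rng.choice(AMINO_ACIDS)
    # fill any positions still masked after the scheduled steps
    for i, c in enumerate(chars):
        if c == MASK:
            chars[i] = rng.choice(AMINO_ACIDS)
    return "".join(chars)

rng = random.Random(0)
noisy = mask_sequence("MKTAYIAKQR", t=0.5, rng=rng)
clean = iterative_unmask(noisy, steps=4, rng=rng)
```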
## Model Usage
The MDLM model is built on an internal backbone, a fine-tuned ESM-2 (150M) model. This backbone can be loaded directly from this repo:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")
input_sequence = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSWIALYnTRGFCIAVAVFLHYFLLVSFTWMGLEAFHMYLKFCIVGWGIPAVVVSIVLTISPDNYGidFCWINSNVVFYITVVGYFCVIFLLNVSMFIVVLVQLCRIKKKKQLGDL"
inputs = tokenizer(input_sequence, return_tensors="pt")
output = model(**inputs)
predicted_ids = output.logits.argmax(dim=-1)  # most likely token at each position
filled_protein_seq = tokenizer.decode(predicted_ids.squeeze(), skip_special_tokens=True)  # output sequence with mask positions filled in
```
This backbone model can be integrated with the [MDLM formulation](https://github.com/kuleshov-group/mdlm) by setting the model backbone type to `hf_dit` and the Hugging Face model ID to `ChatterjeeLab/MeMDLM`.
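For illustration, a hypothetical invocation of the MDLM repo's entry point might look like the following. The exact Hydra keys (`mode`, `backbone`, `model.pretrained_model_name_or_path`) are assumptions based on that repo's conventions; consult its README for the authoritative interface.

```shell
# Sketch only: config keys are assumed from kuleshov-group/mdlm conventions
python main.py \
  mode=sample_eval \
  backbone=hf_dit \
  model.pretrained_model_name_or_path=ChatterjeeLab/MeMDLM
```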