Fill-Mask
Transformers
Safetensors
esm
pranamanam committed on
Commit 2d59622 · verified · 1 Parent(s): 580ec07

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -1,12 +1,12 @@
  ---
  license: cc-by-nc-nd-4.0
  ---
- **FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking**
+ # FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/eR38p4VJhWJhwsqjZZdYp.png)
  In this work, we introduce **FusOn-pLM**, a novel pLM that fine-tunes state-of-the-art ESM-2 embeddings on fusion oncoprotein sequences, those that drive a large portion of pediatric cancers but are heavily disordered and undruggable, via masked language modeling (MLM). We specifically introduce a novel MLM strategy, employing a binding-site probability predictor to focus masking on key amino acid residues, thereby generating more optimal fusion oncoprotein-aware embeddings. Our model improves performance on both fusion oncoprotein-specific benchmarks and disorder prediction tasks in comparison to baseline ESM-2 representations, as well as manually-constructed biophysical embeddings, motivating downstream usage of FusOn-pLM embeddings for therapeutic design tasks targeting these fusions.
 
 
- # How to generate FusOn-pLM embeddings for your fusion oncoprotein
+ **How to generate FusOn-pLM embeddings for your fusion oncoprotein:**
 
  ```
  from transformers import AutoTokenizer, AutoModel
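The README describes focusing masked language modeling on residues that a binding-site probability predictor scores highly. The Python sketch below illustrates that idea under stated assumptions only: the per-residue probabilities are passed in by the caller (standing in for the predictor's output), masked positions are drawn without replacement with weight proportional to those probabilities, and the helper name `focused_mask`, the 15% masking rate, and the `<mask>` token string are hypothetical choices, not taken from the FusOn-pLM training code.

```python
import random
from typing import List, Optional, Sequence


def focused_mask(
    sequence: str,
    binding_probs: Sequence[float],
    mask_rate: float = 0.15,
    mask_token: str = "<mask>",
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical sketch of probability-weighted masking.

    Positions are sampled without replacement, each draw weighted by the
    residue's binding-site probability. Not the authors' implementation.
    """
    if len(sequence) != len(binding_probs):
        raise ValueError("need exactly one probability per residue")
    if not 0 < mask_rate <= 1:
        raise ValueError("mask_rate must be in (0, 1]")

    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(sequence)))

    # Small floor so zero-probability residues remain drawable if needed.
    weights = [max(p, 1e-6) for p in binding_probs]
    candidates = list(range(len(sequence)))
    chosen = set()
    for _ in range(n_mask):
        # Weighted draw among positions that are not yet masked.
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.add(candidates.pop(idx))
        weights.pop(idx)

    return [mask_token if i in chosen else aa for i, aa in enumerate(sequence)]


if __name__ == "__main__":
    # Made-up 20-residue sequence; the first five residues get high scores.
    seq = "MKVLAAGIVGLLLAQPAMAA"
    probs = [0.9] * 5 + [0.01] * 15
    print(" ".join(focused_mask(seq, probs, seed=0)))
```

Weighted sampling without replacement keeps the number of masked tokens fixed per sequence while still biasing the choice toward high-scoring residues; a per-residue Bernoulli draw would be a simpler alternative but makes the masked count vary between sequences.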