yinuozhang committed
Commit e319e34 (verified) · 1 Parent(s): 26edfc8

Update README.md

  1. README.md +55 -50
README.md CHANGED
@@ -1,50 +1,55 @@
- ---
- license: cc-by-nc-nd-4.0
- ---
-
- # MetaLATTE: Metal Binding Prediction via Multi-Task Learning on Protein Language Model Latents
-
- The bioremediation of environments contaminated with heavy metals is an important challenge in environmental biotechnology, which may benefit from the identification of proteins that bind and neutralize these metals. Here, we introduce a novel predictive algorithm that conducts **Metal** binding prediction via **LA**nguage model la**T**en**T** **E**mbeddings using a multi-task learning approach to accurately classify the metal-binding properties of input protein sequences. Our **MetaLATTE** model utilizes the state-of-the-art ESM-2 protein language model (pLM) embeddings and a position-sensitive attention mechanism to predict the likelihood of binding to specific metals, such as zinc, lead, and mercury. Importantly, our approach addresses the challenges posed by proteins from understudied organisms, which are often absent in traditional metal-binding databases, without the requirement of an input structure. By providing a probability distribution over potential binding metals, our classifier elucidates specific interactions of proteins with diverse metal ions. We envision that MetaLATTE will serve as a powerful tool for rapidly screening and identifying new metal-binding proteins, from metagenomic discovery or _de novo_ design efforts, which can later be employed in targeted bioremediation campaigns.
-
- ![workflow](figures/Figure1.png)
-
- Inference instruction will be released soon.
-
- ## Interactive Demo
-
- You can try out the MetaLATTE model directly in your browser:
-
- <https://huggingface.co/spaces/ChatterjeeLab/MetaLATTE-demo>
-
-
- ## Usage
-
- ```python
- import sys
- from transformers import AutoTokenizer, AutoModel, AutoConfig
- metalatte_path = './Chatterjeelab/MetaLATTE'
- sys.path.insert(0, metalatte_path)
- from metalatte import MetaLATTEConfig, MultitaskProteinModel
- AutoConfig.register("metalatte", MetaLATTEConfig)
- AutoModel.register(MetaLATTEConfig, MultitaskProteinModel)
-
-
- tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
- config = AutoConfig.from_pretrained("ChatterjeeLab/MetaLATTE")
- model = AutoModel.from_pretrained("ChatterjeeLab/MetaLATTE", config=config)
-
- model.eval()
- sequence = "AVYNIGWSFNVNGARGKSFRAGDVLVFKYIKGQHNVVAVNGRGYASCSAPRGARTYSSGQDRIKLTRGQNYFICSFPGHCGGGMKIAINAK"
- inputs = tokenizer(sequence, return_tensors="pt")
- raw_probs, predictions = model.predict(**inputs)
-
- id2label = config.id2label
- predicted_labels = [id2label[i] for i, pred in enumerate(predictions[0]) if pred == 1]
- print(predicted_labels)
- ['Cu']
-
- ```
-
- # Repo Author
- - Yinuo Zhang ([email protected])
- - Pranam Chatterjee ([email protected])
+ ---
+ license: cc-by-nc-nd-4.0
+ tags:
+ - climate
+ - biology
+ ---
+
+ [preprint](https://www.biorxiv.org/content/10.1101/2024.06.26.600843v1)
+
+ # MetaLATTE: Metal Binding Prediction via Multi-Task Learning on Protein Language Model Latents
+
+ The bioremediation of environments contaminated with heavy metals is an important challenge in environmental biotechnology, which may benefit from the identification of proteins that bind and neutralize these metals. Here, we introduce a novel predictive algorithm that conducts **Metal** binding prediction via **LA**nguage model la**T**en**T** **E**mbeddings using a multi-task learning approach to accurately classify the metal-binding properties of input protein sequences. Our **MetaLATTE** model utilizes the state-of-the-art ESM-2 protein language model (pLM) embeddings and a position-sensitive attention mechanism to predict the likelihood of binding to specific metals, such as zinc, lead, and mercury. Importantly, our approach addresses the challenges posed by proteins from understudied organisms, which are often absent in traditional metal-binding databases, without the requirement of an input structure. By providing a probability distribution over potential binding metals, our classifier elucidates specific interactions of proteins with diverse metal ions. We envision that MetaLATTE will serve as a powerful tool for rapidly screening and identifying new metal-binding proteins, from metagenomic discovery or _de novo_ design efforts, which can later be employed in targeted bioremediation campaigns.
+
+ ![workflow](figures/Figure1.png)
+
+
+
+ ## Interactive Demo
+
+ You can try out the MetaLATTE model directly in your browser:
+
+ <https://huggingface.co/spaces/ChatterjeeLab/MetaLATTE-demo>
+
+
+ ## Usage
+
+ ```python
+ import sys
+ from transformers import AutoTokenizer, AutoModel, AutoConfig
+ metalatte_path = './Chatterjeelab/MetaLATTE'
+ sys.path.insert(0, metalatte_path)
+ from metalatte import MetaLATTEConfig, MultitaskProteinModel
+ AutoConfig.register("metalatte", MetaLATTEConfig)
+ AutoModel.register(MetaLATTEConfig, MultitaskProteinModel)
+
+
+ tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
+ config = AutoConfig.from_pretrained("ChatterjeeLab/MetaLATTE")
+ model = AutoModel.from_pretrained("ChatterjeeLab/MetaLATTE", config=config)
+
+ model.eval()
+ sequence = "AVYNIGWSFNVNGARGKSFRAGDVLVFKYIKGQHNVVAVNGRGYASCSAPRGARTYSSGQDRIKLTRGQNYFICSFPGHCGGGMKIAINAK"
+ inputs = tokenizer(sequence, return_tensors="pt")
+ raw_probs, predictions = model.predict(**inputs)
+
+ id2label = config.id2label
+ predicted_labels = [id2label[i] for i, pred in enumerate(predictions[0]) if pred == 1]
+ print(predicted_labels)
+ ['Cu']
+
+ ```
+
+ # Repo Author
+ - Yinuo Zhang ([email protected])
+ - Pranam Chatterjee ([email protected])
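
The Usage snippet in this diff keeps only the labels whose binary prediction equals 1 and leaves `raw_probs` unused. As a minimal sketch of how the per-metal probabilities might be inspected, the helper below pairs each probability with its label and keeps those at or above a cutoff. It assumes `raw_probs[0]` behaves like a flat sequence of per-label probabilities in the same order as `config.id2label`, which the README does not state. The function name `labels_above_threshold`, the 0.5 default cutoff, and the example mapping and values are all hypothetical and not part of MetaLATTE.

```python
from typing import Dict, List, Sequence, Tuple


def labels_above_threshold(
    probs: Sequence[float],
    id2label: Dict[int, str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs at or above `threshold`, highest first.

    Hypothetical helper: assumes `probs` holds one probability per label,
    indexed in the same order as the integer keys of `id2label`.
    """
    if len(probs) != len(id2label):
        raise ValueError("probs and id2label must have the same length")
    hits = [
        (id2label[i], float(p))
        for i, p in enumerate(probs)
        if p >= threshold
    ]
    # Sort by probability so the most likely metal comes first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up mapping and probabilities for illustration only; not model output.
example_id2label = {0: "Cu", 1: "Zn", 2: "Pb"}
example_probs = [0.91, 0.12, 0.55]

ranked = labels_above_threshold(example_probs, example_id2label)
print(ranked)  # [('Cu', 0.91), ('Pb', 0.55)]
```

With the real model, a call along the lines of `labels_above_threshold(raw_probs[0].tolist(), config.id2label)` might apply, provided `raw_probs` is a tensor of shape (batch, labels) and `config.id2label` uses integer keys; both points would need to be checked against the repository code.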