Niksa Praljak committed
Commit efd5a17 · 1 Parent(s): eca78a8
Add protein-text pair examples to README
README.md CHANGED
@@ -2,7 +2,7 @@
 license: apache-2.0
 ---
 
-# BioM3:
+# BioM3: Biological Multi-Modal Model for Protein Design
 
 ## Citation
 
@@ -65,6 +65,20 @@ python run_PenCL_inference.py \
 --model_path "BioM3_PenCL_epoch20.bin"
 ```
 
+### Example Input Data
+
+The script demonstrates inference using two protein-text pairs from the SwissProt dataset:
+
+**Pair 1:**
+- **Protein Sequence:** MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKL...
+- **Text Description:** PROTEIN NAME: 2' cyclic ADP-D-ribose synthase AbTIR...
+
+**Pair 2:**
+- **Protein Sequence:** MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHP...
+- **Text Description:** PROTEIN NAME: Glucan endo-1,3-beta-D-glucosidase 1...
+
+These pairs demonstrate how the model aligns protein sequences with their corresponding functional descriptions. The model will compute embeddings for both the sequences and descriptions, then calculate their similarities using dot product scores.
+
 ### Expected Output
 
 The script provides the following outputs:
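Reviewer note: the added README text says the model computes embeddings for the sequences and the descriptions and then scores them with dot products. The sketch below illustrates only that scoring step. It is not taken from `run_PenCL_inference.py`: the helper names `dot` and `similarity_matrix`, the three-dimensional vectors, and the use of plain Python lists instead of tensors are all assumptions made for illustration.

```python
# Illustrative only: toy stand-ins for the embeddings the model would produce.
# The real embedding dimension and values are not given in the README.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(protein_embs, text_embs):
    """scores[i][j] = dot product of protein embedding i and text embedding j."""
    return [[dot(p, t) for t in text_embs] for p in protein_embs]


# Hypothetical embeddings for the two protein-text pairs.
protein_embs = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
text_embs = [[0.9, 0.1, 0.4], [0.1, 0.8, 0.6]]

scores = similarity_matrix(protein_embs, text_embs)

# For each protein, the index of the highest-scoring description.
best_match = [max(range(len(row)), key=row.__getitem__) for row in scores]
```

With these toy values each protein scores highest against its own description, which is the pattern one would hope to see for correctly aligned pairs. Real scores depend entirely on the trained model.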