Niksa Praljak committed on
Commit efd5a17 · 1 Parent(s): eca78a8

Add protein-text pair examples to README

Files changed (1)
  1. README.md +15 -1
README.md CHANGED
@@ -2,7 +2,7 @@
2
  license: apache-2.0
3
  ---
4
 
5
- # BioM3: Protein Language Model Pipeline
6
 
7
  ## Citation
8
 
@@ -65,6 +65,20 @@ python run_PenCL_inference.py \
65
  --model_path "BioM3_PenCL_epoch20.bin"
66
  ```
67
 
68
  ### Expected Output
69
 
70
  The script provides the following outputs:
 
2
  license: apache-2.0
3
  ---
4
 
5
+ # BioM3: Biological Multi-Modal Model for Protein Design
6
 
7
  ## Citation
8
 
 
65
  --model_path "BioM3_PenCL_epoch20.bin"
66
  ```
67
 
68
+ ### Example Input Data
69
+
70
+ The script demonstrates inference using two protein-text pairs from the SwissProt dataset:
71
+
72
+ **Pair 1:**
73
+ - **Protein Sequence:** MSLEQKKGADIISKILQIQNSIGKTTSPSTLKTKLSEISRKEQENARIQSKL...
74
+ - **Text Description:** PROTEIN NAME: 2' cyclic ADP-D-ribose synthase AbTIR...
75
+
76
+ **Pair 2:**
77
+ - **Protein Sequence:** MRFQVIVAAATITMITSYIPGVASQSTSDGDDLFVPVSNFDPKSIFPEIKHP...
78
+ - **Text Description:** PROTEIN NAME: Glucan endo-1,3-beta-D-glucosidase 1...
79
+
80
+ These pairs demonstrate how the model aligns protein sequences with their corresponding functional descriptions. The model will compute embeddings for both the sequences and descriptions, then calculate their similarities using dot product scores.
81
+
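The dot-product scoring described in the added README paragraph can be illustrated with a minimal sketch. This is not the BioM3 implementation: the embedding values are made-up placeholders standing in for encoder output, and `dot` and `similarity_matrix` are hypothetical helper names introduced only for this example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(seq_embs, text_embs):
    """Entry [i][j] is the dot product of sequence i with description j."""
    return [[dot(s, t) for t in text_embs] for s in seq_embs]


# Placeholder embeddings (assumption): in the real pipeline these vectors
# would be produced by the protein and text encoders for Pair 1 and Pair 2.
seq_embs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2]]
text_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.3]]

scores = similarity_matrix(seq_embs, text_embs)

for i, row in enumerate(scores, start=1):
    # Index of the highest-scoring description for this sequence (1-based).
    best = max(range(len(row)), key=row.__getitem__) + 1
    rounded = [round(x, 2) for x in row]
    print(f"sequence {i}: scores={rounded}, best match=description {best}")
```

With these placeholder values, each sequence scores highest against its own description, which is the behavior one would hope to see from a well-aligned model on matched pairs.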
82
  ### Expected Output
83
 
84
  The script provides the following outputs: