Commit f782d11 by Niksa Praljak
Parent(s): efd5a17

update all README.md

Files changed:
- README.md +1 -1
- weights/Facilitator/README.md +35 -0
- weights/PenCL/README.md +69 -0
- weights/ProteoScribe/README.md +35 -0
- weights/README.md +27 -0

README.md · CHANGED

````diff
@@ -62,7 +62,7 @@ cd BioM3_PenCL
 ```bash
 python run_PenCL_inference.py \
 --json_path "stage1_config.json" \
---model_path "BioM3_PenCL_epoch20.bin"
+--model_path "./weights/PenCL/BioM3_PenCL_epoch20.bin"
 ```
 
 ### Example Input Data
````
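
The new `--model_path` value `./weights/PenCL/BioM3_PenCL_epoch20.bin` is relative to the current working directory, so the command only finds the weights when it is launched from the repository root. The following is a minimal Python sketch of one way to anchor both paths at an explicit repository root and fail early if the weights file is missing. It assumes the layout introduced in this commit. `build_pencl_command` and `run_pencl_inference` are hypothetical helpers, not part of BioM3.

```python
import subprocess
import sys
from pathlib import Path


def build_pencl_command(repo_root,
                        json_path="stage1_config.json",
                        model_path="weights/PenCL/BioM3_PenCL_epoch20.bin"):
    """Build the run_PenCL_inference.py call with every path anchored at repo_root.

    Hypothetical helper (not part of BioM3); the defaults mirror the README command.
    """
    root = Path(repo_root).resolve()
    weights = root / model_path
    if not weights.is_file():
        # Fail early with a hint instead of failing later inside the inference script.
        raise FileNotFoundError(
            f"PenCL weights not found at {weights}; download them into weights/PenCL/ first"
        )
    return [
        sys.executable,                        # reuse the caller's interpreter
        str(root / "run_PenCL_inference.py"),
        "--json_path", str(root / json_path),
        "--model_path", str(weights),
    ]


def run_pencl_inference(repo_root="."):
    """Run the script with cwd=repo_root so any other relative paths it uses still resolve."""
    root = Path(repo_root).resolve()
    subprocess.run(build_pencl_command(root), check=True, cwd=root)
```

Passing `cwd=root` keeps the script's own behaviour unchanged. Only the two paths on the command line are made absolute.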

weights/Facilitator/README.md · ADDED

````diff
@@ -0,0 +1,35 @@
+
+---
+
+### **`weights/Facilitator/README.md`**
+
+```markdown
+# Facilitator Pre-trained Weights
+
+This folder will contain the pre-trained weights for the **Facilitator** model. The Facilitator model is part of the BioM3 pipeline and serves as a key component for further alignment or generation tasks.
+
+---
+
+## **Downloading Pre-trained Weights**
+
+The Google Drive link for downloading the Facilitator pre-trained weights will be added here soon.
+
+---
+
+## **File Details**
+
+- **File Name**: Facilitator pre-trained weights (TBD).
+- **Description**: Pre-trained weights for the Facilitator model.
+
+---
+
+## **Usage**
+
+Once available, the pre-trained weights can be loaded as follows:
+
+```python
+import torch
+model = YourFacilitatorModel() # Replace with your model class
+model.load_state_dict(torch.load("weights/Facilitator/Facilitator_weights.bin", map_location="cpu"))
+model.eval()
+
````

weights/PenCL/README.md · ADDED

````diff
@@ -0,0 +1,69 @@
+
+---
+
+### **`weights/PenCL/README.md`**
+
+```markdown
+# PenCL Pre-trained Weights
+
+This folder contains the pre-trained weights for the **PenCL** model (Stage 1 of BioM3). The PenCL model aligns protein sequences and text descriptions to compute joint latent embeddings.
+
+---
+
+## **Downloading Pre-trained Weights**
+
+To download the **PenCL epoch 20 pre-trained weights** as a `.bin` file from Google Drive, use the following command:
+
+```bash
+pip install gdown
+gdown --id 1Lup7Xqwa1NjJpoM2uvvBAdghoM-fecEj -O BioM3_PenCL_epoch20.bin
+
+---
+
+## **Usage**
+
+Once available, the pre-trained weights can be loaded as follows:
+
+```python
+import json
+import torch
+from argparse import Namespace
+import Stage1_source.model as mod
+
+# Step 1: Load JSON Configuration
+def load_json_config(json_path):
+    """
+    Load a JSON configuration file and return it as a dictionary.
+    """
+    with open(json_path, "r") as f:
+        config = json.load(f)
+    return config
+
+# Step 2: Convert JSON Dictionary to Namespace
+def convert_to_namespace(config_dict):
+    """
+    Recursively convert a dictionary to an argparse Namespace.
+    """
+    for key, value in config_dict.items():
+        if isinstance(value, dict):
+            config_dict[key] = convert_to_namespace(value)
+    return Namespace(**config_dict)
+
+if __name__ == '__main__':
+    # Path to configuration and weights
+    config_path = "stage1_config.json"
+    model_weights_path = "weights/PenCL/BioM3_PenCL_epoch20.bin"
+
+    # Load Configuration
+    print("Loading configuration...")
+    config_dict = load_json_config(config_path)
+    config_args = convert_to_namespace(config_dict)
+
+    # Load Model
+    print("Loading pre-trained model weights...")
+    model = mod.pfam_PEN_CL(args=config_args) # Initialize the model with arguments
+    model.load_state_dict(torch.load(model_weights_path, map_location="cpu"))
+    model.eval()
+    print("Model loaded successfully with weights!")
+
+
````
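
The `convert_to_namespace` helper in this README rewrites `config_dict` in place, and it leaves dictionaries nested inside lists as plain dicts. The following is a minimal stdlib-only variant that builds a new `argparse.Namespace` tree without mutating its input. It assumes `stage1_config.json` is a JSON object whose nested objects should become nested namespaces, as in the original helper. `to_namespace` and `load_config_as_namespace` are hypothetical names. The commented-out lines only mirror the README's `pfam_PEN_CL` call and need the BioM3 repository and PyTorch.

```python
import json
from argparse import Namespace


def to_namespace(value):
    """Recursively convert dicts (including dicts inside lists) into Namespace objects.

    Returns new objects; the input structure is left unchanged.
    """
    if isinstance(value, dict):
        return Namespace(**{key: to_namespace(item) for key, item in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value


def load_config_as_namespace(json_path):
    """Read a JSON config file and return it as a nested Namespace."""
    with open(json_path, "r", encoding="utf-8") as f:
        return to_namespace(json.load(f))


# Usage mirroring the README (requires the BioM3 repository and PyTorch):
# import torch
# import Stage1_source.model as mod
# config_args = load_config_as_namespace("stage1_config.json")
# model = mod.pfam_PEN_CL(args=config_args)
# model.load_state_dict(torch.load("weights/PenCL/BioM3_PenCL_epoch20.bin", map_location="cpu"))
# model.eval()
```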

weights/ProteoScribe/README.md · ADDED

````diff
@@ -0,0 +1,35 @@
+
+---
+
+### **`weights/ProteoScribe/README.md`**
+
+```markdown
+# ProteoScribe Pre-trained Weights
+
+This folder will contain the pre-trained weights for the **ProteoScribe** model. ProteoScribe enables advanced functional annotation or protein generation tasks.
+
+---
+
+## **Downloading Pre-trained Weights**
+
+The Google Drive link for downloading the ProteoScribe pre-trained weights will be added here soon.
+
+---
+
+## **File Details**
+
+- **File Name**: ProteoScribe pre-trained weights (TBD).
+- **Description**: Pre-trained weights for the ProteoScribe model.
+
+---
+
+## **Usage**
+
+Once available, you can load the weights into your model using PyTorch:
+
+```python
+import torch
+model = YourProteoScribeModel() # Replace with your model class
+model.load_state_dict(torch.load("weights/ProteoScribe/ProteoScribe_weights.bin", map_location="cpu"))
+model.eval()
+
````
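
The Facilitator and ProteoScribe READMEs load from `weights/Facilitator/Facilitator_weights.bin` and `weights/ProteoScribe/ProteoScribe_weights.bin`, but both file names are still marked TBD. The following is a small stdlib sketch that reports which stage's weights are already present under `weights/`. The two placeholder paths come from the README snippets and may change once the files are published. `EXPECTED_WEIGHTS`, `weights_status`, and `missing_stages` are hypothetical names.

```python
from pathlib import Path

# Expected weight files per BioM3 stage, taken from the README snippets in this commit.
# The Facilitator and ProteoScribe file names are placeholders (TBD) and may change.
EXPECTED_WEIGHTS = {
    "PenCL": "PenCL/BioM3_PenCL_epoch20.bin",
    "Facilitator": "Facilitator/Facilitator_weights.bin",
    "ProteoScribe": "ProteoScribe/ProteoScribe_weights.bin",
}


def weights_status(weights_dir="weights"):
    """Map each stage to True if its expected weight file exists under weights_dir."""
    root = Path(weights_dir)
    return {stage: (root / rel).is_file() for stage, rel in EXPECTED_WEIGHTS.items()}


def missing_stages(weights_dir="weights"):
    """List the stages whose weight files are not present yet, in pipeline order."""
    return [stage for stage, present in weights_status(weights_dir).items() if not present]
```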

weights/README.md · ADDED

````diff
@@ -0,0 +1,27 @@
+# Weights Directory
+
+This folder contains the pre-trained weights for the **BioM3** project models. The weights are stored as `.bin` files for different components of the BioM3 pipeline:
+
+1. **PenCL**: Pre-trained weights for the PenCL model (Stage 1).
+2. **Facilitator**: Pre-trained weights for the Facilitator model (Stage 2).
+3. **ProteoScribe**: Pre-trained weights for the ProteoScribe model (Stage 3).
+
+---
+
+## **Purpose**
+
+The weights provided here enable users to quickly load and run inference with the pre-trained models for text-protein sequence alignment, functional annotation, and other tasks.
+
+Each subfolder includes:
+- Instructions for downloading the desired `.bin` files.
+- Information on integrating the weights into your workflows.
+
+---
+
+### **Prerequisites**
+
+To download pre-trained weights, you must install `gdown`:
+
+```bash
+pip install gdown
+
````
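
Every download step in these READMEs depends on `gdown`, so a setup script may want to confirm that it is installed before trying to fetch weights. The following is a minimal stdlib sketch. It checks only that the Python package is importable and that a `gdown` executable is on `PATH`. It does not check whether a given Google Drive file is reachable. `gdown_available` is a hypothetical helper.

```python
import importlib.util
import shutil


def gdown_available():
    """Return (package_importable, cli_on_path) for gdown without importing it."""
    has_package = importlib.util.find_spec("gdown") is not None
    has_cli = shutil.which("gdown") is not None
    return has_package, has_cli
```

If either value is `False`, run `pip install gdown` in the same environment. The CLI check matters for the shell commands, and the package check matters for scripts that import `gdown` directly.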