Niksa Praljak
committed on
Commit · 72a56de
1 Parent(s): 1f2e18d
instructinos for installing LLMs pretrained weights
- README.md +14 -1
- stage1_config.json +3 -5
- weights/README.md +1 -0
README.md
CHANGED
@@ -48,11 +48,13 @@ cd /path/BioM3/weights
 ```
 
 Note: choose the desired BioM3 configuration/checkpoint, then install weights for each folder:
+- `/path/BioM3/weights/LLMs` # install ESM2 and PubMedBert pretrained wieghts for compiling PenCL
 - `/path/BioM3/weights/PenCL`
 - `/path/BioM3/weights/Facilitator`
 - `/path/BioM3/weights/ProteoScribe`
 
 Each folder contains a `README.md` detailing the different model weight configurations. For benchmarking, the optimal configuration is:
+- `esm2_t33_650M_UR50D.pt`, `esm2_t33_650M_UR50D-contact-regression.pt`, and `BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext`
 - `BioM3_PenCL_epoch20.bin`
 - `BioM3_Facilitator_epoch20.bin`
 - `BioM3_ProteoScribe_epoch20.bin`
@@ -68,10 +70,21 @@ This stage demonstrates how to perform inference using the **BioM3 PenCL model**
 
 Before running the model, ensure you have:
 - Configuration file: `stage1_config.json`
-- Pre-trained weights: `BioM3_PenCL_epoch20.bin`
+- Pre-trained weights: `BioM3_PenCL_epoch20.bin`, `esm2_t33_650M_UR50D.pt`, `esm2_t33_650M_UR50D-contact-regression.pt`, and `BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext`.
 
 ### Running the Model
 
+0. Change json configuration for Stage 1:
+
+```bash
+vim stage1_config.json
+
+# replace <working_directory> with your path
+"seq_model_path": "<working_directory>/BioM3/weights/LLMs/esm2_t33_650M_UR50D.pt"
+"text_model_path": "<working_directory>/weights/LLMs/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
+
+```
+
 1. Change directory to BioM3 repo:
 ```bash
 cd /path/BioM3 # /path/ where is the location to the cloned BioM3 repo
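The new step 0 asks the reader to hand-edit `seq_model_path` and `text_model_path` in `stage1_config.json`. As a rough alternative, the same edit could be scripted. The sketch below is only an illustration: the helper name `set_llm_paths` is hypothetical, and it assumes both LLM artifacts sit in `weights/LLMs` inside the cloned repo, even though the README lines above show the two example paths under slightly different roots (`<working_directory>/BioM3/weights/LLMs` and `<working_directory>/weights/LLMs`).

```python
import json
from pathlib import Path


def set_llm_paths(config_path, repo_root):
    """Point the Stage 1 config at the LLM weights under <repo_root>/weights/LLMs.

    Hypothetical helper, not part of BioM3. The two key names come from the
    README change in this commit; the folder layout is an assumption.
    """
    config_path = Path(config_path)
    llm_dir = Path(repo_root) / "weights" / "LLMs"

    # Load the existing config so every other key is preserved untouched.
    config = json.loads(config_path.read_text())

    config["seq_model_path"] = str(llm_dir / "esm2_t33_650M_UR50D.pt")
    config["text_model_path"] = str(
        llm_dir / "BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"
    )

    # Write the updated config back in place.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A call such as `set_llm_paths("stage1_config.json", "/path/BioM3")` would rewrite the file in place, where `/path/BioM3` stands for the location of the cloned repo.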
stage1_config.json
CHANGED
@@ -4,8 +4,8 @@
 "tb_logger_path": "None",
 "tb_logger_folder": "None",
 "version_name": "None",
-"model_checkpoint_path": "
-"output_dict_path": "
+"model_checkpoint_path": "None",
+"output_dict_path": "None",
 "valid_size": 0.2,
 "epochs": 10,
 "acc_grad_batches": 1,
@@ -44,7 +44,5 @@
 "bLM_n_layers_to_finetune": 1,
 "proj_embedding_dim": 512,
 "dropout": 0.1,
-"head_lr": 0.0005
-"inference_data_path": "/project/ranganathanr/niksapraljak/BioM3_PDZ/data/test_prompts_PDZ_swissprot_pfam_dataset.csv",
-"inference_output_path": "/project/ranganathanr/niksapraljak/BioM3_PDZ/outputs/Stage1_test_prompts_PDZ.pt"
+"head_lr": 0.0005
 }
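This change sets `model_checkpoint_path` and `output_dict_path` to the string `"None"`, matching the existing `tb_logger_*` and `version_name` entries. How BioM3 itself reads that placeholder is not shown in this commit. Purely as an assumption, a loader that maps the literal string `"None"` to Python's `None` could look like the sketch below; `load_config` is a hypothetical name, not a function from the repo.

```python
import json


def load_config(text):
    """Parse config JSON and map the placeholder string "None" to Python None.

    Hypothetical helper: it assumes "None" marks an unset value, which this
    commit does not confirm.
    """
    raw = json.loads(text)
    # Only top-level values are checked; nested structures are left as-is.
    return {key: (None if value == "None" else value) for key, value in raw.items()}
```

With a loader like this, downstream code can test `cfg["model_checkpoint_path"] is None` rather than comparing against a string.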
weights/README.md
CHANGED
@@ -2,6 +2,7 @@
 
 This folder contains the pre-trained weights for the **BioM3** project models. The weights are stored as `.bin` files for different components of the BioM3 pipeline:
 
+0. **LLMs**: Pre-trained weights for compiling PenCL model.
 1. **PenCL**: Pre-trained weights for the PenCL model (Stage 1).
 2. **Facilitator**: Pre-trained weights for the Facilitator model (Stage 2).
 3. **ProteoScribe**: Pre-trained weights for the ProteoScribe model (Stage 3).
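With the `LLMs` folder added, a Stage 1 run now depends on files in four weight folders. A quick pre-flight check could list what is still missing. The sketch below is a minimal illustration: `EXPECTED` and `missing_weights` are hypothetical names, the file names are the benchmarking configuration listed in the README change, and placing each file directly inside its folder is an assumption.

```python
from pathlib import Path

# Benchmarking configuration named in the README; the flat layout
# (each file directly inside its folder) is an assumption.
EXPECTED = {
    "LLMs": [
        "esm2_t33_650M_UR50D.pt",
        "esm2_t33_650M_UR50D-contact-regression.pt",
        "BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
    ],
    "PenCL": ["BioM3_PenCL_epoch20.bin"],
    "Facilitator": ["BioM3_Facilitator_epoch20.bin"],
    "ProteoScribe": ["BioM3_ProteoScribe_epoch20.bin"],
}


def missing_weights(weights_dir):
    """Return 'folder/name' entries from EXPECTED that do not exist on disk."""
    root = Path(weights_dir)
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).exists()
    ]
```

Calling `missing_weights("/path/BioM3/weights")` returns an empty list once every expected file or directory is present.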