Niksa Praljak committed on
Commit
72a56de
·
1 Parent(s): 1f2e18d

instructions for installing LLMs pretrained weights

Files changed (3)
  1. README.md +14 -1
  2. stage1_config.json +3 -5
  3. weights/README.md +1 -0
README.md CHANGED
@@ -48,11 +48,13 @@ cd /path/BioM3/weights
48
  ```
49
 
50
  Note: choose the desired BioM3 configuration/checkpoint, then install weights for each folder:
 
51
  - `/path/BioM3/weights/PenCL`
52
  - `/path/BioM3/weights/Facilitator`
53
  - `/path/BioM3/weights/ProteoScribe`
54
 
55
  Each folder contains a `README.md` detailing the different model weight configurations. For benchmarking, the optimal configuration is:
 
56
  - `BioM3_PenCL_epoch20.bin`
57
  - `BioM3_Facilitator_epoch20.bin`
58
  - `BioM3_ProteoScribe_epoch20.bin`
@@ -68,10 +70,21 @@ This stage demonstrates how to perform inference using the **BioM3 PenCL model**
68
 
69
  Before running the model, ensure you have:
70
  - Configuration file: `stage1_config.json`
71
- - Pre-trained weights: `BioM3_PenCL_epoch20.bin`
72
 
73
  ### Running the Model
74
 
75
  1. Change directory to BioM3 repo:
76
  ```bash
77
  cd /path/BioM3 # /path/ is the location of the cloned BioM3 repo
 
48
  ```
49
 
50
  Note: choose the desired BioM3 configuration/checkpoint, then install weights for each folder:
51
+ - `/path/BioM3/weights/LLMs` # install ESM2 and PubMedBERT pretrained weights for compiling PenCL
52
  - `/path/BioM3/weights/PenCL`
53
  - `/path/BioM3/weights/Facilitator`
54
  - `/path/BioM3/weights/ProteoScribe`
55
 
56
  Each folder contains a `README.md` detailing the different model weight configurations. For benchmarking, the optimal configuration is:
57
+ - `esm2_t33_650M_UR50D.pt`, `esm2_t33_650M_UR50D-contact-regression.pt`, and `BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext`
58
  - `BioM3_PenCL_epoch20.bin`
59
  - `BioM3_Facilitator_epoch20.bin`
60
  - `BioM3_ProteoScribe_epoch20.bin`
 
70
 
71
  Before running the model, ensure you have:
72
  - Configuration file: `stage1_config.json`
73
+ - Pre-trained weights: `BioM3_PenCL_epoch20.bin`, `esm2_t33_650M_UR50D.pt`, `esm2_t33_650M_UR50D-contact-regression.pt`, and `BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext`.
74
 
75
  ### Running the Model
76
 
77
+ 0. Edit the JSON configuration for Stage 1:
78
+
79
+ ```bash
80
+ vim stage1_config.json
81
+
82
+ # replace <working_directory> with your path
83
+ "seq_model_path": "<working_directory>/BioM3/weights/LLMs/esm2_t33_650M_UR50D.pt",
84
+ "text_model_path": "<working_directory>/BioM3/weights/LLMs/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
85
+
86
+ ```
87
+
88
  1. Change directory to BioM3 repo:
89
  ```bash
90
  cd /path/BioM3 # /path/ is the location of the cloned BioM3 repo
stage1_config.json CHANGED
@@ -4,8 +4,8 @@
4
  "tb_logger_path": "None",
5
  "tb_logger_folder": "None",
6
  "version_name": "None",
7
- "model_checkpoint_path": "/project/andrewferguson/niksapraljak/Project_ProtARDM/logs/Stage1_final_models/checkpoints/Pretraining_PENCiL_45M/epoch=19-step=116600.ckpt",
8
- "output_dict_path": "/project/ranganathanr/niksapraljak/BioM3_PDZ/outputs/output_dict.pt",
9
  "valid_size": 0.2,
10
  "epochs": 10,
11
  "acc_grad_batches": 1,
@@ -44,7 +44,5 @@
44
  "bLM_n_layers_to_finetune": 1,
45
  "proj_embedding_dim": 512,
46
  "dropout": 0.1,
47
- "head_lr": 0.0005,
48
- "inference_data_path": "/project/ranganathanr/niksapraljak/BioM3_PDZ/data/test_prompts_PDZ_swissprot_pfam_dataset.csv",
49
- "inference_output_path": "/project/ranganathanr/niksapraljak/BioM3_PDZ/outputs/Stage1_test_prompts_PDZ.pt"
50
  }
 
4
  "tb_logger_path": "None",
5
  "tb_logger_folder": "None",
6
  "version_name": "None",
7
+ "model_checkpoint_path": "None",
8
+ "output_dict_path": "None",
9
  "valid_size": 0.2,
10
  "epochs": 10,
11
  "acc_grad_batches": 1,
 
44
  "bLM_n_layers_to_finetune": 1,
45
  "proj_embedding_dim": 512,
46
  "dropout": 0.1,
47
+ "head_lr": 0.0005
 
 
48
  }
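The README change in this commit asks the user to edit `seq_model_path` and `text_model_path` in `stage1_config.json` by hand. As an alternative, the same edit could be scripted. The following is a minimal sketch, not part of the BioM3 repository: the helper name `set_llm_paths` is hypothetical, and it assumes the LLM weights sit directly under `<repo_root>/weights/LLMs`, where `repo_root` is the cloned BioM3 directory.

```python
import json
from pathlib import Path


def set_llm_paths(config_path, repo_root):
    """Point the Stage 1 config at the LLM weights under <repo_root>/weights/LLMs.

    Hypothetical helper: the folder layout is an assumption based on the README.
    """
    llm_dir = Path(repo_root) / "weights" / "LLMs"
    config_path = Path(config_path)

    # Load the existing config so that all other keys are preserved.
    config = json.loads(config_path.read_text())

    # File and folder names are the ones listed in the README.
    config["seq_model_path"] = str(llm_dir / "esm2_t33_650M_UR50D.pt")
    config["text_model_path"] = str(
        llm_dir / "BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"
    )

    # Write the updated config back in place.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `set_llm_paths("stage1_config.json", "/path/BioM3")` from the repo root would rewrite the two keys and leave the rest of the file unchanged, apart from indentation.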
weights/README.md CHANGED
@@ -2,6 +2,7 @@
2
 
3
  This folder contains the pre-trained weights for the **BioM3** project models. The weights are stored as `.bin` files for different components of the BioM3 pipeline:
4
 
 
5
  1. **PenCL**: Pre-trained weights for the PenCL model (Stage 1).
6
  2. **Facilitator**: Pre-trained weights for the Facilitator model (Stage 2).
7
  3. **ProteoScribe**: Pre-trained weights for the ProteoScribe model (Stage 3).
 
2
 
3
  This folder contains the pre-trained weights for the **BioM3** project models. The weights are stored as `.bin` files for different components of the BioM3 pipeline:
4
 
5
+ 0. **LLMs**: Pre-trained LLM weights (ESM2 and PubMedBERT) needed to compile the PenCL model.
6
  1. **PenCL**: Pre-trained weights for the PenCL model (Stage 1).
7
  2. **Facilitator**: Pre-trained weights for the Facilitator model (Stage 2).
8
  3. **ProteoScribe**: Pre-trained weights for the ProteoScribe model (Stage 3).
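Since Stage 1 now depends on files in four folders, a quick check before running inference may save a failed start. The sketch below is illustrative only: `missing_weights` and `EXPECTED_WEIGHTS` are hypothetical names, the file names are the benchmarking configuration listed in the README, and it assumes each file sits directly inside its folder under `weights/`.

```python
from pathlib import Path

# Relative paths built from the names in the README. Assumption: each file
# (or model folder, for BiomedBERT) sits directly inside its weights subfolder.
EXPECTED_WEIGHTS = [
    "LLMs/esm2_t33_650M_UR50D.pt",
    "LLMs/esm2_t33_650M_UR50D-contact-regression.pt",
    "LLMs/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
    "PenCL/BioM3_PenCL_epoch20.bin",
    "Facilitator/BioM3_Facilitator_epoch20.bin",
    "ProteoScribe/BioM3_ProteoScribe_epoch20.bin",
]


def missing_weights(weights_dir):
    """Return the expected weight paths that do not exist under weights_dir."""
    root = Path(weights_dir)
    return [rel for rel in EXPECTED_WEIGHTS if not (root / rel).exists()]
```

For example, `missing_weights("/path/BioM3/weights")` returns an empty list once every file from the benchmarking configuration is in place, and otherwise lists what still needs to be installed.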