---
### **`weights/LLMs/README.md`**

# LLMs Pre-trained Weights for Compiling PenCL

This folder contains the pre-trained weights for the **ESM2** and **PubMedBERT** models, which are used to compile the **PenCL** model (Stage 1 of BioM3). PenCL aligns protein sequences with text descriptions to compute joint latent embeddings.

## Downloading Pre-trained Weights

### ESM2 Model
To download the ESM2 (650M-parameter) model weights and the matching contact-regression weights, run the following from inside `weights/LLMs/`:
```bash
wget https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt
wget https://dl.fbaipublicfiles.com/fair-esm/regression/esm2_t33_650M_UR50D-contact-regression.pt
```

### PubMedBERT Model

Make sure Git Large File Storage (Git LFS) is installed in your environment before cloning the Hugging Face model repository.
```bash
git lfs install
```

To download the PubMedBERT model weights:
```bash
git clone https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
```
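
If Git LFS was not set up before cloning, large files in the cloned directory can be left as small text pointer stubs instead of the real weights. The following is a minimal sketch, not part of BioM3, that flags such stubs. The function names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers, and the check assumes the files of interest sit at the top level of the cloned directory.

```bash
# Hypothetical helpers (not part of BioM3) for spotting Git LFS pointer stubs.

# Succeeds if the file starts with the Git LFS pointer header.
is_lfs_pointer() {
    # The pointer header is 42 characters long; strip NUL bytes so that
    # binary weight files do not upset the command substitution.
    lfs_header=$(head -c 42 "$1" 2>/dev/null | tr -d '\000')
    case "$lfs_header" in
        "version https://git-lfs.github.com/spec/v1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints every top-level file in directory $1 that is still a pointer stub.
# Succeeds (exit status 0) if at least one stub was found.
find_lfs_pointers() {
    lfs_status=1
    for lfs_file in "$1"/*; do
        [ -f "$lfs_file" ] || continue
        if is_lfs_pointer "$lfs_file"; then
            echo "LFS pointer (not downloaded): $lfs_file"
            lfs_status=0
        fi
    done
    return "$lfs_status"
}

# Example call, run from the directory that contains the clone:
#   find_lfs_pointers BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
```

If any pointers are reported, running `git lfs pull` inside the cloned directory should fetch the real files.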

## Usage
Once available, the pre-trained weights can be loaded as follows:

[Your usage instructions here]

## File Structure
After downloading, your weights directory should contain:
```
weights/
└── LLMs/
    ├── esm2_t33_650M_UR50D.pt
    ├── esm2_t33_650M_UR50D-contact-regression.pt
    └── BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext/
```

Note: The PubMedBERT download creates a directory containing the model weights and configuration files, while each ESM2 download is a single `.pt` file.
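
The expected layout can be checked with a short script. This is a minimal sketch under stated assumptions: `check_weights_layout` is a hypothetical helper name, the default path `weights/LLMs` assumes the script is run from the repository root, and the file names are taken from the download commands in this README.

```bash
# Hypothetical helper (not part of BioM3): verify the downloaded weights are in place.
# Usage: check_weights_layout [path-to-LLMs-directory]
check_weights_layout() {
    layout_root="${1:-weights/LLMs}"
    layout_missing=0

    # ESM2 weights and the contact-regression weights fetched with wget.
    for layout_file in esm2_t33_650M_UR50D.pt \
                       esm2_t33_650M_UR50D-contact-regression.pt; do
        if [ ! -f "$layout_root/$layout_file" ]; then
            echo "missing file: $layout_root/$layout_file"
            layout_missing=1
        fi
    done

    # PubMedBERT repository cloned with git.
    layout_dir="$layout_root/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"
    if [ ! -d "$layout_dir" ]; then
        echo "missing directory: $layout_dir"
        layout_missing=1
    fi

    # Exit status 0 means everything was found.
    return "$layout_missing"
}

# Example call from the repository root:
#   check_weights_layout weights/LLMs && echo "all weights present"
```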