---
### **`weights/LLMs/README.md`**
# LLMs Pre-trained Weights for Compiling PenCL
This folder contains the pre-trained weights for the **ESM2** and **PubMedBERT** models needed to compile the **PenCL** model (Stage 1 of BioM3). PenCL aligns protein sequences and text descriptions to compute joint latent embeddings.
## Downloading Pre-trained Weights
### ESM2 Model
From the `weights/LLMs/` directory, download the ESM2 (650M-parameter) model weights and the matching contact-regression weights:
```bash
wget https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt
wget https://dl.fbaipublicfiles.com/fair-esm/regression/esm2_t33_650M_UR50D-contact-regression.pt
```
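The two `wget` commands save the files into whatever directory you run them from. As a minimal sketch, assuming bash, `wget`, and that you work from the repository root, the hypothetical `fetch_esm2` function below downloads both ESM2 files into a target directory (default `weights/LLMs`) and skips any file that is already present and non-empty:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BioM3): download both ESM2 files
# into a target directory, defaulting to weights/LLMs.
fetch_esm2() {
  local dest="${1:-weights/LLMs}"
  local base="https://dl.fbaipublicfiles.com/fair-esm"
  local path name
  mkdir -p "$dest" || return 1
  for path in \
    "models/esm2_t33_650M_UR50D.pt" \
    "regression/esm2_t33_650M_UR50D-contact-regression.pt"; do
    name="${path##*/}"
    # Skip files that already exist and are non-empty.
    if [ -s "$dest/$name" ]; then
      echo "already present: $dest/$name"
      continue
    fi
    # -P sets the directory wget saves the file into.
    wget -P "$dest" "$base/$path" || return 1
  done
}
```

Run it as `fetch_esm2 weights/LLMs`. Because the skip check only tests for a non-empty file, an interrupted download leaves a partial file that should be deleted before rerunning.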
### PubMedBERT Model
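Make sure Git Large File Storage (Git LFS) is installed and initialized in your environment before cloning the Hugging Face model repository; otherwise the clone contains small pointer files instead of the actual weights: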
```bash
git lfs install
```
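Then, from the `weights/LLMs/` directory, clone the PubMedBERT model repository (published on Hugging Face under its newer name, BiomedBERT):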
```bash
git clone https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
```
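If Git LFS was not active during the clone, the large weight files are replaced by small text pointer files whose first line is `version https://git-lfs.github.com/spec/v1`. A minimal sketch, assuming bash (it uses process substitution and `read -d ''`), of a hypothetical `find_lfs_pointers` function that lists any such stubs in the cloned directory:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BioM3): report files in a cloned
# repository that are still Git LFS pointer stubs rather than real files.
# Returns 1 if any stub is found, 0 otherwise.
find_lfs_pointers() {
  local repo_dir="$1"
  local f first_line found=0
  # Pointer files are tiny, so only files under ~1 KiB are inspected;
  # the .git directory is pruned from the search.
  while IFS= read -r -d '' f; do
    first_line=""
    IFS= read -r first_line < "$f" || true
    if [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]; then
      echo "LFS pointer (weights not downloaded): $f"
      found=1
    fi
  done < <(find "$repo_dir" -name .git -prune -o -type f -size -2k -print0)
  return "$found"
}
```

Run it as `find_lfs_pointers BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext`. If it reports stubs, running `git lfs pull` inside that directory fetches the actual files.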
## Usage
Once available, the pre-trained weights can be loaded as follows:
[Your usage instructions here]
## File Structure
After downloading, your weights directory should contain:
```
weights/
└── LLMs/
    ├── esm2_t33_650M_UR50D.pt
    ├── esm2_t33_650M_UR50D-contact-regression.pt
    └── BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext/
```
Note: The PubMedBERT download creates a directory containing the model weights and configuration files, while each ESM2 download is a single `.pt` file.
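To confirm the layout before compiling PenCL, here is a minimal sketch of a hypothetical `verify_llm_weights` bash function. It checks for both ESM2 files fetched in the download step (the model and its contact-regression weights) and for the cloned PubMedBERT directory. It only tests that these paths exist; it does not validate file contents or checksums:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BioM3): check that the expected
# weight files and directory exist under a root (default weights/LLMs).
# Returns 1 if anything is missing, 0 otherwise.
verify_llm_weights() {
  local root="${1:-weights/LLMs}"
  local bert_dir="BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"
  local missing=0 f
  for f in \
    "esm2_t33_650M_UR50D.pt" \
    "esm2_t33_650M_UR50D-contact-regression.pt"; do
    # -s: the file exists and is non-empty.
    if [ ! -s "$root/$f" ]; then
      echo "missing or empty: $root/$f" >&2
      missing=1
    fi
  done
  if [ ! -d "$root/$bert_dir" ]; then
    echo "missing directory: $root/$bert_dir" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, from the repository root: `verify_llm_weights weights/LLMs && echo "weights look complete"`.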