weights/LLMs/README.md
LLMs Pre-trained Weights for Compiling PenCL
This folder contains the pre-trained weights for the ESM2 and PubMedBERT models to compile PenCL model (Stage 1 of BioM3). The PenCL model aligns protein sequences and text descriptions to compute joint latent embeddings.
Downloading Pre-trained Weights
ESM2 Model
To download the ESM2 (650M parameter) model weights:
wget https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt
wget https://dl.fbaipublicfiles.com/fair-esm/regression/esm2_t33_650M_UR50D-contact-regression.pt
PubMedBERT Model
Make sure large-file storage capabilities are installed in your environment before cloning HuggingFace model card.
git lfs install
To download the PubMedBERT model weights:
git clone https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
Usage
Once available, the pre-trained weights can be loaded as follows:
[Your usage instructions here]
File Structure
After downloading, your weights directory should contain:
weights/
βββ LLMs/
βββ esm2_t33_650M_UR50D.pt
βββ BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext/
Note: The PubMedBERT download will create a directory containing the model weights and configuration files, while ESM2 downloads as a single file.