VHHBERT

VHHBERT is a RoBERTa-based model pre-trained on two million VHH sequences from VHHCorpus-2M. VHHBERT has the same model configuration as RoBERTa-base, except that it uses positional embeddings of length 185 to cover the maximum sequence length of 179 in VHHCorpus-2M. Further details on VHHBERT are described in our paper "A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models."
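The following is a rough arithmetic check on why 185 positions are enough for 179-residue sequences. It rests on three assumptions: each residue maps to one token, the tokenizer adds one start and one end special token, and position ids follow the Hugging Face RoBERTa convention of starting at `padding_idx + 1`. The function names and the `padding_idx=1` default (RoBERTa's usual pad token id) are illustrative assumptions. They are not values read from the VHHBERT configuration.

```python
def max_tokens_supported(max_position_embeddings: int, padding_idx: int = 1) -> int:
    """Longest token sequence whose position ids stay inside the embedding table.

    Assumes RoBERTa-style position ids padding_idx + 1, ..., padding_idx + n,
    so the largest id, padding_idx + n, must be at most max_position_embeddings - 1.
    """
    return max_position_embeddings - padding_idx - 1


def fits(residues: int, max_position_embeddings: int,
         special_tokens: int = 2, padding_idx: int = 1) -> bool:
    """Check whether a sequence of `residues` amino acids fits, assuming one
    token per residue plus `special_tokens` added by the tokenizer (hypothetical)."""
    total_tokens = residues + special_tokens
    return total_tokens <= max_tokens_supported(max_position_embeddings, padding_idx)


print(max_tokens_supported(185))  # 183
print(fits(179, 185))             # True: 179 + 2 = 181 <= 183
print(fits(182, 185))             # False: 182 + 2 = 184 > 183
```

As a sanity check, the same formula applied to RoBERTa-base's 514 positions gives the familiar 512-token limit. If VHHBERT's pad token id differs from 1, the limit shifts by the difference.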

Usage

The model and tokenizer can be loaded using the transformers library.

from transformers import BertTokenizer, RobertaModel

# The VHHBERT tokenizer is loaded with BertTokenizer; the encoder itself is a RobertaModel
tokenizer = BertTokenizer.from_pretrained("COGNANO/VHHBERT")
model = RobertaModel.from_pretrained("COGNANO/VHHBERT")

Citation

If you use VHHBERT in your research, please cite the following paper.

@inproceedings{tsuruta2024sars,
  title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models},
  author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura},
  booktitle={Advances in Neural Information Processing Systems 37},
  year={2024}
}
Model size: 85.8M parameters (F32 weights in Safetensors format)