---
library_name: transformers
tags:
  - bulk RNA-seq
  - biology
  - transcriptomics
---

# BulkRNABert

BulkRNABert is a transformer-based, encoder-only language model pre-trained on bulk RNA-seq data through self-supervised masked language modeling, following BERT's approach. It can be further fine-tuned for cancer type classification and survival time prediction on the TCGA dataset.

**Developed by:** InstaDeep

## Model Sources

- **Paper:** [BulkRNABert: Cancer prognosis from bulk RNA-seq based language models](https://proceedings.mlr.press/v259/gelard25a.html)

## How to use

Until its next release, the `transformers` library must be installed from source with the command below in order to use the model. PyTorch is also required.

```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install torch
```

A small snippet of code is provided below to run inference with the model using random input.

```python
import torch
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained(
    "InstaDeepAI/BulkRNABert",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "InstaDeepAI/BulkRNABert",
    trust_remote_code=True,
)

# Random binned gene expressions of shape (batch_size, n_genes).
n_genes = config.n_genes
dummy_gene_expressions = torch.randint(0, config.n_expressions_bins, (1, n_genes))
output = model(dummy_gene_expressions)
```

A more complete example is provided in the repository.
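As the random input above suggests, the model consumes gene expressions that have been discretized into `n_expressions_bins` integer bins rather than raw continuous values. The exact binning scheme (transform and bin edges) is defined by the preprocessing code in the repository; the sketch below only illustrates the general idea with a hypothetical uniform binning of log-transformed counts:

```python
import torch


def bin_expressions(expressions: torch.Tensor, n_bins: int, max_value: float) -> torch.Tensor:
    """Discretize continuous expression values into integer bins in [0, n_bins - 1].

    Illustrative uniform binning only; BulkRNABert's actual preprocessing
    is defined in the repository and may differ.
    """
    # Interior bin edges over [0, max_value]; values above max_value land in the last bin.
    edges = torch.linspace(0.0, max_value, n_bins + 1)[1:-1]
    return torch.bucketize(expressions, edges)


# Example: bin log1p-transformed counts into 64 bins.
counts = torch.tensor([0.0, 3.0, 150.0, 9000.0])
tokens = bin_expressions(torch.log1p(counts), n_bins=64, max_value=12.0)
```

The resulting integer tensor has the same shape as the input and can serve as the kind of token sequence the snippet above feeds to the model.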

## Citing our work

```bibtex
@InProceedings{pmlr-v259-gelard25a,
  title     = {BulkRNABert: Cancer prognosis from bulk RNA-seq based language models},
  author    = {G{\'{e}}lard, Maxence and Richard, Guillaume and Pierrot, Thomas and Courn{\`{e}}de, Paul-Henry},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages     = {384--400},
  year      = {2025},
  editor    = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume    = {259},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Dec},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v259/gelard25a.html},
}
```