---
library_name: transformers
tags:
- bulk RNA-seq
- biology
- transcriptomics
---
# BulkRNABert
BulkRNABert is a transformer-based, encoder-only language model pre-trained on bulk RNA-seq data using self-supervision via masked language modeling, following BERT’s method. It can be further fine-tuned for cancer type classification and survival time prediction on the TCGA dataset.
**Developed by:** [InstaDeep](https://huggingface.co/InstaDeepAI)
### Model Sources
- [**Repository**](https://github.com/instadeepai/multiomics-open-research)
- **Paper:** [BulkRNABert: Cancer prognosis from bulk RNA-seq based language models](https://proceedings.mlr.press/v259/gelard25a.html)
### How to use
Until the next release of the `transformers` library, it must be installed from source with the command below to use the model.
PyTorch must also be installed.
```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install torch
```
A small snippet of code is provided below to run inference with the model using random input.
```python
import torch
from transformers import AutoModel

# trust_remote_code=True loads the custom model code hosted in the model repository.
model = AutoModel.from_pretrained(
    "InstaDeepAI/BulkRNABert",
    trust_remote_code=True,
)

# Random token ids standing in for binned gene expression values.
n_genes = model.config.n_genes
dummy_gene_expressions = torch.randint(0, model.config.n_expressions_bins, (1, n_genes))

torch_output = model(dummy_gene_expressions)
```
A more complete example is provided in the repository.
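The random input above stands in for gene expression values that have already been converted to integer bin ids in `[0, n_expressions_bins)`. This README does not describe the binning used during pre-training, so the following is only a minimal sketch of one plausible scheme: a `log10(1 + x)` transform followed by uniform bins. The helper name `bin_expressions`, the `max_log_value` parameter, and the sample values are all hypothetical; refer to the repository for the actual preprocessing.

```python
import math


def bin_expressions(values, n_bins, max_log_value):
    """Map non-negative expression values to integer bin ids in [0, n_bins).

    Assumption: log10(1 + x) transform and uniform bins on the log scale.
    This is an illustrative scheme, not necessarily the one used by the model.
    """
    bin_width = max_log_value / n_bins
    ids = []
    for v in values:
        log_v = math.log10(1.0 + v)  # compress the dynamic range
        idx = int(log_v / bin_width)  # uniform bins on the log scale
        ids.append(min(max(idx, 0), n_bins - 1))  # clamp to valid bin ids
    return ids


# Hypothetical expression values and bin settings, for illustration only.
token_ids = bin_expressions([0.0, 9.0, 99.0, 1e6], n_bins=64, max_log_value=5.0)
print(token_ids)
```

With real data, `n_bins` would be taken from `model.config.n_expressions_bins`, the list would hold one value per gene (`model.config.n_genes` in total), and `torch.tensor([token_ids])` would give a tensor with the same shape as the dummy input.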
### Citing our work
```
@InProceedings{pmlr-v259-gelard25a,
  title = {BulkRNABert: Cancer prognosis from bulk RNA-seq based language models},
  author = {G{\'{e}}lard, Maxence and Richard, Guillaume and Pierrot, Thomas and Courn{\`{e}}de, Paul-Henry},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages = {384--400},
  year = {2025},
  editor = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume = {259},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Dec},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v259/gelard25a.html},
}
```