---
library_name: transformers
tags:
- bulk RNA-seq
- biology
- transcriptomics
---
|

# BulkRNABert

BulkRNABert is a transformer-based, encoder-only language model pre-trained on bulk RNA-seq data through self-supervised masked language modeling, following BERT. It can be fine-tuned for cancer type classification and survival time prediction on the TCGA dataset.
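As a rough illustration of the pre-training objective (a sketch, not the repository's actual implementation): masked language modeling hides a fraction of the binned expression tokens and trains the model to recover them. The 15% ratio follows BERT's convention, and all names below are hypothetical:

```python
import random

def mask_tokens(token_ids, mask_id, mask_ratio=0.15, seed=0):
    """BERT-style masking: hide a random subset of tokens so the model
    can be trained to reconstruct the originals at the masked positions."""
    rng = random.Random(seed)
    masked = list(token_ids)
    labels = [-100] * len(token_ids)  # -100: position ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_ratio:
            labels[i] = tok       # target: the original expression bin
            masked[i] = mask_id   # input: a special [MASK] token id
    return masked, labels

# Toy profile: one discretized expression level (bin id) per gene.
gene_bins = [3, 7, 1, 0, 5, 2, 6, 4]
masked, labels = mask_tokens(gene_bins, mask_id=8, seed=1)
```

The loss is computed only at the masked positions, which is what the `-100` sentinel encodes in the usual PyTorch cross-entropy convention.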
|

**Developed by:** [InstaDeep](https://huggingface.co/InstaDeepAI)

### Model Sources
|
- **Repository:** [instadeepai/multiomics-open-research](https://github.com/instadeepai/multiomics-open-research)
- **Paper:** [BulkRNABert: Cancer prognosis from bulk RNA-seq based language models](https://proceedings.mlr.press/v259/gelard25a.html)
|

### How to use

Until its next release, the `transformers` library must be installed from source with the command below in order to use the model. PyTorch is also required.
|

```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install torch
```
|

The snippet below runs inference with the model on random input.
|

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "InstaDeepAI/BulkRNABert",
    trust_remote_code=True,
)

# Inputs are one token id per gene, each id being a discretized expression
# level in [0, n_expressions_bins).
n_genes = model.config.n_genes
dummy_gene_expressions = torch.randint(0, model.config.n_expressions_bins, (1, n_genes))
torch_output = model(dummy_gene_expressions)
```
|

A more complete example is provided in the repository.
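The `dummy_gene_expressions` above stand in for real profiles: in practice, each gene's continuous expression value is discretized into one of `n_expressions_bins` token ids before being passed to the model. A minimal sketch of such binning follows; the equal-width scheme, bin count, and value range are illustrative assumptions, not the model's exact preprocessing:

```python
def to_bins(values, n_bins, max_value):
    """Map continuous expression values to integer token ids in [0, n_bins).
    Equal-width bins over [0, max_value]; larger values clip to the top bin."""
    width = max_value / n_bins
    return [min(int(v / width), n_bins - 1) for v in values]

# Hypothetical expression values for five genes, discretized into 64 bins.
expressions = [0.0, 12.5, 3.2, 99.9, 250.0]
tokens = to_bins(expressions, n_bins=64, max_value=100.0)
```

The resulting token ids can then be batched into a tensor of shape `(batch, n_genes)`, matching the input expected in the snippet above.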
|

### Citing our work
|

```bibtex
@InProceedings{pmlr-v259-gelard25a,
  title     = {BulkRNABert: Cancer prognosis from bulk RNA-seq based language models},
  author    = {G{\'{e}}lard, Maxence and Richard, Guillaume and Pierrot, Thomas and Courn{\`{e}}de, Paul-Henry},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages     = {384--400},
  year      = {2025},
  editor    = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume    = {259},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Dec},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v259/gelard25a.html},
}
```
|