---
library_name: transformers
tags:
- bulk RNA-seq
- biology
- transcriptomics
---
|

# BulkRNABert

BulkRNABert is a transformer-based, encoder-only language model pre-trained on bulk RNA-seq data through self-supervised masked language modeling, following BERT. It can be fine-tuned for cancer type classification and survival time prediction on the TCGA dataset.
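As a rough illustration of the pre-training objective (a sketch, not the repository's actual implementation): masked language modeling hides a fraction of the binned expression tokens and trains the model to recover them. The 15% ratio follows BERT's convention, and all names below are hypothetical:

```python
import random

def mask_tokens(token_ids, mask_id, mask_ratio=0.15, seed=0):
    """BERT-style masking: hide a random subset of tokens so the model
    can be trained to reconstruct the originals at the masked positions."""
    rng = random.Random(seed)
    masked = list(token_ids)
    labels = [-100] * len(token_ids)  # -100: position ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_ratio:
            labels[i] = tok       # target: the original expression bin
            masked[i] = mask_id   # input: a special [MASK] token id
    return masked, labels

# Toy profile: one discretized expression level (bin id) per gene.
gene_bins = [3, 7, 1, 0, 5, 2, 6, 4]
masked, labels = mask_tokens(gene_bins, mask_id=8, seed=1)
```

The loss is computed only at the masked positions, which is what the `-100` sentinel encodes in the usual PyTorch cross-entropy convention.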
|

**Developed by:** [InstaDeep](https://huggingface.co/InstaDeepAI)

### Model Sources
|
- **Repository:** [instadeepai/multiomics-open-research](https://github.com/instadeepai/multiomics-open-research)
- **Paper:** [BulkRNABert: Cancer prognosis from bulk RNA-seq based language models](https://proceedings.mlr.press/v259/gelard25a.html)
|

### How to use

Until its next release, the `transformers` library must be installed from source with the command below in order to use the model. PyTorch is also required.
|

```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install torch
```
|

The snippet below runs inference with the model on random input.
|

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "InstaDeepAI/BulkRNABert",
    trust_remote_code=True,
)

# Inputs are one token id per gene, each id being a discretized expression
# level in [0, n_expressions_bins).
n_genes = model.config.n_genes
dummy_gene_expressions = torch.randint(0, model.config.n_expressions_bins, (1, n_genes))
torch_output = model(dummy_gene_expressions)
```
|

A more complete example is provided in the repository.
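The `dummy_gene_expressions` above stand in for real profiles: in practice, each gene's continuous expression value is discretized into one of `n_expressions_bins` token ids before being passed to the model. A minimal sketch of such binning follows; the equal-width scheme, bin count, and value range are illustrative assumptions, not the model's exact preprocessing:

```python
def to_bins(values, n_bins, max_value):
    """Map continuous expression values to integer token ids in [0, n_bins).
    Equal-width bins over [0, max_value]; larger values clip to the top bin."""
    width = max_value / n_bins
    return [min(int(v / width), n_bins - 1) for v in values]

# Hypothetical expression values for five genes, discretized into 64 bins.
expressions = [0.0, 12.5, 3.2, 99.9, 250.0]
tokens = to_bins(expressions, n_bins=64, max_value=100.0)
```

The resulting token ids can then be batched into a tensor of shape `(batch, n_genes)`, matching the input expected in the snippet above.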
|

### Citing our work
|

```bibtex
@InProceedings{pmlr-v259-gelard25a,
  title     = {BulkRNABert: Cancer prognosis from bulk RNA-seq based language models},
  author    = {G{\'{e}}lard, Maxence and Richard, Guillaume and Pierrot, Thomas and Courn{\`{e}}de, Paul-Henry},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages     = {384--400},
  year      = {2025},
  editor    = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume    = {259},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Dec},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v259/gelard25a.html},
}
```
|