BarcodeBERT for Taxonomic Classification
A pre-trained transformer model for inference on insect DNA barcoding data.
To use BarcodeBERT as a feature extractor:
from transformers import AutoTokenizer, AutoModel
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bioscan-ml/BarcodeBERT", trust_remote_code=True)
# Load the model
model = AutoModel.from_pretrained("bioscan-ml/BarcodeBERT", trust_remote_code=True)
# Sample sequence
dna_seq = 'ACGCGCTGACGCATCAGCATACGA'
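# Tokenize the sequence into a tensor of input IDs
input_seq = tokenizer(dna_seq, return_tensors="pt")["input_ids"]
# Pass through the model and keep the last layer's hidden states
output = model(input_seq)["hidden_states"][-1]
# Global average pooling over the token dimension gives one feature vector per sequence
features = output.mean(1)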
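When several sequences of different lengths are embedded in one batch, a plain `mean(1)` would also average over padding positions. The sketch below shows one common way around that, masked average pooling. It is not taken from the BarcodeBERT authors: the helper name `masked_mean_pool` is hypothetical, and the tensors are dummy stand-ins so the example runs without downloading the model.

```python
import torch


def masked_mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over real (non-padding) positions only.

    hidden_states:  (batch, seq_len, dim), e.g. the last layer's hidden states.
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    # Broadcast the mask over the embedding dimension: (batch, seq_len, 1)
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    # Sum only the embeddings of real tokens: (batch, dim)
    summed = (hidden_states * mask).sum(dim=1)
    # Number of real tokens per sequence; clamp to avoid division by zero
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# Dummy stand-ins for model outputs (hypothetical shapes, not real BarcodeBERT values)
hidden = torch.ones(2, 4, 3)
hidden[1, 2:] = 100.0  # pretend these are embeddings at padding positions
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])

features = masked_mean_pool(hidden, mask)
print(features.shape)
```

With the real model, the mask would presumably come from calling the tokenizer on a list of sequences with `padding=True` and reading its `attention_mask` entry. Whether the BarcodeBERT tokenizer and model, which are loaded via `trust_remote_code`, return and accept that mask in the standard way is an assumption worth checking before relying on this.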
Citation
If you find BarcodeBERT useful in your research, please consider citing:
@misc{arias2023barcodebert,
title={{BarcodeBERT}: Transformers for Biodiversity Analysis},
author={Pablo Millan Arias
and Niousha Sadjadi
and Monireh Safari
and ZeMing Gong
and Austin T. Wang
and Scott C. Lowe
and Joakim Bruslund Haurum
and Iuliia Zarubiieva
and Dirk Steinke
and Lila Kari
and Angel X. Chang
and Graham W. Taylor
},
year={2023},
eprint={2311.02401},
archivePrefix={arXiv},
primaryClass={cs.LG},
doi={10.48550/arxiv.2311.02401},
}