Neural Bioinformatics Research Group
Neural Bioinformatics Research Group - ProkBERT Models
Welcome to the official Hugging Face organization for the Neural Bioinformatics Research Group. Our main goal is to provide genomic language models for microbiome applications.
Models
We provide access to a collection of pretrained and fine-tuned models from the ProkBERT family. These models are built on Local Context Aware (LCA) tokenization, which is tailored to DNA sequences and balances context size against performance.
ProkBERT models are designed for microbiome-related tasks such as prokaryotic promoter identification and phage detection. Despite their compact size (20.6M to 26.6M parameters), they are powerful and efficient.
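To make the tokenization idea concrete, here is a minimal sketch of overlapping k-mer tokenization with a configurable shift. It assumes that LCA tokenization amounts to sliding a k-mer window along the sequence and advancing it by `shift` nucleotides per token. The function name `kmer_tokenize` is hypothetical and is not the ProkBERT API; the actual tokenizer may treat special tokens, sequence offsets, and ambiguous bases differently.

```python
def kmer_tokenize(sequence: str, k: int = 6, shift: int = 1) -> list[str]:
    """Split a DNA sequence into k-mers, moving the window `shift` nt per token.

    Illustrative only: a hypothetical helper, not the ProkBERT tokenizer.
    """
    if k < 1 or shift < 1:
        raise ValueError("k and shift must be positive integers")
    sequence = sequence.upper()
    # The last window starts at len(sequence) - k; trailing bases that do not
    # fill a complete k-mer are dropped in this sketch.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, shift)]


dna = "ATGCATGCAT"
print(kmer_tokenize(dna, k=6, shift=1))  # 5 overlapping 6-mers
print(kmer_tokenize(dna, k=6, shift=2))  # 3 6-mers, each overlapping by 4 nt
print(kmer_tokenize(dna, k=1, shift=1))  # one token per nucleotide
```

A larger shift produces fewer tokens for the same sequence, so more nucleotides fit within a fixed number of tokens, at the cost of coarser positional resolution.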
Model Overview
Model | Parameters | Tokenizer | Layers | Attention Heads | Max. Context Size | Training Data |
---|---|---|---|---|---|---|
mini | 20.6M | 6-mer, shift=1 | 6 | 6 | 1027 nt | 206.65 billion |
mini-c | 24.9M | 1-mer | 6 | 6 | 1022 nt | 206.65 billion |
mini-long | 26.6M | 6-mer, shift=2 | 6 | 6 | 4096 nt | 206.65 billion |
Overview of the parameters and configurations of the ProkBERT models.
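The maximum context sizes in the table can be related to token counts with a short calculation. The sketch below assumes each token is a k-mer window advanced by `shift` nucleotides, so n tokens cover (n - 1) * shift + k nucleotides. The token counts it derives exclude any special tokens and are an inference from the table, not figures stated by the authors; the function names are hypothetical.

```python
def covered_nucleotides(num_tokens: int, k: int, shift: int) -> int:
    """Nucleotides spanned by `num_tokens` k-mer windows advanced by `shift`."""
    if num_tokens <= 0:
        return 0
    return (num_tokens - 1) * shift + k


def tokens_needed(num_nucleotides: int, k: int, shift: int) -> int:
    """Number of complete k-mer windows that fit in `num_nucleotides`."""
    if num_nucleotides < k:
        return 0
    return (num_nucleotides - k) // shift + 1


# (k, shift, max context in nt) taken from the table above.
configs = {
    "mini": (6, 1, 1027),
    "mini-c": (1, 1, 1022),
    "mini-long": (6, 2, 4096),
}

for name, (k, shift, max_nt) in configs.items():
    n = tokens_needed(max_nt, k, shift)
    # Round-trip check: the derived token count covers exactly max_nt.
    assert covered_nucleotides(n, k, shift) == max_nt
    print(f"{name}: {max_nt} nt -> {n} tokens")
```

Under these assumptions, mini and mini-c both correspond to 1022 sequence tokens, while mini-long corresponds to 2046 tokens covering roughly four times as many nucleotides.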
Resources
For more information or questions, please visit our GitHub repository or contact us at email.