---
title: README
emoji: 📉
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---

# Neural Bioinformatics Research Group - ProkBERT Models

Welcome to the official Hugging Face organization of the Neural Bioinformatics Research Group. Our main goal is to provide genomic language models for microbiome applications.

## Models

We provide access to a collection of pretrained and fine-tuned models from the ProkBERT family. These models are built on Local Context Aware (LCA) tokenization, tailored specifically to DNA sequences to balance context size and performance. ProkBERT models are designed for microbiome-related tasks, such as prokaryotic promoter identification and phage detection, and they remain powerful and efficient despite their compact size.

## Model Overview

| Model       | Parameters | Tokenizer      | Layers | Attention Heads | Max. Context Size | Training Data (nucleotides) |
|-------------|------------|----------------|--------|-----------------|-------------------|-----------------------------|
| `mini`      | 20.6M      | 6-mer, shift=1 | 6      | 6               | 1027 nt           | 206.65 billion              |
| `mini-c`    | 24.9M      | 1-mer          | 6      | 6               | 1022 nt           | 206.65 billion              |
| `mini-long` | 26.6M      | 6-mer, shift=2 | 6      | 6               | 4096 nt           | 206.65 billion              |

_A comprehensive overview of model parameters across configurations._

## Resources

- [Read our paper](https://www.frontiersin.org/articles/10.3389/fmicb.2023.1331233/full)
- [Learn more about the model](https://github.com/nbrg-ppcu/prokbert)
- [Get started with code on GitHub](https://github.com/nbrg-ppcu/prokbert/tree/main?tab=readme-ov-file#tutorials-and-examples)

---

For more information or questions, please visit our [GitHub repository](https://github.com/nbrg-ppcu/prokbert) or contact us at [obalasz@gmail.com](mailto:obalasz@gmail.com).
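## Example: k-mer tokenization with a shift

The "Tokenizer" column above lists a k-mer size and a shift (step) for each model. The sketch below is a simplified, hypothetical illustration of how overlapping k-mers with a shift can be extracted from a DNA sequence; it is not the actual ProkBERT tokenizer, whose implementation lives in the [GitHub repository](https://github.com/nbrg-ppcu/prokbert).

```python
def lca_kmers(seq: str, k: int = 6, shift: int = 1) -> list[str]:
    """Split a DNA sequence into k-mers, advancing by `shift` nucleotides.

    Illustrative sketch only -- not the ProkBERT tokenizer implementation.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, shift)]


seq = "ATGCATGCAT"
print(lca_kmers(seq, k=6, shift=1))  # dense overlap: one token per nucleotide step
print(lca_kmers(seq, k=6, shift=2))  # sparser overlap: one token per two nucleotides
```

A smaller shift yields denser overlap (more tokens per nucleotide), while a larger shift covers more nucleotides within the same token budget, which is consistent with `mini-long` (shift=2) reaching the longest nucleotide context in the table above.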