pinned: false
---
# Neural Bioinformatics Research Group - ProkBERT Models

Welcome to the official Hugging Face organization for the Neural Bioinformatics Research Group. Our main goal is to provide genomic language models for microbiome applications.

## Models

We provide access to a collection of pretrained and fine-tuned models from the ProkBERT family. These models are built on Local Context Aware (LCA) tokenization, which is tailored to DNA sequences and balances context size against performance.

ProkBERT models are designed for microbiome-related tasks such as prokaryotic promoter identification and phage detection. Despite their compact size (20.6M to 26.6M parameters), they are powerful and efficient.
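
As a rough illustration of the k-mer and shift parameters listed in the Tokenizer column, here is a minimal Python sketch that cuts a DNA sequence into overlapping k-mers, starting a new k-mer every `shift` nucleotides. The function name `lca_tokenize` is hypothetical and is not part of the `prokbert` package; a full tokenizer would also map k-mers to vocabulary IDs and add special tokens, which this sketch does not attempt.

```python
def lca_tokenize(sequence: str, k: int, shift: int) -> list[str]:
    """Split a DNA sequence into overlapping k-mers, one every `shift` nucleotides.

    Hypothetical helper for illustration only; not the prokbert package API.
    """
    if k <= 0 or shift <= 0:
        raise ValueError("k and shift must be positive")
    sequence = sequence.upper()
    # Start positions 0, shift, 2*shift, ... as long as a full k-mer still fits.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, shift)]


seq = "ATGCGTACGT"
print(lca_tokenize(seq, k=6, shift=1))  # ['ATGCGT', 'TGCGTA', 'GCGTAC', 'CGTACG', 'GTACGT']
print(lca_tokenize(seq, k=6, shift=2))  # ['ATGCGT', 'GCGTAC', 'GTACGT']
print(lca_tokenize(seq, k=1, shift=1))  # ['A', 'T', 'G', 'C', 'G', 'T', 'A', 'C', 'G', 'T']
```

With `shift=2`, neighbouring 6-mers overlap by four nucleotides instead of five, so a given number of tokens spans roughly twice as much sequence; this is the kind of context-size trade-off the `mini-long` configuration makes. Trailing nucleotides that cannot fill a final k-mer are simply dropped in this sketch.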

## Model Overview

| Model | Parameters | Tokenizer | Layers | Attention Heads | Max. Context Size | Training Data |
|---------------|------------|------------------|--------|-----------------|-------------------|---------------------|
| `mini` | 20.6M | 6-mer, shift=1 | 6 | 6 | 1027 nt | 206.65 billion |
| `mini-c` | 24.9M | 1-mer | 6 | 6 | 1022 nt | 206.65 billion |
| `mini-long` | 26.6M | 6-mer, shift=2 | 6 | 6 | 4096 nt | 206.65 billion |

_Overview of the parameters and configurations of the available ProkBERT models._
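
The Max. Context Size values can be reproduced with simple arithmetic, under assumptions we infer rather than read from the table: `n` k-mer tokens of length `k`, placed every `shift` nucleotides, span `(n - 1) * shift + k` nucleotides, and each model reserves two positions for special tokens out of 1024 (`mini`, `mini-c`) or 2048 (`mini-long`) positions. The position budgets and the `nucleotides_covered` helper are hypothetical; check each model's configuration for the actual values.

```python
def nucleotides_covered(num_tokens: int, k: int, shift: int) -> int:
    """Nucleotides spanned by `num_tokens` k-mers placed every `shift` nt (hypothetical helper)."""
    if num_tokens <= 0:
        return 0
    # The first k-mer covers k nt; each further k-mer extends the span by `shift` nt.
    return (num_tokens - 1) * shift + k


# Assumed budgets: total positions minus 2 special tokens; (tokens, k, shift).
configs = {
    "mini": (1024 - 2, 6, 1),
    "mini-c": (1024 - 2, 1, 1),
    "mini-long": (2048 - 2, 6, 2),
}

for name, (n, k, s) in configs.items():
    print(name, nucleotides_covered(n, k, s))
# mini 1027
# mini-c 1022
# mini-long 4096
```

Under these assumptions the three results match the table exactly, which suggests the context sizes follow directly from the tokenizer's k-mer length and shift rather than from differences in architecture.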

## Resources

- [Read our paper](https://www.frontiersin.org/articles/10.3389/fmicb.2023.1331233/full)
- [Learn more about the model](https://github.com/nbrg-ppcu/prokbert)
- [Get started with code on GitHub](https://github.com/nbrg-ppcu/prokbert)

---

For more information or questions, please visit our [GitHub repository](https://github.com/nbrg-ppcu/prokbert) or contact us at [email]([email protected]).