Update README.md
README.md
CHANGED
@@ -1,7 +1,43 @@
## ProkBERT-mini Model

ProkBERT-mini-k6s1 is part of the ProkBERT family of genomic language models, designed specifically for microbiome applications. It is optimized for DNA sequence analysis and can provide robust, high-resolution results.

### Model Details

**Developed by:** Neural Bioinformatics Research Group
@@ -109,6 +145,7 @@ try:
except ImportError:
    !pip install prokbert
    print("Installed ProkBERT.")

### Training Data and Process
@@ -139,5 +176,18 @@ Please report any issues with the model or its outputs to the Neural Bioinformat
- **Model issues:** [GitHub repository link](https://github.com/nbrg-ppcu/prokbert)
- **Feedback and inquiries:** [[email protected]](mailto:[email protected])

-
-
@@ -1,7 +1,43 @@
---
license: cc-by-nc-4.0
---

## ProkBERT-mini Model

ProkBERT-mini-k6s1 is part of the ProkBERT family of genomic language models, designed specifically for microbiome applications. It is optimized for DNA sequence analysis and can provide robust, high-resolution results.

## Simple Usage Example

The following example demonstrates how to use the ProkBERT-mini model to process a DNA sequence:

```python
from transformers import MegatronBertForMaskedLM
from prokbert.prokbert_tokenizer import ProkBERTTokenizer

# Tokenization parameters: 6-mers with a shift of 1 (k6s1)
tokenization_parameters = {
    'kmer': 6,
    'shift': 1
}

# Initialize the tokenizer and model.
# Verify the checkpoint name on the Hugging Face Hub: the tokenizer settings
# above (kmer=6, shift=1) must match the checkpoint that is loaded.
tokenizer = ProkBERTTokenizer(tokenization_params=tokenization_parameters, operation_space='sequence')
model = MegatronBertForMaskedLM.from_pretrained("neuralbioinfo/prokbert-mini-k6s2")

# Example DNA sequence
sequence = 'ATGTCCGCGGGACCT'

# Tokenize the sequence
inputs = tokenizer(sequence, return_tensors="pt")

# Ensure that inputs have a batch dimension
inputs = {key: value.unsqueeze(0) for key, value in inputs.items()}

# Generate outputs from the model
outputs = model(**inputs)

# Masked-LM logits, shape: (batch, sequence_length, vocab_size)
logits = outputs.logits
```
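
To make the `kmer` and `shift` parameters more concrete, the sketch below shows how overlapping k-mer windows could be cut from a sequence. It is only an illustration under the assumption that `kmer` is the window length and `shift` is the step between window starts; the helper `kmer_windows` is hypothetical and is not part of the `prokbert` package or a description of how `ProkBERTTokenizer` is implemented.

```python
def kmer_windows(sequence: str, kmer: int = 6, shift: int = 1) -> list[str]:
    """Return the k-mers of `sequence`, moving `shift` bases between windows.

    Hypothetical helper for illustration only; not the ProkBERT tokenizer.
    """
    if kmer < 1 or shift < 1:
        raise ValueError("kmer and shift must be positive integers")
    # The last valid start index is len(sequence) - kmer; shorter inputs yield [].
    return [sequence[i:i + kmer] for i in range(0, len(sequence) - kmer + 1, shift)]


windows = kmer_windows("ATGTCCGCGGGACCT", kmer=6, shift=1)
print(len(windows))  # 10
print(windows[:3])   # ['ATGTCC', 'TGTCCG', 'GTCCGC']
```

With `shift=2` the same 15-base sequence would yield five windows instead of ten, so a larger shift shortens the token sequence at the cost of resolution. The sketch does not cover vocabulary lookup or special tokens.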

### Model Details

**Developed by:** Neural Bioinformatics Research Group
@@ -109,6 +145,7 @@ try:
except ImportError:
    !pip install prokbert
    print("Installed ProkBERT.")
```

### Training Data and Process
@@ -139,5 +176,18 @@ Please report any issues with the model or its outputs to the Neural Bioinformat
- **Model issues:** [GitHub repository link](https://github.com/nbrg-ppcu/prokbert)
- **Feedback and inquiries:** [[email protected]](mailto:[email protected])

## Reference

If you use ProkBERT-mini in your research, please cite the following paper:

@ARTICLE{10.3389/fmicb.2023.1331233,
  AUTHOR={Ligeti, Balázs and Szepesi-Nagy, István and Bodnár, Babett and Ligeti-Nagy, Noémi and Juhász, János},
  TITLE={ProkBERT family: genomic language models for microbiome applications},
  JOURNAL={Frontiers in Microbiology},
  VOLUME={14},
  YEAR={2024},
  URL={https://www.frontiersin.org/articles/10.3389/fmicb.2023.1331233},
  DOI={10.3389/fmicb.2023.1331233},
  ISSN={1664-302X},
  ABSTRACT={...}
}