neuralbioinfo/prokbert-mini

@@ -108,15 +108,41 @@ except ImportError:
 - **Masked Language Modeling (MLM):** The MLM objective was modified for genomic sequences for masking overlapping k-mers.
 - **Training Phases:** The model underwent initial training with complete sequence restoration and selective masking, followed by a succeeding phase with variable-length datasets for increased complexity.
-### Evaluation Results
-| Metric                  | Result       | Notes |
-|-------------------------|--------------|-------|
-| Metric 1 (e.g., Accuracy) | To be filled |       |
-| Metric 2 (e.g., Precision) | To be filled |       |
-| Metric 3 (e.g., Recall)   | To be filled |       |
-*Additional details and metrics can be included as they become available.*
+### Evaluation Results for ProkBERT-mini
+| Model             | L    | Avg. Ref. Rank | Avg. Top1 | Avg. Top3 | Avg. AUC  |
+|-------------------|------|----------------|-----------|-----------|-----------|
+| ProkBERT-mini     | 128  | 0.9315         | 0.4497    | 0.8960    | 0.9998    |
+| ProkBERT-mini     | 256  | 0.8433         | 0.4848    | 0.9130    | 0.9998    |
+| ProkBERT-mini     | 512  | 0.8098         | 0.5056    | 0.9179    | 0.9998    |
+| ProkBERT-mini     | 1024 | 0.7825         | 0.5169    | 0.9227    | 0.9998    |
+*Masking performance of ProkBERT-mini at different input sequence lengths (L).*
+### Evaluation of Promoter Prediction Tools on the E. coli Sigma70 Dataset
+| Tool                  | Accuracy | MCC   | Sensitivity | Specificity |
+|-----------------------|----------|-------|-------------|-------------|
+| ProkBERT-mini         | **0.87** | **0.74** | 0.90        | 0.85        |
+| ProkBERT-mini-c       | **0.87** | 0.73  | 0.88        | 0.85        |
+| ProkBERT-mini-long    | **0.87** | **0.74** | 0.89        | 0.85        |
+| CNNProm               | 0.72     | 0.50  | 0.95        | 0.51        |
+| iPro70-FMWin          | 0.76     | 0.53  | 0.84        | 0.69        |
+| 70ProPred             | 0.74     | 0.51  | 0.90        | 0.60        |
+| iPromoter-2L          | 0.64     | 0.37  | 0.94        | 0.37        |
+| Multiply              | 0.50     | 0.05  | 0.81        | 0.23        |
+| bTSSfinder            | 0.46     | -0.07 | 0.48        | 0.45        |
+| BPROM                 | 0.56     | 0.10  | 0.20        | 0.87        |
+| IBPP                  | 0.50     | -0.03 | 0.26        | 0.71        |
+| Promotech             | 0.71     | 0.43  | 0.49        | **0.90**    |
+| Sigma70Pred           | 0.66     | 0.42  | 0.95        | 0.41        |
+| iPromoter-BnCNN       | 0.55     | 0.27  | **0.99**    | 0.18        |
+| MULTiPly              | 0.54     | 0.19  | 0.92        | 0.22        |
+*The three ProkBERT models perform almost identically on this dataset: each reaches an accuracy of 0.87, the highest in the table, with MCC between 0.73 and 0.74, sensitivity between 0.88 and 0.90, and specificity of 0.85. Several other tools reach higher sensitivity (up to 0.99 for iPromoter-BnCNN) or higher specificity (0.90 for Promotech), but none combines the two as well. The consistency across versions suggests that the methodology shared by the mini series is robust.*
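For readers comparing the columns of the promoter table, the sketch below shows how accuracy, MCC, sensitivity, and specificity are conventionally derived from binary confusion-matrix counts. It is not taken from the ProkBERT evaluation code. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: promoters predicted as promoter / non-promoter
    tn/fp: non-promoters predicted as non-promoter / promoter
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true promoters that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-promoters that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts, not the Sigma70 test set.
example = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

MCC uses all four counts, so a tool that labels almost everything as a promoter scores low on it despite high sensitivity. The iPromoter-BnCNN row shows this pattern: sensitivity 0.99, specificity 0.18, MCC 0.27.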
 ### Ethical Considerations and Limitations