neuralbioinfo/PhaStyle-mini

PyTorch · Safetensors · prokbert · custom_code

ligeti committed on Oct 16, 2024

Commit c64cdee · verified · 1 Parent(s): d8b9306

Update README.md

Files changed (1)

README.md +78 -24

README.md CHANGED

@@ -58,30 +58,84 @@ Each dataset was processed using **512bp segment lengths** to simulate fragmente
 The performance of ProkBERT PhaStyle was evaluated on various datasets, including *Escherichia* and EXTREMOPHILE phages, using segment lengths of 512bp and 1022bp. The results are summarized below:
-### Performance on *Escherichia* Dataset (512bp and 1022bp segments)
-| Method                   | Balanced Accuracy | MCC   | Sensitivity | Specificity |
-|--------------------------|-------------------|-------|-------------|-------------|
-| **ProkBERT-mini (512bp)** | 0.91              | 0.83  | 0.94        | 0.89        |
-| ProkBERT-mini-long (512bp)| 0.90              | 0.82  | 0.96        | 0.85        |
-| ProkBERT-mini-c (512bp)   | 0.89              | 0.80  | 0.95        | 0.84        |
-| DNABERT-2-117M (512bp)    | 0.84              | 0.72  | 0.95        | 0.74        |
-| Nuc. Trans.-50m (512bp)   | 0.85              | 0.72  | 0.92        | 0.78        |
-| **ProkBERT-mini (1022bp)**| **0.94**          | **0.88** | **0.97**    | **0.91**    |
-| ProkBERT-mini-long (1022bp)| 0.94             | 0.89  | 0.97        | 0.91        |
-### Performance on EXTREMOPHILE Dataset (512bp and 1022bp segments)
-| Method                   | Balanced Accuracy | MCC   | Sensitivity | Specificity |
-|--------------------------|-------------------|-------|-------------|-------------|
-| **ProkBERT-mini (512bp)** | 0.93              | 0.83  | 0.99        | 0.87        |
-| ProkBERT-mini-long (512bp)| 0.93              | 0.82  | **1.00**    | 0.86        |
-| ProkBERT-mini-c (512bp)   | 0.92              | 0.80  | 0.99        | 0.84        |
-| DNABERT-2-117M (512bp)    | 0.89              | 0.74  | 0.99        | 0.79        |
-| **ProkBERT-mini (1022bp)**| **0.96**          | **0.91** | **1.00**    | **0.93**    |
-| ProkBERT-mini-long (1022bp)| 0.96             | 0.90  | 1.00        | 0.92        |
-These tables highlight the high accuracy, MCC, and generalization capability of ProkBERT models, particularly on challenging datasets like *Escherichia* and extremophile phages. The ProkBERT-mini and ProkBERT-mini-long models consistently performed well on both datasets.
+### Performance on Escherichia Test Set (512bp segments)
+| Method                | Balanced Accuracy | MCC   | Sensitivity | Specificity |
+|-----------------------|-------------------|-------|-------------|-------------|
+| ProkBERT-mini          | 0.91              | 0.83  | 0.94        | 0.89        |
+| ProkBERT-mini-long     | 0.90              | 0.82  | 0.96        | 0.85        |
+| ProkBERT-mini-c        | 0.89              | 0.80  | 0.95        | 0.84        |
+| DNABERT-2-117M         | 0.84              | 0.72  | 0.95        | 0.74        |
+| Nucleotide Transformer-50m | 0.85          | 0.72  | 0.92        | 0.78        |
+| Nucleotide Transformer-100m | 0.87        | 0.75  | 0.93        | 0.82        |
+| Nucleotide Transformer-500m | 0.88        | 0.78  | 0.96        | 0.80        |
+| DeePhage               | 0.86              | 0.71  | 0.84        | 0.88        |
+| PhaTYP                 | 0.91              | 0.83  | 0.94        | 0.88        |
+### Performance on Escherichia Test Set (1022bp segments)
+| Method                | Balanced Accuracy | MCC   | Sensitivity | Specificity |
+|-----------------------|-------------------|-------|-------------|-------------|
+| ProkBERT-mini          | 0.94              | 0.88  | 0.97        | 0.91        |
+| ProkBERT-mini-long     | 0.94              | 0.89  | 0.97        | 0.91        |
+| ProkBERT-mini-c        | 0.93              | 0.87  | 0.97        | 0.89        |
+| DNABERT-2-117M         | 0.90              | 0.80  | 0.95        | 0.85        |
+| Nucleotide Transformer-50m | 0.90          | 0.80  | 0.94        | 0.85        |
+| Nucleotide Transformer-100m | 0.92        | 0.83  | 0.94        | 0.89        |
+| Nucleotide Transformer-500m | 0.91        | 0.84  | 0.96        | 0.87        |
+| DeePhage               | 0.91              | 0.82  | 0.94        | 0.88        |
+| PhaTYP                 | 0.92              | 0.84  | 0.96        | 0.87        |
+---
+### Performance on EXTREMOPHILE Test Set (512bp segments)
+| Method                | Balanced Accuracy | MCC   | Sensitivity | Specificity |
+|-----------------------|-------------------|-------|-------------|-------------|
+| ProkBERT-mini          | 0.93              | 0.83  | 0.99        | 0.87        |
+| ProkBERT-mini-long     | 0.93              | 0.82  | 1.00        | 0.86        |
+| ProkBERT-mini-c        | 0.92              | 0.80  | 0.99        | 0.84        |
+| DNABERT-2-117M         | 0.89              | 0.74  | 0.99        | 0.79        |
+| Nucleotide Transformer-50m | 0.91          | 0.79  | 0.98        | 0.84        |
+| Nucleotide Transformer-100m | 0.90        | 0.76  | 0.97        | 0.82        |
+| Nucleotide Transformer-500m | 0.91        | 0.78  | 0.99        | 0.82        |
+| DeePhage               | 0.87              | 0.75  | 0.84        | 0.91        |
+| PhaTYP                 | 0.76              | 0.52  | 0.74        | 0.79        |
+### Performance on EXTREMOPHILE Test Set (1022bp segments)
+| Method                | Balanced Accuracy | MCC   | Sensitivity | Specificity |
+|-----------------------|-------------------|-------|-------------|-------------|
+| ProkBERT-mini          | 0.96              | 0.91  | 1.00        | 0.93        |
+| ProkBERT-mini-long     | 0.96              | 0.90  | 1.00        | 0.92        |
+| ProkBERT-mini-c        | 0.94              | 0.86  | 1.00        | 0.89        |
+| DNABERT-2-117M         | 0.94              | 0.85  | 0.98        | 0.90        |
+| Nucleotide Transformer-50m | 0.93          | 0.83  | 0.99        | 0.87        |
+| Nucleotide Transformer-100m | 0.95        | 0.88  | 0.98        | 0.91        |
+| Nucleotide Transformer-500m | 0.96        | 0.89  | 1.00        | 0.91        |
+| DeePhage               | 0.92              | 0.80  | 0.96        | 0.87        |
+| PhaTYP                 | 0.80              | 0.58  | 0.84        | 0.76        |
+---
+### Inference Speed and Running Times
+| Model                  | Execution Time (seconds) | Inference Speed (MB/sec) |
+|------------------------|--------------------------|--------------------------|
+| ProkBERT-mini-long      | 132                      | 0.52                     |
+| ProkBERT-mini           | 141                      | 0.49                     |
+| ProkBERT-mini-c         | 146                      | 0.47                     |
+| DNABERT-2-117M          | 284                      | 0.23                     |
+| Nucleotide Transformer-50m | 292                   | 0.21                     |
+| Nucleotide Transformer-100m | 313                  | 0.20                     |
+| Nucleotide Transformer-500m | 500                  | 0.15                     |
+| DeePhage                | 159                      | 0.43                     |
+| PhaTYP                  | 2718                     | 0.10                     |
+| BACPHLIP                | 7125                     | 0.04                     |
 For more detailed results, including additional metrics, please refer to the original research paper.
 ---
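The accuracy tables in this change report four standard binary-classification metrics: balanced accuracy, MCC (Matthews correlation coefficient), sensitivity and specificity. As a minimal sketch, assuming each test segment carries one of two phage lifestyle labels and that one lifestyle is treated as the positive class (which class the paper treats as positive is not stated here), the metrics can be computed from confusion-matrix counts as follows. `ConfusionCounts` and the helper functions are hypothetical names for illustration, not part of the ProkBERT or PhaStyle code, and the counts in the usage lines are invented, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper; positive class is an assumption)."""
    tp: int  # positive-class segments predicted positive
    fn: int  # positive-class segments predicted negative
    tn: int  # negative-class segments predicted negative
    fp: int  # negative-class segments predicted positive


def sensitivity(c: ConfusionCounts) -> float:
    # True-positive rate: recall of the positive class.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # True-negative rate: recall of the negative class.
    return c.tn / (c.tn + c.fp)


def balanced_accuracy(c: ConfusionCounts) -> float:
    # Mean of the two per-class recalls, so it is not inflated by class imbalance.
    return (sensitivity(c) + specificity(c)) / 2


def mcc(c: ConfusionCounts) -> float:
    # Matthews correlation coefficient in [-1, 1]; uses all four cells.
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (c.tp * c.tn - c.fp * c.fn) / denom


if __name__ == "__main__":
    # Invented counts for a balanced set of 100 positive and 100 negative segments.
    counts = ConfusionCounts(tp=94, fn=6, tn=89, fp=11)
    print(f"sensitivity={sensitivity(counts):.2f} specificity={specificity(counts):.2f}")
    print(f"balanced accuracy={balanced_accuracy(counts):.3f} MCC={mcc(counts):.2f}")
```

With these invented counts, sensitivity is 0.94, specificity 0.89, balanced accuracy about 0.915 and MCC about 0.83. Because balanced accuracy is the mean of sensitivity and specificity, it always lies between the two; this is why, for example, ProkBERT-mini's 0.91 on the 512bp *Escherichia* set sits between its 0.94 sensitivity and 0.89 specificity. MCC also depends on the false-positive and false-negative counts relative to each predicted class, so it reflects precision as well as recall.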