Commit c64cdee by ligeti (1 parent: d8b9306)

Update README.md

  1. README.md +78 -24
README.md CHANGED
@@ -58,30 +58,84 @@ Each dataset was processed using **512bp segment lengths** to simulate fragmente

  The performance of ProkBERT PhaStyle was evaluated on various datasets, including *Escherichia* and EXTREMOPHILE phages, using segment lengths of 512bp and 1022bp. The results are summarized below:

- ### Performance on *Escherichia* Dataset (512bp and 1022bp segments)
-
- | Method | Balanced Accuracy | MCC | Sensitivity | Specificity |
- |--------------------------|-------------------|-------|-------------|-------------|
- | **ProkBERT-mini (512bp)** | 0.91 | 0.83 | 0.94 | 0.89 |
- | ProkBERT-mini-long (512bp)| 0.90 | 0.82 | 0.96 | 0.85 |
- | ProkBERT-mini-c (512bp) | 0.89 | 0.80 | 0.95 | 0.84 |
- | DNABERT-2-117M (512bp) | 0.84 | 0.72 | 0.95 | 0.74 |
- | Nuc. Trans.-50m (512bp) | 0.85 | 0.72 | 0.92 | 0.78 |
- | **ProkBERT-mini (1022bp)**| **0.94** | **0.88** | **0.97** | **0.91** |
- | ProkBERT-mini-long (1022bp)| 0.94 | 0.89 | 0.97 | 0.91 |
-
- ### Performance on EXTREMOPHILE Dataset (512bp and 1022bp segments)
-
- | Method | Balanced Accuracy | MCC | Sensitivity | Specificity |
- |--------------------------|-------------------|-------|-------------|-------------|
- | **ProkBERT-mini (512bp)** | 0.93 | 0.83 | 0.99 | 0.87 |
- | ProkBERT-mini-long (512bp)| 0.93 | 0.82 | **1.00** | 0.86 |
- | ProkBERT-mini-c (512bp) | 0.92 | 0.80 | 0.99 | 0.84 |
- | DNABERT-2-117M (512bp) | 0.89 | 0.74 | 0.99 | 0.79 |
- | **ProkBERT-mini (1022bp)**| **0.96** | **0.91** | **1.00** | **0.93** |
- | ProkBERT-mini-long (1022bp)| 0.96 | 0.90 | 1.00 | 0.92 |
-
- These tables highlight the high accuracy, MCC, and generalization capability of ProkBERT models, particularly on challenging datasets like *Escherichia* and extremophile phages. The ProkBERT-mini and ProkBERT-mini-long models consistently performed well on both datasets.
+
+ ### Performance on Escherichia Test Set (512bp segments)
+
+ | Method | Balanced Accuracy | MCC | Sensitivity | Specificity |
+ |-----------------------|-------------------|-------|-------------|-------------|
+ | ProkBERT-mini | 0.91 | 0.83 | 0.94 | 0.89 |
+ | ProkBERT-mini-long | 0.90 | 0.82 | 0.96 | 0.85 |
+ | ProkBERT-mini-c | 0.89 | 0.80 | 0.95 | 0.84 |
+ | DNABERT-2-117M | 0.84 | 0.72 | 0.95 | 0.74 |
+ | Nucleotide Transformer-50m | 0.85 | 0.72 | 0.92 | 0.78 |
+ | Nucleotide Transformer-100m | 0.87 | 0.75 | 0.93 | 0.82 |
+ | Nucleotide Transformer-500m | 0.88 | 0.78 | 0.96 | 0.80 |
+ | DeePhage | 0.86 | 0.71 | 0.84 | 0.88 |
+ | PhaTYP | 0.91 | 0.83 | 0.94 | 0.88 |
+
+ ### Performance on Escherichia Test Set (1022bp segments)
+
+ | Method | Balanced Accuracy | MCC | Sensitivity | Specificity |
+ |-----------------------|-------------------|-------|-------------|-------------|
+ | ProkBERT-mini | 0.94 | 0.88 | 0.97 | 0.91 |
+ | ProkBERT-mini-long | 0.94 | 0.89 | 0.97 | 0.91 |
+ | ProkBERT-mini-c | 0.93 | 0.87 | 0.97 | 0.89 |
+ | DNABERT-2-117M | 0.90 | 0.80 | 0.95 | 0.85 |
+ | Nucleotide Transformer-50m | 0.90 | 0.80 | 0.94 | 0.85 |
+ | Nucleotide Transformer-100m | 0.92 | 0.83 | 0.94 | 0.89 |
+ | Nucleotide Transformer-500m | 0.91 | 0.84 | 0.96 | 0.87 |
+ | DeePhage | 0.91 | 0.82 | 0.94 | 0.88 |
+ | PhaTYP | 0.92 | 0.84 | 0.96 | 0.87 |
+
+ ---
+
+ ### Performance on EXTREMOPHILE Test Set (512bp segments)
+
+ | Method | Balanced Accuracy | MCC | Sensitivity | Specificity |
+ |-----------------------|-------------------|-------|-------------|-------------|
+ | ProkBERT-mini | 0.93 | 0.83 | 0.99 | 0.87 |
+ | ProkBERT-mini-long | 0.93 | 0.82 | 1.00 | 0.86 |
+ | ProkBERT-mini-c | 0.92 | 0.80 | 0.99 | 0.84 |
+ | DNABERT-2-117M | 0.89 | 0.74 | 0.99 | 0.79 |
+ | Nucleotide Transformer-50m | 0.91 | 0.79 | 0.98 | 0.84 |
+ | Nucleotide Transformer-100m | 0.90 | 0.76 | 0.97 | 0.82 |
+ | Nucleotide Transformer-500m | 0.91 | 0.78 | 0.99 | 0.82 |
+ | DeePhage | 0.87 | 0.75 | 0.84 | 0.91 |
+ | PhaTYP | 0.76 | 0.52 | 0.74 | 0.79 |
+
+ ### Performance on EXTREMOPHILE Test Set (1022bp segments)
+
+ | Method | Balanced Accuracy | MCC | Sensitivity | Specificity |
+ |-----------------------|-------------------|-------|-------------|-------------|
+ | ProkBERT-mini | 0.96 | 0.91 | 1.00 | 0.93 |
+ | ProkBERT-mini-long | 0.96 | 0.90 | 1.00 | 0.92 |
+ | ProkBERT-mini-c | 0.94 | 0.86 | 1.00 | 0.89 |
+ | DNABERT-2-117M | 0.94 | 0.85 | 0.98 | 0.90 |
+ | Nucleotide Transformer-50m | 0.93 | 0.83 | 0.99 | 0.87 |
+ | Nucleotide Transformer-100m | 0.95 | 0.88 | 0.98 | 0.91 |
+ | Nucleotide Transformer-500m | 0.96 | 0.89 | 1.00 | 0.91 |
+ | DeePhage | 0.92 | 0.80 | 0.96 | 0.87 |
+ | PhaTYP | 0.80 | 0.58 | 0.84 | 0.76 |
+
+ ---
+
+ ### Inference Speed and Running Times
+
+ | Model | Execution Time (seconds) | Inference Speed (MB/sec) |
+ |------------------------|--------------------------|--------------------------|
+ | ProkBERT-mini-long | 132 | 0.52 |
+ | ProkBERT-mini | 141 | 0.49 |
+ | ProkBERT-mini-c | 146 | 0.47 |
+ | DNABERT-2-117M | 284 | 0.23 |
+ | Nucleotide Transformer-50m | 292 | 0.21 |
+ | Nucleotide Transformer-100m | 313 | 0.20 |
+ | Nucleotide Transformer-500m | 500 | 0.15 |
+ | DeePhage | 159 | 0.43 |
+ | PhaTYP | 2718 | 0.10 |
+ | BACPHLIP | 7125 | 0.04 |
+
+
+

  For more detailed results, including additional metrics, please refer to the original research paper.
  ---
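
For readers comparing rows in the tables above, the four reported metrics (balanced accuracy, MCC, sensitivity, specificity) follow from the standard binary confusion-matrix definitions. The sketch below is illustrative only: the function name `binary_metrics` and the counts passed to it are hypothetical, and the README does not say which phage lifestyle is treated as the positive class, so the positive/negative assignment here is an assumption.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion counts.

    tp/fn count positive-class examples, tn/fp count negative-class ones.
    Which lifestyle is "positive" is an assumption, not stated in the README.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts for 100 positive and 100 negative segments.
    for name, value in binary_metrics(tp=94, fp=11, tn=89, fn=6).items():
        print(f"{name}: {value:.2f}")
```

Because balanced accuracy is the mean of sensitivity and specificity, it can be cross-checked directly against each table row. MCC cannot be recovered from those two rates alone, since it also depends on the class balance of the test set.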