Model metrics
Model testing was performed on the held-out test set of the dataset. The Dice similarity index (Dice) and the normalized surface distance (NSD) were calculated for each label individually, and 95% confidence intervals were computed using bootstrap resampling with 1000 iterations.
Class ID | Class Description | Dice [95% CI] | NSD [95% CI] |
---|---|---|---|
0 | background | 1.0 [1.0 - 1.0] | 0.999 [0.999 - 1.0] |
1 | T1 | 0.946 [0.928 - 0.958] | 0.979 [0.961 - 0.99] |
2 | T2 | 0.954 [0.94 - 0.965] | 0.993 [0.985 - 0.998] |
3 | T3 | 0.956 [0.939 - 0.969] | 0.989 [0.976 - 0.998] |
4 | T4 | 0.946 [0.917 - 0.968] | 0.979 [0.956 - 0.996] |
5 | T5 | 0.949 [0.923 - 0.968] | 0.981 [0.961 - 0.997] |
6 | T6 | 0.947 [0.919 - 0.969] | 0.978 [0.955 - 0.997] |
7 | T7 | 0.94 [0.908 - 0.966] | 0.97 [0.941 - 0.992] |
8 | T8 | 0.941 [0.912 - 0.966] | 0.969 [0.944 - 0.991] |
9 | T9 | 0.934 [0.903 - 0.959] | 0.962 [0.937 - 0.985] |
10 | T10 | 0.933 [0.906 - 0.959] | 0.963 [0.94 - 0.985] |
11 | T11 | 0.927 [0.897 - 0.955] | 0.951 [0.923 - 0.978] |
12 | T12 | 0.931 [0.9 - 0.958] | 0.955 [0.926 - 0.981] |
13 | L1 | 0.938 [0.907 - 0.963] | 0.959 [0.928 - 0.984] |
14 | L2 | 0.962 [0.943 - 0.978] | 0.982 [0.963 - 0.997] |
15 | L3 | 0.962 [0.94 - 0.978] | 0.981 [0.957 - 0.996] |
16 | L4 | 0.952 [0.923 - 0.971] | 0.968 [0.939 - 0.988] |
17 | L5 | 0.936 [0.91 - 0.955] | 0.958 [0.932 - 0.976] |
18 | L6 | 0.0 [0.0 - 0.0] | 0.0 [0.0 - 0.0] |
19 | Sacrum | 0.958 [0.951 - 0.965] | 0.983 [0.975 - 0.988] |
20 | Os coccygis | NA | NA |
21 | T13 | 0.0 [0.0 - 0.0] | 0.0 [0.0 - 0.0] |
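
The per-label Dice values and bootstrap intervals reported above could be computed along the following lines. This is a minimal sketch and not the evaluation code used for this model. It assumes that each test case yields one Dice value per label, that the per-label values are averaged across cases, and that a percentile bootstrap over cases gives the interval; the source does not state any of these details. The function names `dice_score` and `bootstrap_ci` are hypothetical, and NSD is not covered here.

```python
import numpy as np


def dice_score(pred, truth, label):
    """Dice overlap for a single label in two integer label maps.

    Returns NaN when the label is absent from both maps (assumed convention).
    """
    p = pred == label
    t = truth == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")
    return 2.0 * np.logical_and(p, t).sum() / denom


def bootstrap_ci(values, n_iterations=1000, alpha=0.05, seed=0):
    """Mean and percentile-bootstrap confidence interval of per-case scores.

    Assumption: cases are resampled with replacement and the mean is the
    aggregated statistic. NaN entries (label absent) are dropped first.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    if values.size == 0:
        return float("nan"), float("nan"), float("nan")
    rng = np.random.default_rng(seed)
    # One row of resampled case indices per bootstrap iteration.
    idx = rng.integers(0, values.size, size=(n_iterations, values.size))
    means = values[idx].mean(axis=1)
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(values.mean()), float(lower), float(upper)


if __name__ == "__main__":
    # Toy 2D label maps standing in for a predicted and a reference segmentation.
    pred = np.array([[1, 1, 0], [0, 2, 2]])
    truth = np.array([[1, 0, 0], [0, 2, 2]])
    print("Dice, label 1:", dice_score(pred, truth, 1))
    print("Dice, label 2:", dice_score(pred, truth, 2))

    # Hypothetical per-case Dice values for one label.
    mean, lower, upper = bootstrap_ci([0.93, 0.95, 0.96, 0.91, 0.97])
    print(f"{mean:.3f} [{lower:.3f} - {upper:.3f}]")
```

Under the NaN convention assumed here, a label that is missing from both the prediction and the reference in every test case would have no valid scores, which is one way a row could end up reported as NA.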