Model metrics

Model testing was performed on the held-out test set of the dataset. The Dice similarity index (Dice) and the normalized surface distance (NSD) were calculated for each label individually, and 95% confidence intervals were computed using bootstrap resampling with 1000 iterations.
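The per-label metrics can be sketched as follows. This is a minimal illustration, not the evaluation code used for this model. The NSD tolerance, the voxel spacing, the function names (`dice`, `nsd`) and the choice to return NaN when a label is absent from both prediction and ground truth are all assumptions. The NSD here is also a simplified voxel-count approximation: it counts boundary voxels rather than weighting surface elements by area.

```python
import numpy as np
from scipy import ndimage


def dice(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    """Dice = 2|P & G| / (|P| + |G|) for a single label."""
    p = pred == label
    g = gt == label
    denom = p.sum() + g.sum()
    if denom == 0:
        # Assumption: undefined when the label is absent from both volumes.
        return float("nan")
    return float(2.0 * np.logical_and(p, g).sum() / denom)


def _surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels: mask voxels that are removed by one erosion step."""
    return mask & ~ndimage.binary_erosion(mask)


def nsd(pred, gt, label, tolerance_mm=1.0, spacing=(1.0, 1.0, 1.0)) -> float:
    """Fraction of boundary voxels of either mask lying within
    `tolerance_mm` of the other mask's boundary (hypothetical defaults)."""
    p = pred == label
    g = gt == label
    if not p.any() and not g.any():
        return float("nan")  # assumption, same convention as dice()
    if not p.any() or not g.any():
        return 0.0
    sp, sg = _surface(p), _surface(g)
    # Distance (in mm, via `sampling`) from each voxel to the nearest
    # boundary voxel of the other mask; boundary voxels are the zeros.
    dist_to_g = ndimage.distance_transform_edt(~sg, sampling=spacing)
    dist_to_p = ndimage.distance_transform_edt(~sp, sampling=spacing)
    p_ok = (dist_to_g[sp] <= tolerance_mm).sum()
    g_ok = (dist_to_p[sg] <= tolerance_mm).sum()
    return float((p_ok + g_ok) / (sp.sum() + sg.sum()))
```

For a label map with several classes, these functions would be called once per class ID and once per test case, yielding the per-case scores that the confidence intervals are then computed from.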

Class ID	Class description	Dice [95% CI]	NSD [95% CI]
0	background	1.0 [1.0 - 1.0]	0.999 [0.999 - 1.0]
1	T1	0.946 [0.928 - 0.958]	0.979 [0.961 - 0.99]
2	T2	0.954 [0.94 - 0.965]	0.993 [0.985 - 0.998]
3	T3	0.956 [0.939 - 0.969]	0.989 [0.976 - 0.998]
4	T4	0.946 [0.917 - 0.968]	0.979 [0.956 - 0.996]
5	T5	0.949 [0.923 - 0.968]	0.981 [0.961 - 0.997]
6	T6	0.947 [0.919 - 0.969]	0.978 [0.955 - 0.997]
7	T7	0.94 [0.908 - 0.966]	0.97 [0.941 - 0.992]
8	T8	0.941 [0.912 - 0.966]	0.969 [0.944 - 0.991]
9	T9	0.934 [0.903 - 0.959]	0.962 [0.937 - 0.985]
10	T10	0.933 [0.906 - 0.959]	0.963 [0.94 - 0.985]
11	T11	0.927 [0.897 - 0.955]	0.951 [0.923 - 0.978]
12	T12	0.931 [0.9 - 0.958]	0.955 [0.926 - 0.981]
13	L1	0.938 [0.907 - 0.963]	0.959 [0.928 - 0.984]
14	L2	0.962 [0.943 - 0.978]	0.982 [0.963 - 0.997]
15	L3	0.962 [0.94 - 0.978]	0.981 [0.957 - 0.996]
16	L4	0.952 [0.923 - 0.971]	0.968 [0.939 - 0.988]
17	L5	0.936 [0.91 - 0.955]	0.958 [0.932 - 0.976]
18	L6	0.0 [0.0 - 0.0]	0.0 [0.0 - 0.0]
19	Sacrum	0.958 [0.951 - 0.965]	0.983 [0.975 - 0.988]
20	Os coccygis	NA	NA
21	T13	0.0 [0.0 - 0.0]	0.0 [0.0 - 0.0]
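The bracketed intervals can be produced, in principle, with a percentile bootstrap over per-case scores. The sketch below assumes that the resampling unit is the test case, that the reported value is the mean score, and that the percentile method was used. None of these details is stated above, and `bootstrap_ci` and `format_ci` are hypothetical names.

```python
import numpy as np


def bootstrap_ci(scores, n_iter=1000, alpha=0.05, seed=0):
    """Mean of per-case scores with a percentile bootstrap (1 - alpha) CI.

    Assumption: cases are resampled with replacement and NaN scores
    (label absent) are dropped beforehand.
    """
    scores = np.asarray(scores, dtype=float)
    scores = scores[~np.isnan(scores)]
    if scores.size == 0:
        return float("nan"), float("nan"), float("nan")
    rng = np.random.default_rng(seed)
    # n_iter resamples, each the same size as the original sample
    idx = rng.integers(0, scores.size, size=(n_iter, scores.size))
    means = scores[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(scores.mean()), float(lo), float(hi)


def format_ci(mean, lo, hi, ndigits=3):
    """Format like the table above: 'mean [lo - hi]', or 'NA' if undefined."""
    if np.isnan(mean):
        return "NA"
    return f"{round(mean, ndigits)} [{round(lo, ndigits)} - {round(hi, ndigits)}]"
```

In this sketch, a label that scores 0 in every case yields `0.0 [0.0 - 0.0]`, and a label with no defined scores yields `NA`.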