Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.compbiomed.2021.104497
DC Field: Value
dc.title: Comparison of metrics for the evaluation of medical segmentations using prostate MRI dataset
dc.contributor.author: Nai, Ying-Hwey
dc.contributor.author: Teo, Bernice W.
dc.contributor.author: Tan, Nadya L.
dc.contributor.author: O'Doherty, Sophie
dc.contributor.author: Stephenson, Mary C.
dc.contributor.author: Thian, Yee Liang
dc.contributor.author: Chiong, Edmund
dc.contributor.author: Reilhac, Anthonin
dc.date.accessioned: 2022-10-13T01:18:57Z
dc.date.available: 2022-10-13T01:18:57Z
dc.date.issued: 2021-07-01
dc.identifier.citation: Nai, Ying-Hwey, Teo, Bernice W., Tan, Nadya L., O'Doherty, Sophie, Stephenson, Mary C., Thian, Yee Liang, Chiong, Edmund, Reilhac, Anthonin (2021-07-01). Comparison of metrics for the evaluation of medical segmentations using prostate MRI dataset. Computers in Biology and Medicine 134 : 104497. ScholarBank@NUS Repository. https://doi.org/10.1016/j.compbiomed.2021.104497
dc.identifier.issn: 0010-4825
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/232919
dc.description.abstract: Nine previously proposed segmentation evaluation metrics, targeting medical relevance, accounting for holes and added regions, or differentiating over- and under-segmentation, were compared with 24 traditional metrics to identify those which better capture the requirements for clinical segmentation evaluation. Evaluation was first performed using 2D synthetic shapes to highlight features and pitfalls of the metrics with known ground truths (GTs) and machine segmentations (MSs). Clinical evaluation was then performed using publicly available prostate images of 20 subjects with MSs generated by 3 different deep learning networks (DenseVNet, HighRes3DNet, and ScaleNet) and GTs drawn by 2 readers. The same readers also performed the 2D visual assessment of the MSs using a dual negative-positive grading of -5 to 5 to reflect over- and under-estimation. Nine metrics that correlated well with visual assessment were selected for further evaluation using 3 different network ranking methods - based on a single metric, normalizing the metric using 2 GTs, and ranking the network based on a metric then averaging, including leave-one-out evaluation. These metrics yielded consistent ranking with HighRes3DNet ranked first then DenseVNet and ScaleNet using all ranking methods. Relative volume difference yielded the best positivity-agreement and correlation with dual visual assessment, and thus is better for providing over- and under-estimation. Interclass Correlation yielded the strongest correlation with the absolute visual assessment (0–5). Symmetric-boundary dice consistently yielded good discrimination of the networks for all three ranking methods with relatively small variations within network. Good rank discrimination may be an additional metric feature required for better network performance evaluation. © 2021 The Author(s)
dc.publisher: Elsevier Ltd
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source: Scopus OA2021
dc.subject: Deep learning
dc.subject: Evaluation metrics
dc.subject: Medical image segmentation
dc.subject: Prostate cancer
dc.subject: Rank evaluation
dc.type: Article
dc.contributor.department: DEAN'S OFFICE (MEDICINE)
dc.contributor.department: SURGERY
dc.description.doi: 10.1016/j.compbiomed.2021.104497
dc.description.sourcetitle: Computers in Biology and Medicine
dc.description.volume: 134
dc.description.page: 104497
Appears in Collections: Elements; Staff Publications
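
The abstract contrasts overlap-style metrics with relative volume difference, which it reports as better suited to indicating over- and under-estimation. The sketch below illustrates that distinction on a toy example. It is not the authors' implementation: the record does not give the paper's formulas, so the definitions used here (Dice as twice the overlap divided by the summed mask sizes, and relative volume difference as (MS volume - GT volume) / GT volume, positive meaning over-segmentation) are common textbook forms assumed for illustration, and the function names and toy masks are hypothetical.

```python
def dice(gt, ms):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Assumed definition: 2 * |GT and MS| / (|GT| + |MS|).
    """
    overlap = sum(1 for g, m in zip(gt, ms) if g and m)
    total = sum(1 for g in gt if g) + sum(1 for m in ms if m)
    # Two empty masks are treated as a perfect match (a convention, not from the source).
    return 2.0 * overlap / total if total else 1.0


def relative_volume_difference(gt, ms):
    """Signed volume error of the machine segmentation relative to the ground truth.

    Assumed definition: (|MS| - |GT|) / |GT|.
    Positive values mean over-segmentation, negative values under-segmentation.
    """
    gt_volume = sum(1 for g in gt if g)
    ms_volume = sum(1 for m in ms if m)
    if gt_volume == 0:
        raise ValueError("ground truth mask is empty; relative difference is undefined")
    return (ms_volume - gt_volume) / gt_volume


# Hypothetical 1D "masks": a ground truth and two machine segmentations.
gt = [0, 1, 1, 1, 1, 0, 0, 0]
over = [0, 1, 1, 1, 1, 1, 1, 0]   # covers the GT plus two extra pixels
under = [0, 1, 1, 0, 0, 0, 0, 0]  # covers only half of the GT

for name, ms in (("over", over), ("under", under)):
    print(name, round(dice(gt, ms), 3), relative_volume_difference(gt, ms))
```

Under these assumed definitions, Dice only reports how far each segmentation is from the ground truth, while the sign of the relative volume difference shows the direction of the error, which is the property the abstract attributes to that metric.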

Files in This Item:
File: 10_1016_j_compbiomed_2021_104497.pdf; Size: 6.2 MB; Format: Adobe PDF; Access Settings: OPEN; Version: None
