TY - JOUR
T1 - Simple and Efficient Confidence Score for Grading Whole Slide Images
AU - Lubrano, Mélanie
AU - Bellahsen-Harrar, Yaëlle
AU - Fick, Rutger
AU - Badoual, Cécile
AU - Walter, Thomas
N1 - Publisher Copyright:
© 2023 CC-BY 4.0, M. Lubrano, Y.B.-H., R. Fick, C. Badoual & T. Walter.
PY - 2023/1/1
Y1 - 2023/1/1
AB - Grading precancerous lesions on whole slide images is a challenging task: the continuous space of morphological phenotypes often makes clear-cut decisions between grades difficult, leading to low inter- and intra-rater agreement. More and more Artificial Intelligence (AI) algorithms are being developed to help pathologists perform and standardize their diagnoses. However, those models can render their predictions without considering the ambiguity of the classes and can fail without notice, which prevents their wider acceptance in a clinical context. In this paper, we propose a new score to measure the confidence of AI models in grading tasks. Our confidence score is specifically adapted to ordinal output variables, is versatile, and requires no extra training, additional inferences, or particular architecture changes. Comparison to other popular techniques such as Monte Carlo Dropout and deep ensembles shows that our method provides state-of-the-art results while being simpler, more versatile, and less computationally intensive. The score is also easily interpretable and consistent with the real-life hesitations of pathologists. We show that the score accurately identifies mispredicted slides and that accuracy for high-confidence decisions is significantly higher than for low-confidence decisions (gap in AUC of 17.1% on the test set). We believe that the proposed confidence score could be leveraged by pathologists directly in their workflow and assist them with difficult tasks such as grading precancerous lesions.
KW - confidence score
KW - grading
KW - multiple instance learning
KW - uncertainty estimation
KW - whole slide images
UR - http://www.scopus.com/inward/record.url?scp=85189321901&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85189321901
SN - 2640-3498
VL - 227
SP - 151
EP - 169
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 6th International Conference on Medical Imaging with Deep Learning, MIDL 2023
Y2 - 10 July 2023 through 12 July 2023
ER -