Explaining the Rationale of Deep Learning Glaucoma Decisions with Adversarial Examples.
Chang Jooyoung, Lee Jinho, Ha Ahnul, Han Young Soo, Bak Eunoo, Choi Seulggie, Yun Jae Moon, Kang Uk, Shin Il Hyung, Shin Joo Young
AI Summary
This study showed that adversarial examples explain deep learning glaucoma decisions better than a conventional heatmap method, which may increase clinician trust by clarifying the AI's reasoning for a diagnosis.
Abstract
Purpose
To illustrate what is inside the so-called black box of deep learning models (DLMs), and thereby give clinicians greater confidence in the conclusions of artificial intelligence, by evaluating how well adversarial explanation conveys the rationale of DLM decisions for glaucoma and glaucoma-related findings. Adversarial explanation generates adversarial examples (AEs), images that have been altered to gain or lose the traits specific to a pathologic characteristic, to explain the DLM's rationale.
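As a rough illustration of the idea, the sketch below perturbs an input by gradient steps until a classifier's score for a finding rises or falls. It assumes a toy three-feature logistic model in place of the study's DLMs. The weights, input values, step size, and the function names `predict` and `adversarial_example` are hypothetical, and the abstract does not state how the authors generated their AEs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Score of the toy logistic model (a stand-in for a DLM, not the study's network)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def adversarial_example(weights, bias, x, target, step=0.1, n_steps=50):
    """Move x by gradient descent on the cross-entropy loss toward `target`.

    target=1.0 pushes the input to gain the finding, target=0.0 to lose it.
    """
    x_adv = list(x)
    for _ in range(n_steps):
        p = predict(weights, bias, x_adv)
        # For a logistic model, d(loss)/d(x_i) = (p - target) * w_i.
        grad = [(p - target) * w for w in weights]
        x_adv = [xi - step * g for xi, g in zip(x_adv, grad)]
    return x_adv


# Hypothetical model parameters and input, for illustration only.
weights = [2.0, -1.0, 0.5]
bias = -0.5
x = [0.1, 0.4, 0.2]

x_gain = adversarial_example(weights, bias, x, target=1.0)
x_lose = adversarial_example(weights, bias, x, target=0.0)

# The per-feature change shows which inputs the model relies on, and in which direction.
change_to_gain = [a - b for a, b in zip(x_gain, x)]
```

Comparing `x` with `x_gain` and `x_lose` is the toy analogue of showing a clinician the original fundus image next to versions altered to gain or lose a finding.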
Design
Evaluation of explanation methods for DLMs.
Participants
Health screening participants (n = 1653) at the Seoul National University Hospital Health Promotion Center, Seoul, Republic of Korea.
Methods
We trained DLMs for referable glaucoma (RG), increased cup-to-disc ratio (ICDR), disc rim narrowing (DRN), and retinal nerve fiber layer defect (RNFLD) using 6430 retinal fundus images. Surveys consisting of explanations using AEs and gradient-weighted class activation mapping (GradCAM), a conventional heatmap-based explanation method, were generated for 400 pathologic and healthy patient eyes. For each method, board-trained glaucoma specialists rated location explainability, the ability to pinpoint decision-relevant areas in the image, and rationale explainability, the ability to inform the user of the model's reasoning for the decision based on pathologic features. Scores were compared by paired Wilcoxon signed-rank test.
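The paired comparison described above can be sketched as follows. This is a minimal pure-Python version of the Wilcoxon signed-rank test that drops zero differences, averages tied ranks, and uses a tie-corrected normal approximation for the two-sided p-value. The ratings are made-up 1-to-5 scores, not the study's survey data, and the function name `wilcoxon_signed_rank` is hypothetical.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus, two-sided p) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation with tie correction (rough for small n).
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Hypothetical ratings of the same ten cases under each explanation method.
ae_scores = [4, 5, 3, 4, 5, 4, 2, 5, 4, 3]
gradcam_scores = [2, 3, 3, 1, 2, 4, 3, 2, 1, 2]

w_plus, w_minus, p_value = wilcoxon_signed_rank(ae_scores, gradcam_scores)
```

A rank-based paired test suits ordinal survey scores because it does not assume that the differences are normally distributed.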
Main outcome measures
Area under the receiver operating characteristic curve (AUC), sensitivities, and specificities of DLMs; visualization of clinical pathologic changes of AEs; and survey scores for location and rationale explainability.
Results
The AUCs were 0.90, 0.99, 0.95, and 0.79 and the sensitivities at 0.90 specificity were 0.79, 1.00, 0.82, and 0.55 for the RG, ICDR, DRN, and RNFLD DLMs, respectively. Generated AEs showed valid clinical feature changes. Survey scores for location explainability were 3.94 ± 1.33 for AEs and 2.55 ± 1.24 for GradCAM, out of a maximum of 5 points. The scores for rationale explainability were 3.97 ± 1.31 for AEs and 2.10 ± 1.25 for GradCAM. Adversarial examples provided significantly better explainability than GradCAM.
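For readers unfamiliar with how an AUC and a sensitivity at fixed specificity are read off a set of model scores, a minimal sketch follows. The scores are invented for illustration and are not the study's data. The function names and the thresholding rule (call an eye positive when its score is at or above the threshold, then take the lowest threshold that reaches the target specificity) are assumptions.

```python
def auc(pos_scores, neg_scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_at_specificity(pos_scores, neg_scores, target_spec=0.90):
    """Return (threshold, sensitivity, specificity) at the lowest qualifying threshold."""
    for t in sorted(set(pos_scores) | set(neg_scores)):
        spec = sum(n < t for n in neg_scores) / len(neg_scores)
        if spec >= target_spec:
            sens = sum(p >= t for p in pos_scores) / len(pos_scores)
            return t, sens, spec
    # No observed score qualifies: a threshold above every score flags nothing.
    return float("inf"), 0.0, 1.0


# Hypothetical model scores for eyes without and with the finding.
neg_scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.7]
pos_scores = [0.3, 0.5, 0.6, 0.8, 0.9]

model_auc = auc(pos_scores, neg_scores)
threshold, sens, spec = sensitivity_at_specificity(pos_scores, neg_scores)
```

Fixing specificity first makes sensitivities comparable across models, which is how the four DLMs are reported above.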
Conclusions
Adversarial explanation provided greater explainability than GradCAM, a conventional heatmap-based explanation method. Adversarial explanation may help medical professionals understand the rationale of DLMs more clearly when using them for clinical decisions.
Key Concepts
Deep learning models (DLMs) for referable glaucoma (RG), increased cup-to-disc ratio (ICDR), disc rim narrowing (DRN), and retinal nerve fiber layer defect (RNFLD) achieved AUCs of 0.90, 0.99, 0.95, and 0.79 respectively.
Deep learning models (DLMs) for referable glaucoma (RG), increased cup-to-disc ratio (ICDR), disc rim narrowing (DRN), and retinal nerve fiber layer defect (RNFLD) achieved sensitivities of 0.79, 1.00, 0.82, and 0.55 respectively, at 0.90 specificity.
Adversarial examples (AEs) for explaining deep learning model (DLM) decisions in glaucoma achieved a location explainability score of 3.94 ± 1.33 out of 5 points, significantly higher than GradCAM, which scored 2.55 ± 1.24.
Adversarial examples (AEs) for explaining deep learning model (DLM) decisions in glaucoma achieved a rationale explainability score of 3.97 ± 1.31 out of 5 points, significantly higher than GradCAM, which scored 2.10 ± 1.25.
Related Articles
Optic Nerve Atrophy Conditions Associated With 3D Unsegmented Optical Coherence Tomography Volumes Using Deep Learning. (Cross-Sectional Study)
Age-related changes in optical coherence tomography glaucoma-related parameters: A systematic review. (Systematic Review)
Artificial Intelligence for Optical Coherence Tomography in Glaucoma. (Review)
Comparison of Deep Learning and Clinician Performance for Detecting Referable Glaucoma from Fundus Photographs in a Safety Net Population. (Cohort Study)
Artificial Intelligence Deep Learning Models to Predict Spaceflight Associated Neuro-Ocular Syndrome. (Observational Study)