Comparison of Deep Learning and Clinician Performance for Detecting Referable Glaucoma from Fundus Photographs in a Safety Net Population.
Nguyen Van, Iyengar Sreenidhi, Rasheed Haroon, Apolo Galo, Li Zhiwei, Kumar Aniket, Nguyen Hong, Bohner Austin, Bolo Kyle, Dhodapkar Rahul
AI Summary
A deep learning algorithm for detecting referable glaucoma from fundus photos matched or exceeded clinician performance, suggesting its utility for more consistent and efficient glaucoma screening, especially in resource-limited settings.
Abstract
Purpose
Develop and test a deep learning (DL) algorithm for detecting referable glaucoma.
Design
Retrospective cohort study.
Participants
A total of 6116 patients from the Los Angeles County (LAC) Department of Health Services (DHS) were included.
Methods
Fundus photographs and patient-level labels of referable glaucoma (cup-to-disc ratio ≥0.6) provided by 21 certified optometrists were obtained. A DL algorithm based on the Visual Geometry Group-19 architecture was trained using patient-level labels generalized to images from both eyes. Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity were calculated to assess algorithm performance using an independent test set that was also graded by 13 clinicians with 0 to 10 years of experience. Algorithm performance was tested using reference labels provided by either LAC DHS optometrists or an expert panel of 3 glaucoma specialists.
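As an illustration of what "patient-level labels generalized to images from both eyes" could look like in practice, the following is a minimal sketch, not the authors' implementation. The function names, the identifiers, and in particular the rule for combining per-image scores back into one patient-level score (the mean here) are assumptions; the abstract does not state how image-level outputs were aggregated.

```python
from collections import defaultdict
from statistics import mean


def propagate_patient_labels(patient_labels, image_index):
    """Give every image the label of the patient it belongs to.

    patient_labels: dict mapping patient_id -> 0/1 (1 = referable glaucoma)
    image_index:    list of (image_id, patient_id) pairs
    """
    return {image_id: patient_labels[patient_id]
            for image_id, patient_id in image_index}


def aggregate_to_patient(image_scores, image_index, reduce=mean):
    """Combine per-image scores into one score per patient.

    The reduction rule is an assumption (mean by default); max is another
    plausible choice when either eye alone should trigger a referral.
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in image_index:
        by_patient[patient_id].append(image_scores[image_id])
    return {pid: reduce(scores) for pid, scores in by_patient.items()}


# Hypothetical toy data: two patients, one image per eye.
patient_labels = {"p1": 1, "p2": 0}
image_index = [("p1_OD", "p1"), ("p1_OS", "p1"),
               ("p2_OD", "p2"), ("p2_OS", "p2")]

image_labels = propagate_patient_labels(patient_labels, image_index)
image_scores = {"p1_OD": 0.9, "p1_OS": 0.4, "p2_OD": 0.2, "p2_OS": 0.1}
patient_scores = aggregate_to_patient(image_scores, image_index)
patient_scores_max = aggregate_to_patient(image_scores, image_index, reduce=max)
```

Under this scheme both images of a referable patient carry a positive label during training, even if only one eye shows a cup-to-disc ratio ≥0.6, which is one reason evaluation at the patient level is the natural unit of comparison.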
Main outcome measures
Area under the receiver operating characteristic curve, sensitivity, and specificity.
Results
The DL algorithm was trained using 12,998 images from 5616 patients (2086 referable glaucoma, 3530 nonglaucoma). In this data set, the mean age was 56.8 ± 10.5 years with 54.8% women, 68.2% Latinos, 8.9% Blacks, 6.0% Asians, and 2.7% Whites. One thousand images from 500 patients (250 referable glaucoma, 250 nonglaucoma) with similar demographics (P ≥ 0.57) were used to test the algorithm. Algorithm performance matched or exceeded that of all independent clinician graders in detecting patient-level referable glaucoma based on LAC DHS optometrist (AUROC = 0.92) or expert panel (AUROC = 0.93) reference labels. Clinician grader sensitivity (range, 0.33-0.99) and specificity (range, 0.68-0.98) ranged widely and did not correlate with years of experience (P ≥ 0.49). Algorithm performance (AUROC = 0.93) also matched or exceeded the sensitivity (range, 0.78-1.00) and specificity (range, 0.32-0.87) of 6 certified LAC DHS optometrists in the subsets of the test data set they graded.
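For readers who want to see how the three reported metrics relate, here is a short sketch in plain Python. It is illustrative only: the toy labels, scores, and the 0.5 decision threshold are assumptions and do not come from the study. AUROC is computed as the probability that a randomly chosen referable patient receives a higher score than a randomly chosen nonglaucoma patient, with ties counted as one half. A clinician's binary grades yield a single sensitivity/specificity pair, whereas the algorithm's continuous score yields a full curve, which is why the comparison asks whether each grader's operating point lies on or below that curve.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and binary calls."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. This pairwise form is quadratic in sample size,
    which is fine for a few hundred patients.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical patient-level data: 1 = referable glaucoma, 0 = nonglaucoma.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1]

# Assumed threshold of 0.5 to turn scores into referral decisions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
```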
Conclusions
A DL algorithm for detecting referable glaucoma trained using patient-level data provided by certified LAC DHS optometrists approximates or exceeds performance by ophthalmologists and optometrists, who exhibit variable sensitivity and specificity unrelated to experience level. Implementation of this algorithm in screening workflows could help reallocate resources and provide more reproducible and timely glaucoma care.
Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Key Concepts
A deep learning (DL) algorithm based on the Visual Geometry Group-19 architecture for detecting referable glaucoma from fundus photographs, trained using patient-level labels (cup-to-disc ratio ≥0.6) provided by 21 certified optometrists, matched or exceeded the performance of all independent clinician graders in detecting patient-level referable glaucoma based on Los Angeles County (LAC) Department of Health Services (DHS) optometrist (AUROC = 0.92) or expert panel (AUROC = 0.93) reference labels.
The deep learning (DL) algorithm for detecting referable glaucoma, trained using 12,998 images from 5,616 patients (2,086 referable glaucoma, 3,530 nonglaucoma) from the Los Angeles County (LAC) Department of Health Services (DHS), had an AUROC of 0.93 when tested against reference labels from an expert panel of 3 glaucoma specialists.
Clinician grader sensitivity (range, 0.33-0.99) and specificity (range, 0.68-0.98) for detecting referable glaucoma from fundus photographs, as assessed by 13 clinicians with 0 to 10 years of experience, ranged widely and did not correlate with years of experience (P ≥ 0.49).
The deep learning (DL) algorithm for detecting referable glaucoma, trained using patient-level data provided by certified Los Angeles County (LAC) Department of Health Services (DHS) optometrists, matched or exceeded the sensitivity (range, 0.78-1.00) and specificity (range, 0.32-0.87) of 6 certified LAC DHS optometrists in the subsets of the test data set they graded.
Related Articles
Evaluating a Foundation Artificial Intelligence Model for Glaucoma Detection Using Color Fundus Photographs. (Observational Study)
Feasibility and acceptance of artificial intelligence-based diabetic retinopathy screening in Rwanda. (Observational Study)
Time to Glaucoma Progression Detection by Optical Coherence Tomography in Individuals of African and European Descents. (Cohort Study)
A generalised computer vision model for improved glaucoma screening using fundus images. (Observational Study)
Optical Coherence Tomography Versus Optic Disc Photo Assessment in Glaucoma Screening. (Review)