Ophthalmol Sci
2025 · Journal Article

Comparison of Deep Learning and Clinician Performance for Detecting Referable Glaucoma from Fundus Photographs in a Safety Net Population.

Artificial Intelligence; Diagnosis & Screening

Summary

A deep learning algorithm detected referable glaucoma from fundus photographs with performance that matched or exceeded that of clinician graders, whose sensitivity and specificity varied widely. Such a tool could improve screening efficiency and access to timely glaucoma care.

Abstract

PURPOSE

Develop and test a deep learning (DL) algorithm for detecting referable glaucoma.

DESIGN

Retrospective cohort study.

PARTICIPANTS

A total of 6116 patients from the Los Angeles County (LAC) Department of Health Services (DHS) were included.

METHODS

Fundus photographs and patient-level labels of referable glaucoma (cup-to-disc ratio ≥0.6) provided by 21 certified optometrists were obtained. A DL algorithm based on the Visual Geometry Group-19 architecture was trained using patient-level labels generalized to images from both eyes. Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity were calculated to assess algorithm performance using an independent test set that was also graded by 13 clinicians with 0 to 10 years of experience. Algorithm performance was tested using reference labels provided by either LAC DHS optometrists or an expert panel of 3 glaucoma specialists.
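
The phrase "patient-level labels generalized to images from both eyes" can be illustrated with a minimal Python sketch. It is not the authors' code: the file names, the scores, and the helper names `propagate_labels` and `patient_scores` are hypothetical. The abstract also does not say how image-level outputs were combined into a patient-level prediction, so taking the maximum score across a patient's images is an assumption made here for illustration only.

```python
from collections import defaultdict

def propagate_labels(image_to_patient, patient_labels):
    """Give every image the label of the patient it belongs to."""
    return {img: patient_labels[pid] for img, pid in image_to_patient.items()}

def patient_scores(image_scores, image_to_patient):
    """Collapse image-level scores to one score per patient.

    ASSUMPTION: the maximum over a patient's images is used; the abstract
    does not state the aggregation rule.
    """
    grouped = defaultdict(list)
    for img, score in image_scores.items():
        grouped[image_to_patient[img]].append(score)
    return {pid: max(scores) for pid, scores in grouped.items()}

# Hypothetical toy data: two patients, one image per eye.
image_to_patient = {
    "p1_od.jpg": "p1", "p1_os.jpg": "p1",
    "p2_od.jpg": "p2", "p2_os.jpg": "p2",
}
patient_labels = {"p1": 1, "p2": 0}  # 1 = referable glaucoma, 0 = nonglaucoma

# Training targets: both eyes inherit the patient-level label.
image_labels = propagate_labels(image_to_patient, patient_labels)

# Made-up model outputs, aggregated back to the patient level for evaluation.
image_scores = {"p1_od.jpg": 0.25, "p1_os.jpg": 0.75,
                "p2_od.jpg": 0.125, "p2_os.jpg": 0.25}
scores = patient_scores(image_scores, image_to_patient)
```

One consequence of this labeling scheme is that an unaffected eye of a referable patient still carries a positive label during training, so image-level labels are noisier than patient-level ones.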

MAIN OUTCOME MEASURES

Area under the receiver operating characteristic curve, sensitivity, and specificity.
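
As a reminder of how these three measures are defined, the following sketch computes them in plain Python. The labels and scores are invented toy values, not study data, and the 0.5 threshold is an arbitrary assumption; the abstract does not report the operating point used. AUROC is computed here through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Invented toy data: 4 referable and 4 nonglaucoma patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

area = auroc(labels, scores)                              # 13 of 16 pairs ranked correctly
sens, spec = sensitivity_specificity(labels, scores, 0.5)  # threshold is an assumption
```

AUROC summarizes performance across all thresholds, whereas sensitivity and specificity describe a single operating point. This is why a human grader, who gives only a yes or no call, is reported as a sensitivity and specificity pair rather than an AUROC.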

RESULTS

The DL algorithm was trained using 12 998 images from 5616 patients (2086 referable glaucoma, 3530 nonglaucoma). In this data set, the mean age was 56.8 ± 10.5 years with 54.8% women, 68.2% Latinos, 8.9% Blacks, 6.0% Asians, and 2.7% Whites. One thousand images from 500 patients (250 referable glaucoma, 250 nonglaucoma) with similar demographics (P ≥ 0.57) were used to test the algorithm. Algorithm performance matched or exceeded that of all independent clinician graders in detecting patient-level referable glaucoma based on LAC DHS optometrist (AUROC = 0.92) or expert panel (AUROC = 0.93) reference labels. Clinician grader sensitivity (range, 0.33-0.99) and specificity (range, 0.68-0.98) ranged widely and did not correlate with years of experience (P ≥ 0.49). Algorithm performance (AUROC = 0.93) also matched or exceeded the sensitivity (range, 0.78-1.00) and specificity (range, 0.32-0.87) of 6 certified LAC DHS optometrists in the subsets of the test data set they graded.

CONCLUSIONS

A DL algorithm for detecting referable glaucoma trained using patient-level data provided by certified LAC DHS optometrists approximates or exceeds performance by ophthalmologists and optometrists, who exhibit variable sensitivity and specificity unrelated to experience level. Implementation of this algorithm in screening workflows could help reallocate resources and provide more reproducible and timely glaucoma care.

FINANCIAL DISCLOSURES

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords

Artificial intelligence; Deep learning; Glaucoma; Screening; Telemedicine
