Ophthalmol Sci, February 2025, 5 citations

Comparison of Deep Learning and Clinician Performance for Detecting Referable Glaucoma from Fundus Photographs in a Safety Net Population.

Nguyen Van, Iyengar Sreenidhi, Rasheed Haroon, Apolo Galo, Li Zhiwei, Kumar Aniket, Nguyen Hong, Bohner Austin, Bolo Kyle, Dhodapkar Rahul

DOI: 10.1016/j.xops.2025.100751

AI Summary

A deep learning algorithm for detecting referable glaucoma from fundus photos matched or exceeded clinician performance, suggesting its utility for more consistent and efficient glaucoma screening, especially in resource-limited settings.

Abstract

Purpose

To develop and test a deep learning (DL) algorithm for detecting referable glaucoma from fundus photographs.

Design

Retrospective cohort study.

Participants

A total of 6116 patients from the Los Angeles County (LAC) Department of Health Services (DHS) were included.

Methods

Fundus photographs were used together with patient-level labels of referable glaucoma (cup-to-disc ratio ≥0.6) provided by 21 certified optometrists. A DL algorithm based on the Visual Geometry Group-19 (VGG-19) architecture was trained using patient-level labels applied to the images of both eyes. Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity were calculated to assess algorithm performance on an independent test set that was also graded by 13 clinicians with 0 to 10 years of experience. Algorithm performance was evaluated against reference labels provided by either LAC DHS optometrists or an expert panel of 3 glaucoma specialists.
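The Methods describe training on patient-level labels applied to both eyes, but the abstract does not say how per-image outputs are turned back into a patient-level decision. The sketch below is a minimal illustration under stated assumptions: the `FundusImage` record, the function names, and the max-over-eyes aggregation rule are all hypothetical choices, not the authors' confirmed pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FundusImage:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    eye: str   # "OD" (right eye) or "OS" (left eye)
    path: str


def propagate_labels(images, patient_labels):
    """Copy each patient's label (1 = referable glaucoma, CDR >= 0.6) onto every image of that patient."""
    return [(img, patient_labels[img.patient_id]) for img in images]


def patient_scores(image_scores):
    """Collapse per-image model scores into one score per patient.

    Assumption: a patient is treated as suspicious as their worse eye, so the
    maximum score is kept. The source does not state which rule was used.
    """
    grouped = defaultdict(list)
    for patient_id, score in image_scores:
        grouped[patient_id].append(score)
    return {pid: max(scores) for pid, scores in grouped.items()}


if __name__ == "__main__":
    images = [
        FundusImage("p1", "OD", "p1_od.jpg"),
        FundusImage("p1", "OS", "p1_os.jpg"),
        FundusImage("p2", "OD", "p2_od.jpg"),
    ]
    labeled = propagate_labels(images, {"p1": 1, "p2": 0})
    print([(img.path, y) for img, y in labeled])  # both p1 images inherit label 1
    print(patient_scores([("p1", 0.2), ("p1", 0.8), ("p2", 0.1)]))  # {'p1': 0.8, 'p2': 0.1}
```

Taking the maximum means a single abnormal-looking eye is enough to flag the patient; a mean over eyes would be a more conservative alternative.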

Main outcome measures

Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity.
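The three main outcome measures can be computed from patient-level reference labels and algorithm scores as sketched below in plain Python. The function names and the 0.5 threshold are hypothetical, the data are toy values rather than study data, and AUROC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve when ties count as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Assumes both classes are present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random positive scores higher than a random negative (ties count 1/2).

    This is the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]              # toy reference labels (1 = referable glaucoma)
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # toy algorithm scores
    threshold = 0.5                           # hypothetical operating threshold
    y_pred = [1 if s >= threshold else 0 for s in scores]
    print(auroc(y_true, scores))                    # 8 of 9 pairs ordered correctly -> 0.888...
    print(sensitivity_specificity(y_true, y_pred))  # 2/3 sensitivity, 2/3 specificity
```

AUROC summarizes ranking quality across all thresholds, while sensitivity and specificity depend on the single threshold chosen, which is why clinicians (who give one yes/no call per patient) are compared on the latter.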

Results

The DL algorithm was trained using 12,998 images from 5616 patients (2086 referable glaucoma, 3530 nonglaucoma). In this data set, the mean age was 56.8 ± 10.5 years with 54.8% women, 68.2% Latinos, 8.9% Blacks, 6.0% Asians, and 2.7% Whites. One thousand images from 500 patients (250 referable glaucoma, 250 nonglaucoma) with similar demographics (P ≥ 0.57) were used to test the algorithm. Algorithm performance matched or exceeded that of all independent clinician graders in detecting patient-level referable glaucoma based on LAC DHS optometrist (AUROC = 0.92) or expert panel (AUROC = 0.93) reference labels. Clinician grader sensitivity (range, 0.33-0.99) and specificity (range, 0.68-0.98) ranged widely and did not correlate with years of experience (P ≥ 0.49). Algorithm performance (AUROC = 0.93) also matched or exceeded the sensitivity (range, 0.78-1.00) and specificity (range, 0.32-0.87) of 6 certified LAC DHS optometrists in the subsets of the test data set they graded.
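A clinician grader contributes a single (sensitivity, specificity) operating point, whereas the algorithm yields a whole ROC curve, so "matched or exceeded" can be read as the grader's point lying on or below that curve. The abstract does not state the exact criterion or any statistical test, so the sketch below assumes the simplest version: the algorithm matches or exceeds a grader if at least one score threshold gives sensitivity and specificity both at least as high as the grader's. Function names and data are hypothetical.

```python
def roc_points(y_true, scores):
    """(specificity, sensitivity) at every distinct threshold, predicting positive when score >= threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = []
    # The extra +inf threshold adds the "call everyone negative" corner of the curve.
    for thr in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < thr)
        points.append((tn / n_neg, tp / n_pos))
    return points


def algorithm_matches_or_exceeds(y_true, scores, clin_sens, clin_spec):
    """Assumed criterion: some algorithm threshold is at least as sensitive AND at least as specific."""
    return any(spec >= clin_spec and sens >= clin_sens
               for spec, sens in roc_points(y_true, scores))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]              # toy labels, not study data
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # toy algorithm scores
    print(algorithm_matches_or_exceeds(y_true, scores, clin_sens=0.9, clin_spec=0.6))  # True
    print(algorithm_matches_or_exceeds(y_true, scores, clin_sens=1.0, clin_spec=0.9))  # False
```

This check ignores sampling uncertainty; it only reports whether the grader's operating point is dominated on the data given.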

Conclusions

A DL algorithm for detecting referable glaucoma, trained using patient-level data provided by certified LAC DHS optometrists, approximates or exceeds the performance of ophthalmologists and optometrists, who exhibit variable sensitivity and specificity unrelated to experience level. Implementation of this algorithm in screening workflows could help reallocate resources and provide more reproducible and timely glaucoma care.

Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Shields Classification

Ch. 4 Optic Nerve, Retina, and Choroid

Ch. 9 Clinical Epidemiology of Glaucoma

Ch. 27 Management of the Glaucoma Patient

Key Concepts

A deep learning (DL) algorithm based on the Visual Geometry Group-19 architecture for detecting referable glaucoma from fundus photographs, trained using patient-level labels (cup-to-disc ratio ≥0.6) provided by 21 certified optometrists, matched or exceeded the performance of all independent clinician graders in detecting patient-level referable glaucoma based on Los Angeles County (LAC) Department of Health Services (DHS) optometrist (AUROC = 0.92) or expert panel (AUROC = 0.93) reference labels.

Comparative Effectiveness | Cohort | Retrospective Cohort Study | n=500 patients (250 referable glaucoma,… | Ch5, Ch10, Ch28

The deep learning (DL) algorithm for detecting referable glaucoma, trained using 12,998 images from 5,616 patients (2,086 referable glaucoma, 3,530 nonglaucoma) from the Los Angeles County (LAC) Department of Health Services (DHS), had an AUROC of 0.93 when tested against reference labels from an expert panel of 3 glaucoma specialists.

Diagnosis | Cohort | Retrospective Cohort Study | n=1,000 images from 500 patients for te… | Ch5, Ch10, Ch28

Clinician grader sensitivity (range, 0.33-0.99) and specificity (range, 0.68-0.98) for detecting referable glaucoma from fundus photographs, as assessed by 13 clinicians with 0 to 10 years of experience, ranged widely and did not correlate with years of experience (P ≥ 0.49).

Diagnosis | Cohort | Retrospective Cohort Study | n=13 clinicians grading 1,000 images fr… | Ch5, Ch10, Ch28

The deep learning (DL) algorithm for detecting referable glaucoma, trained using patient-level data provided by certified Los Angeles County (LAC) Department of Health Services (DHS) optometrists, matched or exceeded the sensitivity (range, 0.78-1.00) and specificity (range, 0.32-0.87) of 6 certified LAC DHS optometrists in the subsets of the test data set they graded.

Comparative Effectiveness | Cohort | Retrospective Cohort Study | n=6 certified LAC DHS optometrists grad… | Ch5, Ch10, Ch28
