Am J Ophthalmol | March 2026 | 1 citation

ChatGPT-Assisted Glaucoma Diagnosis: A Health-Equitable Multi-Ancestry Analysis Using Visual Field and Optical Coherence Tomography Data.

Huang Andy S, Fam Anthony, Zhao Hetince, Paulescu Nicole, Fabczak-Kubicka Anna, Wiggs Janey L, Zebardast Nazlee, Friedman David S, DO Ron, Aziz Kanza

DOI: 10.1016/j.ajo.2025.11.046

AI Summary

ChatGPT diagnosed glaucoma with high sensitivity (96%) and reasonable specificity (83.7%) from visual field/OCT data. Its performance was consistent across diverse ancestries and genetic risks, suggesting LLMs could be unbiased screening tools for early glaucoma detection.

Abstract

Purpose

Early glaucoma detection is challenging due to variable ocular anatomy, non-glaucomatous optic neuropathy impacting optical coherence tomography (OCT) results, and the subjective nature of visual field (VF) tests. Multimodal large language models may overcome these challenges to provide equitable and accurate screening diagnoses across ancestries and glaucoma genetic predispositions. We evaluated ChatGPT o1 Pro's accuracy in identifying glaucoma using circumpapillary retinal nerve fiber layer (RNFL) OCT and VF data, and its consistency across ancestries and glaucoma polygenic risk scores (PRS).

Design

Cross-sectional diagnostic accuracy study.

Settings and participants

We enrolled 204 participants from the Mount Sinai BioMe Biobank for a comprehensive ophthalmic examination from November 2022 to March 2025. This cross-sectional diagnostic accuracy study included 38% European (EUR) and 62% non-European (non-EUR) participants stratified by low/intermediate (n = 107) and high-risk glaucoma PRS (n = 97). Two glaucoma specialists masked to PRS status provided a consensus reference diagnosis. ChatGPT received only de-identified VFs and OCT-RNFL numerical outputs to determine glaucoma status. Performance metrics were compared with the reference diagnosis. Subgroup comparisons by ancestry (EUR versus non-EUR) and PRS (high versus low/intermediate) were conducted. We used logistic regression models to assess the impacts of ancestry, PRS and ocular parameters on classification accuracy.

Main outcome measures

ChatGPT o1 Pro's diagnostic performance in detecting glaucoma compared to consensus specialist diagnoses, stratified by ancestry and genetic risk.

Results

ChatGPT o1 Pro exhibited 96.0% sensitivity (95% confidence interval [CI]: 88.3%-100%), 83.7% specificity (95% CI: 78.3%-89.1%), 85.2% accuracy (95% CI: 80.3%-90.1%), an area under the receiver operating characteristic curve (AUC) of 0.899, a positive predictive value (PPV) of 45.3% (95% CI: 31.9%-58.7%), and a negative predictive value (NPV) of 99.3% (95% CI: 98.0%-100%); κ for agreement with the consensus reference was 0.538. No significant differences were observed between EUR and non-EUR subgroups (AUC: 0.894 vs 0.906, P = .79; accuracy: 88.3% vs 83.3%, P = .44) or high and low/intermediate-PRS subgroups (AUC: 0.889 vs 0.922, P = .45; accuracy: 85.4% vs 85.0%, P = .50). Global RNFL was the only determinant of reference disease classification (OR = 1.1 per micron, P < .001).
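The reported percentages can be sanity-checked with a short script. The sketch below is illustrative only: the abstract does not report a confusion matrix, so the four counts are an assumption, chosen because they reproduce the published point estimates. The use of a normal-approximation (Wald) interval clipped to [0, 1] is also an assumption about how the intervals may have been computed.

```python
from math import sqrt

# ASSUMED counts, not reported in the abstract: picked because they
# reproduce the published sensitivity, specificity, PPV, NPV and accuracy.
TP, FN, FP, TN = 24, 1, 29, 149


def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation 95% CI, clipped to [0, 1]."""
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 table of model calls vs reference calls."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement: both positive plus both negative, from the marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (observed - expected) / (1 - expected)


metrics = {
    "sensitivity": wald_ci(TP, TP + FN),
    "specificity": wald_ci(TN, TN + FP),
    "ppv": wald_ci(TP, TP + FP),
    "npv": wald_ci(TN, TN + FN),
    "accuracy": wald_ci(TP + TN, TP + FN + FP + TN),
}
kappa = cohen_kappa(TP, FN, FP, TN)

for name, (estimate, low, high) in metrics.items():
    print(f"{name}: {estimate:.1%} (95% CI {low:.1%} to {high:.1%})")
print(f"kappa: {kappa:.3f}")
```

Under these assumed counts the reference standard has few positive cases, which would explain the wide sensitivity interval and its upper bound being capped at 100%.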

Conclusion

ChatGPT o1 Pro diagnosed glaucoma similarly to specialists using only VF and OCT data. The model performance was similar across ancestral groups and genetic predispositions to glaucoma.

MeSH Terms

Humans; Tomography, Optical Coherence; Cross-Sectional Studies; Visual Fields; Male; Female; Middle Aged; Retinal Ganglion Cells; Visual Field Tests; Nerve Fibers; Glaucoma; Aged; Intraocular Pressure; ROC Curve; Optic Disk; Reproducibility of Results; Adult; Generative Artificial Intelligence

Key Concepts (4)

ChatGPT o1 Pro exhibited 96.0% sensitivity (95% CI: 88.3%-100%), 83.7% specificity (95% CI: 78.3%-89.1%), and 85.2% accuracy (95% CI: 80.3%-90.1%) in identifying glaucoma using circumpapillary retinal nerve fiber layer (RNFL) OCT and visual field (VF) data, compared to consensus specialist diagnoses.

Diagnosis | Cross-sectional | Cross-sectional diagnostic accuracy study | n=204 participants | Ch5 | Ch6 | Ch28

ChatGPT o1 Pro demonstrated an area under the receiver operating characteristic curve (AUC) of 0.899, a positive predictive value (PPV) of 45.3% (95% CI: 31.9%-58.7%), and a negative predictive value (NPV) of 99.3% (95% CI: 98.0%-100%) for glaucoma diagnosis using circumpapillary retinal nerve fiber layer (RNFL) OCT and visual field (VF) data.

Diagnosis | Cross-sectional | Cross-sectional diagnostic accuracy study | n=204 participants | Ch5 | Ch6 | Ch28
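PPV and NPV depend on how common glaucoma is in the tested group, which is why a PPV of 45.3% can coexist with 96.0% sensitivity. The sketch below applies Bayes' rule to the reported sensitivity, specificity, and PPV to back out the prevalence they imply, then recomputes the predictive values at other prevalences. The 2% and 5% values are hypothetical screening scenarios, not figures from the study, and the function names are illustrative.

```python
def implied_prevalence(sensitivity, specificity, ppv):
    """Solve the Bayes PPV formula for the prevalence that yields `ppv`."""
    odds = ppv * (1 - specificity) / ((1 - ppv) * sensitivity)
    return odds / (1 + odds)


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Values reported in the abstract.
SENS, SPEC, PPV = 0.960, 0.837, 0.453

study_prev = implied_prevalence(SENS, SPEC, PPV)

# 0.02 and 0.05 are HYPOTHETICAL prevalences chosen for illustration.
for prev in (0.02, 0.05, study_prev):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.3f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

The implied prevalence comes out near 12%. At a lower prevalence the same sensitivity and specificity give a lower PPV, so the reported PPV should not be assumed to carry over to a general screening population.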

No significant differences were observed in ChatGPT o1 Pro's glaucoma diagnostic performance between European (EUR) and non-European (non-EUR) subgroups (AUC: 0.894 vs 0.906, P = .79; accuracy: 88.3% vs 83.3%, P = .44) or between high and low/intermediate-polygenic risk score (PRS) subgroups (AUC: 0.889 vs 0.922, P = .45; accuracy: 85.4% vs 85.0%, P = .50).

Diagnosis | Cross-sectional | Cross-sectional diagnostic accuracy study | n=204 participants (38% European, 62% n… | Ch9 | Ch10 | Ch28

Global retinal nerve fiber layer (RNFL) was the only determinant of reference disease classification, with an odds ratio (OR) of 1.1 per micron (P < .001) in a study assessing ChatGPT o1 Pro's glaucoma diagnostic accuracy.

Diagnosis | Cross-sectional | Cross-sectional diagnostic accuracy study | n=204 participants | Ch5 | Ch28
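A per-micron odds ratio from logistic regression compounds multiplicatively, so the effect over a clinically meaningful RNFL difference is larger than 1.1 suggests. The worked example below assumes the reported OR of 1.1 is exact (it is rounded in the abstract) and takes a 10-micron difference purely for illustration. The abstract does not state whether the OR refers to thinner or thicker RNFL, so no direction is implied here.

```latex
\[
\beta = \ln(\mathrm{OR}_{1}) = \ln(1.1) \approx 0.095,
\qquad
\mathrm{OR}_{k} = e^{k\beta} = \mathrm{OR}_{1}^{\,k},
\qquad
\mathrm{OR}_{10} = 1.1^{10} \approx 2.59
\]
```

Because the published value is rounded to one decimal place, the true 10-micron odds ratio could differ noticeably from 2.59.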
