Ophthalmol Sci · February 2026 · Journal Article

A Novel Multimodal Implementation of a Foundation Artificial Intelligence Model Using Optic Nerve Head Fundus Photographs and OCT Imaging for Glaucoma Detection.

Artificial Intelligence · Diagnosis & Screening · OCT & Imaging

Summary

A multimodal AI model combining fundus photographs and OCT detected glaucoma better than a model using fundus photographs alone. However, an OCT-only model performed similarly, suggesting that a simpler unimodal OCT model may suffice for clinical use.

Abstract

PURPOSE

To compare the performance of unimodal and multimodal implementations of the self-supervised learning model RETFound in detecting glaucoma using color fundus photographs (CFPs) and OCT images, and to assess their generalizability across different ethnicities, age groups, and disease severities.

DESIGN

Evaluation of a diagnostic technology.

SUBJECTS, PARTICIPANTS, AND CONTROLS

Fourteen thousand five hundred ten CFPs and 32 640 OCTs from 1948 eyes of 1098 participants (60.8% glaucoma, 39.2% healthy) from the Diagnostic Innovations in Glaucoma Study and the African Descent and Glaucoma Evaluation Study were included. Glaucoma was defined as photograph-based glaucomatous optic neuropathy with or without repeatable glaucoma visual field damage.

METHODS

A multimodal RETFound model was developed using paired CFPs and OCT images. The model was compared to unimodal RETFound models using solely CFP or OCT images. Performance was also stratified by race (Black vs. White), age (<60 vs. ≥60 years), and disease severity (mild vs. moderate-to-severe glaucoma).

MAIN OUTCOME MEASURES

Diagnostic accuracy of unimodal and multimodal RETFound models using CFP and OCT for detecting glaucoma was assessed using the area under the receiver operating characteristic curve (AUC), precision, and recall.
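As an illustration of the three outcome measures named above, the following is a minimal sketch in plain Python. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for this example only; they are not the study's data or code, and the abstract does not state which threshold was used.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen glaucoma eye
    receives a higher score than a randomly chosen healthy eye
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall for the glaucoma class at a fixed threshold.
    The threshold of 0.5 is an assumption for this sketch."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example (hypothetical values): 1 = glaucoma, 0 = healthy.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, scores)
print(f"AUC={auc:.4f} precision={precision:.2f} recall={recall:.2f}")
```

Unlike precision and recall, AUC does not depend on a decision threshold, which is one reason it is commonly reported alongside them.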

RESULTS

The multimodal model for glaucoma detection achieved an AUC of 0.94 (95% confidence interval: 0.91-0.97), significantly outperforming the CFP unimodal model (AUC 0.86 [95% confidence interval: 0.81-0.89], P < 0.001) but not the OCT unimodal model (AUC 0.93 [95% confidence interval: 0.90-0.96], P = 0.47). Precision and recall were higher (0.96 and 0.87, respectively) for the multimodal model compared with the CFP model (0.92 and 0.69) across all subgroups. No significant differences based on race or age were found in either unimodal or multimodal glaucoma detection models. All models exhibited better performance in detecting moderate-to-severe glaucoma than mild glaucoma, with significant differences in the unimodal CFP (P = 0.002) and OCT (P = 0.005) models.
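The abstract does not state how the 95% confidence intervals were computed. Because the dataset contains multiple eyes and images per participant, one common option is a percentile bootstrap that resamples whole participants rather than individual images. The sketch below shows that approach under this assumption; the function names, the record layout, and the toy values are hypothetical and are not taken from the study.

```python
import random


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cluster_bootstrap_auc_ci(records, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling participants.

    records: list of (participant_id, label, score) tuples, where all
    images from one participant share the same participant_id.
    """
    all_labels = {y for _, y, _ in records}
    if all_labels != {0, 1}:
        raise ValueError("records must contain both classes")
    by_participant = {}
    for pid, y, s in records:
        by_participant.setdefault(pid, []).append((y, s))
    ids = sorted(by_participant)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(ids) for _ in ids]
        labels = [y for pid in sample for y, _ in by_participant[pid]]
        scores = [s for pid in sample for _, s in by_participant[pid]]
        # Skip resamples that happen to contain only one class.
        if 0 < sum(labels) < len(labels):
            aucs.append(roc_auc(labels, scores))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy records (hypothetical): some participants contribute two images.
records = [
    ("p1", 1, 0.9), ("p1", 1, 0.8), ("p2", 1, 0.6), ("p3", 1, 0.3),
    ("p4", 0, 0.7), ("p5", 0, 0.4), ("p5", 0, 0.2), ("p6", 0, 0.1),
]
lower, upper = cluster_bootstrap_auc_ci(records)
print(f"95% CI for AUC: {lower:.2f}-{upper:.2f}")
```

Resampling at the participant level keeps correlated images from the same person together, which avoids the overly narrow intervals that image-level resampling can produce.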

CONCLUSIONS

The multimodal RETFound model demonstrated improved diagnostic ability compared with the CFP unimodal model but did not significantly outperform the OCT unimodal model in glaucoma detection. As clinical implementation of a unimodal artificial intelligence (AI) model is easier than a multimodal counterpart, our results suggest unimodal OCT AI models may be sufficient for detecting glaucoma.

FINANCIAL DISCLOSURES

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords

Artificial intelligence; Foundation model; Fundus photographs; Multimodal; OCT
