Ophthalmol Sci. January 2026. Journal Article

Can a Natural Image-Based Foundation Model Outperform a Retina-Specific Model in Detecting Ocular and Systemic Diseases?

Artificial Intelligence; Diagnosis & Screening

Summary

DINOv2, a natural image-based model, outperformed retina-specific RETFound in ocular disease detection, while RETFound excelled in systemic disease prediction. This highlights the need to match model selection with specific clinical tasks.

Abstract

PURPOSE

DINOv2 is a natural image-based foundation model (FM), pretrained exclusively on 142 million natural images from the LVD-142M data set. In contrast, RETFound is a retina-specific FM, pretrained on ∼3 million images: natural images, color fundus photos, and OCT images (∼1 million each). Despite DINOv2's massive pretraining data set, its application in ophthalmology and its performance relative to domain-specific FMs remain understudied. To address this gap, we conducted a head-to-head comparative evaluation of DINOv2 and RETFound models across a range of downstream ocular and systemic disease tasks.

DESIGN

Retrospective head-to-head evaluation.

SUBJECTS

Ocular disease detection tasks included diabetic retinopathy (DR), glaucoma, and multiclass eye diseases, whereas systemic disease incidence prediction focused on the 3-year incidence of heart failure, myocardial infarction, and ischemic stroke. Eight open-source data sets (APTOS-2019, IDRID, MESSIDOR2 for DR; PAPILA, Glaucoma Fundus for glaucoma; JSIEC, Retina, OCTID for multiclass eye diseases) and the Moorfields AlzEye data set (for systemic diseases) were used for fine-tuning and internal testing. External test sets included the same open-source data sets (cross-dataset validation) and the UK Biobank (for systemic diseases).

METHODS

We replicated the fine-tuning methodology from the original RETFound study on 3 DINOv2 models (large, base, small). All models were fine-tuned on the respective data sets and evaluated through internal and external testing.

MAIN OUTCOME MEASURES

Model performance was measured by the area under the receiver operating characteristic curve (AUC), and 2-sided t-tests were used to compare the models.
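As an illustration of these outcome measures, the following is a minimal sketch in plain Python, not the study's evaluation code. It computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and a t statistic for comparing two sets of AUCs. The function names `roc_auc` and `welch_t` are hypothetical, and two choices are assumptions: that the t-test is applied to AUCs from repeated fine-tuning runs, and that the Welch (unequal-variance) form is used. The abstract specifies neither.

```python
from math import sqrt
from statistics import mean, variance


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (assumed test form)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy values for illustration only; these are not results from the study.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_t, toy_df = welch_t([0.90, 0.92, 0.94], [0.80, 0.82, 0.84])
```

A 2-sided P value would then be read from the t distribution with `toy_df` degrees of freedom, for example through a statistics library. The pairwise loop is quadratic in the number of cases, so a rank-based formulation would be preferable for large test sets.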

RESULTS

For ocular disease detection, DINOv2 models generally outperformed RETFound. For DR, DINOv2-large achieved AUCs of 0.850 to 0.952, exceeding RETFound's 0.823 to 0.944 (all P ≤ 0.007). For multiclass eye diseases, DINOv2-large (AUC = 0.892, Retina data set) surpassed RETFound (AUC = 0.846, P < 0.001). For glaucoma, DINOv2-base (AUC = 0.958, Glaucoma Fundus data set) outperformed RETFound (AUC = 0.940, P < 0.001). Conversely, for systemic disease incidence prediction, RETFound achieved superior AUCs of 0.796 (heart failure), 0.732 (myocardial infarction), and 0.754 (ischemic stroke), outperforming the best DINOv2 models (AUCs of 0.663 to 0.771, all P < 0.001). This trend persisted in external validation.

CONCLUSIONS

Our findings reveal the merits of DINOv2 in ocular disease detection tasks, whereas RETFound demonstrates an edge in systemic disease incidence prediction. These findings showcase the distinct scenarios where general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimize clinical performance.

FINANCIAL DISCLOSURES

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords

Artificial intelligence; Foundation models; Ocular diseases; Oculomics; Retina
