Br J Ophthalmol
June 2025 | Journal Article

Realistic fundus photograph generation for improving automated disease classification.

Artificial Intelligence

Abstract

AIMS

This study aimed to investigate whether denoising diffusion probabilistic models (DDPMs) could generate realistic retinal images, and whether they could be used to improve the performance of a deep convolutional neural network (CNN) ensemble for multiple retinal disease classification, which was previously shown to outperform human experts.

METHODS

We trained DDPMs to generate retinal fundus images representing diabetic retinopathy, age-related macular degeneration, glaucoma or normal eyes. Eight board-certified ophthalmologists evaluated 96 test images to assess the realism of the generated images and classified each image by disease label. Subsequently, between 100 and 1000 generated images were used to augment the training of deep convolutional ensembles for retinal disease classification. We measured the accuracy of the ophthalmologists in distinguishing real from generated images. We also measured the classification accuracy, F-score and area under the receiver operating characteristic curve of a trained CNN in classifying retinal diseases on a test set of 100 fundus images.
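
The classification accuracy and F-score named above can be illustrated with a minimal Python sketch. The abstract does not state how the F-score was averaged across the four classes, so macro averaging is an assumption here, and the function names are hypothetical rather than taken from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Both functions take two equal-length sequences of class labels, for example the reference diagnoses and the model predictions for a test set.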

RESULTS

Ophthalmologists achieved a mean accuracy of 61.1% (range: 51.0%-68.8%) in differentiating real and generated images. Augmenting the training set with 238 generated images in the smallest class significantly improved the F-score and accuracy by 5.3% and 5.8%, respectively (p<0.01), in a retinal disease classification task, compared with a baseline model trained only with real images.
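
One possible reading of the augmentation step is that generated images were added to the class with the fewest real training images. The sketch below illustrates that reading only. The function name, the dictionary layout and the choice to take the first `n_add` generated samples are all assumptions for illustration, not the authors' pipeline.

```python
def augment_smallest_class(real, generated, n_add):
    """Add n_add generated samples to the smallest real class.

    `real` and `generated` map a class label to a list of samples
    (for example image file paths). Returns the label of the smallest
    class and a new training set; the inputs are not modified.
    """
    smallest = min(real, key=lambda label: len(real[label]))
    pool = generated.get(smallest, [])
    if n_add > len(pool):
        raise ValueError("not enough generated samples for class " + smallest)
    # Copy each list so the original real-image set stays untouched.
    augmented = {label: list(samples) for label, samples in real.items()}
    augmented[smallest].extend(pool[:n_add])
    return smallest, augmented
```

A baseline model would be trained on `real` alone and the augmented model on the returned set, with both evaluated on the same held-out test images.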

CONCLUSIONS

Latent diffusion models generated highly realistic retinal images, as validated by human experts. Adding generated images to the training set improved performance of a CNN ensemble without requiring additional real patient data.

Keywords

Diagnostic tests/Investigation; Imaging; Retina
