Transl Vis Sci Technol
November 2025 · Journal Article

Enhancing Glaucoma Diagnosis Through Multi-Layer Transformer and Multi-Modal Feature Fusion.

Diagnosis & Screening

Abstract

PURPOSE

To develop a more accurate glaucoma grading framework by combining multiple examination modalities, aiming to overcome the limitations of single-modality diagnostic systems for comprehensive glaucoma diagnosis.

METHODS

This paper proposes a novel multi-modal glaucoma grading framework that classifies patients as healthy, mild glaucoma, or moderate-to-severe glaucoma. The method simulates the clinical diagnostic process by drawing on multiple examination modalities and integrating prior knowledge of ocular structure to enhance feature learning. A multi-modal feature fusion framework (M2F3) is developed, which uses a multi-layer transformer (MLT) to combine the modalities efficiently. A contrastive learning strategy is also employed to further improve feature learning.
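The abstract does not specify which contrastive objective is used. As a rough illustration only, the sketch below shows one common choice, an InfoNCE-style loss that pulls together the feature vectors of two modalities from the same eye and pushes apart those from different eyes. The function names, the temperature value, and the pairing of one feature vector per modality per eye are all assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchor_feats, paired_feats, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Row i of anchor_feats and row i of paired_feats are treated as a
    positive pair (hypothetically: two modalities of the same eye);
    every other row in paired_feats acts as a negative for row i.
    """
    anchors = [l2_normalize(v) for v in anchor_feats]
    pairs = [l2_normalize(v) for v in paired_feats]
    losses = []
    for i, a in enumerate(anchors):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in pairs]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching row as the correct class.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy features: two eyes, two modalities, two-dimensional embeddings.
modality_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[1.0, 0.0], [0.0, 1.0]]      # matches modality_a row by row
misaligned_b = [[0.0, 1.0], [1.0, 0.0]]   # rows swapped

aligned_loss = info_nce(modality_a, aligned_b)
misaligned_loss = info_nce(modality_a, misaligned_b)
```

In this toy case the aligned pairing yields a much lower loss than the swapped pairing, which is the behaviour a contrastive objective is meant to reward during training.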

RESULTS

Experimental results on the Glaucoma grAding from Multi-Modality imAges (GAMMA) dataset demonstrated that the proposed M2F3 glaucoma grading method achieves a 0.0465 increase in Cohen's kappa (κ) coefficient compared with state-of-the-art (SOTA) methods.
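For readers unfamiliar with the metric, Cohen's kappa measures agreement between predicted and reference grades after correcting for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the chance agreement. The sketch below computes the unweighted form. The abstract does not say whether a weighted variant is used, and the integer label encoding (0 = healthy, 1 = mild, 2 = moderate-to-severe) and the example labels are assumptions for illustration, not data from the study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(y_true)
    # Observed agreement: fraction of cases where both labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sequences use one class only")
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades: 0 = healthy, 1 = mild, 2 = moderate-to-severe.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]
kappa = cohens_kappa(reference, predicted)
```

Here five of six grades match (p_o = 5/6) and the chance agreement is p_e = 1/3, so κ = 0.75. On this scale, the reported gain of 0.0465 is a difference between two such κ values.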

CONCLUSIONS

The proposed multi-modal glaucoma grading framework offers a more accurate diagnostic tool by integrating multiple examination modalities and prior knowledge, representing a substantial improvement over existing single-modality systems.
