J Glaucoma, January 2026, Comparative Study

Comparing Performance of Large Language Model-Based Tools on Patient-Driven Glaucoma Inquiries.

Authors

Dhruva Gupta, Sarah L Wagner, Ellenthal Alexandra G Castillejos, Andrew W Gross, Edward S Lu, Enchi K Chang, Arya S Rao, Marc D Succi

Abstract

PRÉCIS

GPT-4o and GPT-4o Mini outperformed Gemini Pro in answering glaucoma-related questions, suggesting that GPT models provide high-quality information and highlighting the potential of AI chatbots to deliver medically relevant knowledge.

PURPOSE

Large language models (LLMs) can assist patients who seek medical knowledge online to guide their own glaucoma care. Understanding the differences in LLM performance on glaucoma-related questions can inform patients about the best resources to obtain relevant information.

METHODS

This cross-sectional study evaluated the accuracy, comprehensiveness, quality, and readability of LLM-generated responses to glaucoma inquiries. Seven questions posted by patients on the American Academy of Ophthalmology's Eye Care Forum were randomly selected and submitted as prompts to GPT-4o, GPT-4o Mini, Gemini Pro, and Gemini Flash in September 2024. Four physicians practicing ophthalmology rated each response for accuracy, comprehensiveness, and quality on a Likert scale. The Flesch-Kincaid Grade Level measured readability, while Bidirectional Encoder Representations from Transformers (BERT) scores measured semantic similarity between LLM responses. Statistical analysis used either the Kruskal-Wallis test with the Dunn post hoc test or analysis of variance (ANOVA) with the Tukey Honestly Significant Difference (HSD) test.
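The readability metric named above can be illustrated with a minimal Python sketch of the standard Flesch-Kincaid Grade Level formula. This is not the authors' tooling, which the abstract does not describe. The function names (`count_syllables`, `fk_grade_from_counts`, `fk_grade`) are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so scores may differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula applied to raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text: str) -> float:
    """Estimate the grade level of a text using naive tokenization."""
    # Sentences are split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic runs, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the study.
    sample = "Glaucoma damages the optic nerve. Eye drops can lower eye pressure."
    print(round(fk_grade(sample), 2))
```

Longer sentences and more syllables per word both raise the score, so a lower grade level indicates text that is easier for patients to read.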

RESULTS

GPT-4o rated higher in accuracy (P = 0.016), comprehensiveness (P = 0.007), and quality (P = 0.002) compared with Gemini Pro. GPT-4o Mini rated higher in comprehensiveness (P = 0.011) and quality (P = 0.007). Gemini Flash and Gemini Pro were similar across all criteria. There were no differences in readability, and LLMs mostly produced semantically similar responses.

CONCLUSIONS

GPT models surpass Gemini Pro in addressing commonly asked questions about glaucoma, offering valuable insight into the use of LLMs to deliver health information.

Keywords

glaucoma, large language model, patient information
