J Glaucoma | March 2026 | Journal Article

Evaluation of Glaucoma Treatment Information on Social Media Using Large Language Models.

Authors

Asha Bulusu, Paul R Cotran, Amer M Alwreikat, Ying Jiang, Michael Lee Cooper, Kathryn Moynihan Ramsey, Ashwin P Verghese, David J Ramsey

Artificial Intelligence

Abstract

PRCIS

This study investigates the accuracy, readability, utility, and educational value of glaucoma treatment content on social media platforms and explores how large language models assess the quality of social media posts compared with glaucoma experts.

PURPOSE

To assess the quality of information on glaucoma treatment available on social media platforms.

METHODS

A 30-question survey consisting of the "top posts" from 3 social media platforms (X, Instagram, and Reddit) was assessed by 5 board-certified glaucoma experts across 4 domains (readability, utility, educational value, and accuracy) using a 5-point Likert scale. To create a reference standard, the overall quality of each post was calculated as the average of the median scores assigned to each of the 4 domains. Expert agreement was assessed using Kendall's coefficient of concordance (W). A large language model (LLM), GPT-4 (OpenAI), was then prompted to evaluate the same posts with identical instructions. Agreement between the LLM and the expert consensus was measured using the Cohen weighted kappa (κ), and the difference in the favorability of each post was assessed using the McNemar exact test.
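The scoring and agreement statistics named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names, the example ratings, and the choice of linear kappa weights are assumptions made for illustration; the abstract does not state which weighting scheme or software was used, nor how a post was classified as favorable.

```python
from math import comb
from statistics import mean, median


def overall_quality(domain_scores):
    """Mean of the per-domain median Likert scores for one post."""
    return mean(median(scores) for scores in domain_scores.values())


def weighted_kappa(a, b, categories=(1, 2, 3, 4, 5)):
    """Cohen's weighted kappa for two raters.

    Linear disagreement weights are an assumption; the source does not
    say whether linear or quadratic weights were used.
    """
    n = len(a)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (rater a, rater b) categories.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[idx[x]][idx[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weight = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    dis_obs = sum(weight[i][j] * observed[i][j]
                  for i in range(k) for j in range(k))
    dis_exp = sum(weight[i][j] * row[i] * col[j]
                  for i in range(k) for j in range(k))
    if dis_exp == 0:
        return float("nan")  # kappa is undefined when no disagreement is possible
    return 1 - dis_obs / dis_exp


def mcnemar_exact(expert_favorable, model_favorable):
    """Two-sided exact McNemar p-value from paired favorable/unfavorable calls."""
    pairs = list(zip(expert_favorable, model_favorable))
    b = sum(1 for e, m in pairs if e and not m)  # expert favorable only
    c = sum(1 for e, m in pairs if m and not e)  # model favorable only
    n = b + c
    if n == 0:
        return 1.0
    # Binomial(n, 0.5) tail on the discordant pairs, doubled for two sides.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical ratings from 5 experts for a single post.
    post = {
        "readability": [4, 4, 5, 3, 4],
        "utility": [3, 3, 2, 4, 3],
        "educational value": [2, 3, 3, 3, 4],
        "accuracy": [4, 5, 4, 4, 3],
    }
    print(overall_quality(post))  # medians 4, 3, 3, 4 give 3.5
```

Only the discordant pairs (posts that one rater calls favorable and the other does not) enter the McNemar test, which is why it suits a paired comparison of two raters scoring the same 30 posts.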

RESULTS

Fewer than half of social media posts on glaucoma treatment were judged favorably by glaucoma experts (40%). GPT-4 was less critical of social media content and provided a favorable rating nearly twice as often (77%, P = 0.017). Despite this difference, there was moderate agreement between the LLM and the glaucoma experts (κ = 0.421, P = 0.005). The lack of agreement predominantly stemmed from cases where the experts rated the content unfavorably, with disagreement occurring in 56% of such cases, compared with 0% when the content was deemed favorable (P = 0.005).

CONCLUSIONS

Although glaucoma experts and artificial intelligence (AI)-based systems were in moderate agreement when evaluating the quality of posts, the LLM was less able to discriminate posts of low quality.

Keywords

artificial intelligence (AI); glaucoma; health literacy; large language models; social media
