Am J Ophthalmol, September 2024, Comparative Study

Using Large Language Models to Generate Educational Materials on Childhood Glaucoma.

Authors

Qais Dihan, Muhammad Z Chauhan, Taher K Eleiwa, Amr K Hassan, Ahmed B Sallam, Albert S Khouri, Ta C Chang, Abdelrahman M Elhusseiny

Glaucoma Surgery, Pediatric Glaucoma

36 citations 1 influential

Abstract

PURPOSE

To evaluate the quality, readability, and accuracy of large language model (LLM)-generated patient education materials (PEMs) on childhood glaucoma, and the ability of LLMs to improve the readability of existing online information.

DESIGN

Cross-sectional comparative study.

METHODS

We evaluated responses of ChatGPT-3.5, ChatGPT-4, and Bard to 3 separate prompts requesting that they write PEMs on "childhood glaucoma." Prompt A required that PEMs be "easily understandable by the average American." Prompt B required that PEMs be written "at a 6th-grade level using Simple Measure of Gobbledygook (SMOG) readability formula." We then compared the responses' quality (DISCERN questionnaire, Patient Education Materials Assessment Tool [PEMAT]), readability (SMOG, Flesch-Kincaid Grade Level [FKGL]), and accuracy (Likert Misinformation scale). To assess the improvement of readability for existing online information, Prompt C requested that the LLMs rewrite 20 resources from a Google search of the keyword "childhood glaucoma" to the American Medical Association-recommended "6th-grade level." Rewrites were compared on key metrics such as readability, complex words (≥3 syllables), and sentence count.
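The two readability measures named above are standard published formulas, and a minimal sketch of them may help readers interpret the scores. The abstract does not say which software the authors used, so the function names, the use of pre-computed counts as inputs, and the "score of 6.0 or below" reading of "6th-grade level" are all assumptions made for illustration.

```python
import math


def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """SMOG grade from the number of words with >= 3 syllables.

    Standard SMOG formula; counts are assumed to be computed elsewhere.
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    # Polysyllable count is normalized to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllable_count * (30 / sentence_count)) + 3.1291


def fkgl(word_count: int, sentence_count: int, syllable_count: int) -> float:
    """Flesch-Kincaid Grade Level from word, sentence, and syllable counts."""
    if word_count <= 0 or sentence_count <= 0:
        raise ValueError("word_count and sentence_count must be positive")
    return (
        0.39 * (word_count / sentence_count)
        + 11.8 * (syllable_count / word_count)
        - 15.59
    )


def meets_sixth_grade(score: float, target: float = 6.0) -> bool:
    """Hypothetical helper: treats 'at or below 6.0' as meeting the target.

    The threshold definition is an assumption, not taken from the study.
    """
    return score <= target


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the study.
    print(round(smog_grade(polysyllable_count=30, sentence_count=30), 2))
    print(round(fkgl(word_count=100, sentence_count=10, syllable_count=150), 2))
```

Both formulas reward shorter sentences and fewer long words, which is why the study also tracks complex-word and sentence counts when comparing rewrites.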

RESULTS

All 3 LLMs generated PEMs that were of high quality, understandability, and accuracy (DISCERN ≥4, ≥70% PEMAT understandability, Misinformation score = 1). Prompt B responses were more readable than Prompt A responses for all 3 LLMs (P ≤ .001). ChatGPT-4 generated the most readable PEMs compared to ChatGPT-3.5 and Bard (P ≤ .001). Although Prompt C responses showed consistent reduction of mean SMOG and FKGL scores, only ChatGPT-4 achieved the specified 6th-grade reading level (SMOG 4.8 ± 0.8 and FKGL 3.7 ± 1.9).

CONCLUSIONS

LLMs can serve as strong supplemental tools in generating high-quality, accurate, and novel PEMs, and improving the readability of existing PEMs on childhood glaucoma.
