ChatGPT for Addressing Patient-centered Frequently Asked Questions in Glaucoma Clinical Practice.
Henrietta Wang, Katherine Masselos, Janelle Tong, Heather R M Connor, Janelle Scully, Sophia Zhang, Daniel Rafla, Matteo Posarelli, Jeremy C K Tan, Ashish Agar, Michael Kalloniatis, Jack Phu
Summary
ChatGPT-3.5 responses to FAQs in glaucoma were generally agreeable in terms of coherency, factuality, comprehensiveness, and safety.
Abstract
PURPOSE
Large language models such as ChatGPT-3.5 are often used by the public to answer questions related to daily life, including health advice. This study evaluated the responses of ChatGPT-3.5 in answering patient-centered frequently asked questions (FAQs) relevant in glaucoma clinical practice.
DESIGN
Prospective cross-sectional survey.
PARTICIPANTS
Twelve expert graders across a range of clinical, education, and research practices in optometry and ophthalmology.
METHODS
Over 200 patient-centered FAQs from authoritative professional society, hospital, and advocacy websites were distilled and filtered into 40 questions across 4 themes: definition and risk factors, diagnosis and testing, lifestyle and other accompanying conditions, and treatment and follow-up. Each question was individually input into ChatGPT-3.5 to generate a response, and each response was graded independently by the 12 experts.
MAIN OUTCOME MEASURES
A 5-point Likert scale (1 = strongly disagree; 5 = strongly agree) was used to grade ChatGPT-3.5 responses across 4 domains: coherency, factuality, comprehensiveness, and safety.
RESULTS
Across all themes and domains, median scores were 4 ("agree"). Comprehensiveness had the lowest scores (mean 3.7 ± 0.9), followed by factuality (mean 3.9 ± 0.9), then coherency and safety (mean 4.1 ± 0.8 for both). Examination of the individual 40 questions showed that 8 (20%), 17 (42.5%), 24 (60%), and 8 (20%) of the questions had average scores below 4 (i.e., below "agree") for the coherency, factuality, comprehensiveness, and safety domains, respectively. Free-text comments from the experts highlighted factual omissions and gaps in comprehensiveness (e.g., omission of secondary glaucoma) and remarked on the vagueness of some responses (i.e., responses that did not account for individual patient circumstances).
CONCLUSIONS
ChatGPT-3.5 responses to FAQs in glaucoma were generally agreeable in terms of coherency, factuality, comprehensiveness, and safety. However, areas of weakness were identified, precluding recommendations for routine use to provide patients with tailored counseling in glaucoma, especially with respect to the development of glaucoma and its management.
FINANCIAL DISCLOSURE(S)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.