Transl Vis Sci Technol
January 2026, Journal Article

Assessing the Accuracy of Artificial Intelligence-Generated Clinical Summaries From Ambulatory Glaucoma Subspecialty Clinical Encounters.

Diagnosis & Screening; Disease Progression


Abstract

PURPOSE

The purpose of this study was to evaluate the accuracy of the large language model (LLM) LLaMA 2-70B in summarizing glaucoma clinic notes in patient-friendly language and in generating educational material.

METHODS

A random sample of 147 clinic notes, each from a unique patient seen by the Glaucoma Service at a tertiary care center, was analyzed. LLaMA 2 generated paragraph and bullet-point summaries covering five subjects: (1) glaucoma diagnosis and type, (2) disease progression, (3) treatment plan, (4) treatment changes, and (5) surgical/laser interventions. Two ophthalmologists reviewed the responses for accuracy and rated each as "correct," "partially correct," or "incorrect"; discrepancies were adjudicated by a glaucoma specialist. For comparison, ChatGPT-4 was evaluated with identical prompts on a subset of 50 notes.
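The rating procedure (two reviewers, with a glaucoma specialist resolving disagreements, then a tally of the three categories) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual tooling: the helper names `final_label` and `tally` and the example ratings are hypothetical, and the sketch assumes the adjudicator is consulted only when the two reviewers disagree.

```python
from collections import Counter
from typing import Iterable, Optional

# The three rating categories described in the Methods.
CATEGORIES = ("correct", "partially correct", "incorrect")


def _check(label: str) -> str:
    """Reject anything outside the three-level rating scale."""
    if label not in CATEGORIES:
        raise ValueError(f"unknown category: {label!r}")
    return label


def final_label(rater_a: str, rater_b: str, adjudicator: Optional[str] = None) -> str:
    """Consensus label for one LLM response (hypothetical helper).

    If the two reviewers agree, their shared label stands; otherwise the
    adjudicator's label is used and must be supplied.
    """
    _check(rater_a)
    _check(rater_b)
    if rater_a == rater_b:
        return rater_a
    if adjudicator is None:
        raise ValueError("reviewers disagree; an adjudicated label is required")
    return _check(adjudicator)


def tally(labels: Iterable[str]) -> dict[str, tuple[int, float]]:
    """Count each category and its percentage of all rated responses."""
    labels = [_check(label) for label in labels]
    if not labels:
        raise ValueError("no labels to tally")
    counts = Counter(labels)
    n = len(labels)
    return {c: (counts[c], 100 * counts[c] / n) for c in CATEGORIES}


if __name__ == "__main__":
    # Illustrative ratings only (reviewer A, reviewer B, adjudicator); not study data.
    ratings = [
        ("correct", "correct", None),
        ("correct", "partially correct", "correct"),
        ("incorrect", "partially correct", "partially correct"),
        ("partially correct", "partially correct", None),
    ]
    labels = [final_label(a, b, adj) for a, b, adj in ratings]
    print(tally(labels))
```

Raising an error when reviewers disagree and no adjudicated label is given keeps unresolved discrepancies from silently entering the tally.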

RESULTS

LLaMA 2 correctly summarized 97 notes (66%) in paragraph format and 103 (70%) in bullet-point format; a further 44 (30%) and 41 (28%), respectively, were partially correct. Paragraph summaries were more accurate and complete for glaucoma suspects than for patients with diagnosed glaucoma (82% vs. 53%, P < 0.001). For the targeted clinical questions, LLaMA 2 correctly identified the glaucoma diagnosis in 118 notes (80%), disease stability or progression in 129 (88%), the treatment plan in 127 (87%), treatment changes in 134 (91%), and surgical/laser interventions in 124 (84%). On the 50-note subset, ChatGPT-4 produced correct paragraph summaries in 46% of notes and correct bullet-point summaries in 50%, with accuracies of 96%, 88%, 64%, 78%, and 82% on the same five targeted questions, respectively.
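As an arithmetic check on the summary-format results, each reported percentage is the count divided by the 147 notes. The abstract does not state the number of incorrect summaries; the sketch below infers it as the remainder, assuming every note received exactly one final rating. The names `TOTAL_NOTES`, `REPORTED`, and `breakdown` are illustrative.

```python
TOTAL_NOTES = 147  # notes summarized by LLaMA 2

# Reported LLaMA 2 counts per summary format (correct, partially correct).
REPORTED = {
    "paragraph": {"correct": 97, "partially correct": 44},
    "bullet": {"correct": 103, "partially correct": 41},
}


def breakdown(counts: dict[str, int], total: int) -> dict[str, tuple[int, int]]:
    """Return (count, whole-number percentage) per category.

    The 'incorrect' count is not reported directly; it is inferred here as
    the remainder, assuming every note received exactly one rating.
    """
    full = dict(counts)
    full["incorrect"] = total - sum(counts.values())
    return {k: (v, round(100 * v / total)) for k, v in full.items()}


for fmt, counts in REPORTED.items():
    print(fmt, breakdown(counts, TOTAL_NOTES))
```

Run as-is, it reproduces the 66%/30% (paragraph) and 70%/28% (bullet) splits and implies 6 (4%) and 3 (2%) incorrect summaries, respectively.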

CONCLUSIONS

Although LLaMA 2 is not yet reliable as a standalone clinical tool, it shows promise for improving clinical communication.

TRANSLATION RELEVANCE

LLMs may enhance patient experience and health literacy by standardizing patient-friendly language in clinical care.
