Ophthalmol Sci, August 2024, 12 citations

Diagnosing Glaucoma Based on the Ocular Hypertension Treatment Study Dataset Using Chat Generative Pre-Trained Transformer as a Large Language Model.

Raja Hina, Huang Xiaoqin, Delsoz Mohammad, Madadi Yeganeh, Poursoroush Asma, Munawar Asim, Kahook Malik Y, Yousefi Siamak

DOI: 10.1016/j.xops.2024.100599

AI Summary

ChatGPT 4.0 showed promising accuracy (87%) in diagnosing glaucoma from OHTS data, higher than that of ChatGPT 3.5 (66%), although ChatGPT 3.5 was more sensitive (85% vs 61%). This suggests LLMs could aid in analyzing ocular hypertension data to explore disease status.

Abstract

Purpose

To evaluate the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT), as a large language model (LLM), for diagnosing glaucoma using the Ocular Hypertension Treatment Study (OHTS) dataset, and to compare the diagnostic capability of ChatGPT 3.5 and ChatGPT 4.0.

Design

Prospective data collection study.

Participants

A total of 3170 eyes of 1585 subjects from the OHTS were included in this study.

Methods

We selected demographic, clinical, ocular, visual field, optic nerve head photo, and history of disease parameters of each participant and developed case reports by converting tabular data into textual format based on information from both eyes of all subjects. We then developed a procedure using the application programming interface of ChatGPT, an LLM-based chatbot, to automatically input prompts into a chat box. This was followed by querying 2 different generations of ChatGPT (versions 3.5 and 4.0) regarding the underlying diagnosis of each subject. We then evaluated the output responses based on several objective metrics.
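The abstract does not give the exact variables, wording, or prompt the authors used. As a rough illustration of the tabular-to-text step only, the Python sketch below renders one subject's record (both eyes) as a short case report, appends a diagnostic question, and parses a constrained reply into a binary label. All field names (`age`, `sex`, `diabetes`, `od`, `os`, `iop_mmhg`, `cct_um`, `vf_md_db`, `vcdr`), the prompt text, and the helpers `eye_sentence`, `case_report`, `build_prompt`, and `parse_answer` are hypothetical. The actual call to the ChatGPT API is deliberately omitted, since it depends on an API key and on the client library in use.

```python
from typing import Any, Mapping, Optional

# Hypothetical schema: each subject is a dict with demographics, history,
# and one nested dict per eye ("od" = right eye, "os" = left eye).
# The real OHTS variables and the study's prompt wording are not given
# in the abstract; this only illustrates tabular-to-text conversion.


def eye_sentence(side: str, eye: Mapping[str, Any]) -> str:
    """Render one eye's tabular measurements as a sentence."""
    return (
        f"{side} eye: IOP {eye['iop_mmhg']} mmHg, "
        f"central corneal thickness {eye['cct_um']} um, "
        f"visual field mean deviation {eye['vf_md_db']} dB, "
        f"vertical cup-to-disc ratio {eye['vcdr']}."
    )


def case_report(subject: Mapping[str, Any]) -> str:
    """Combine demographics, history, and both eyes into one text report."""
    parts = [
        f"A {subject['age']}-year-old {subject['sex']} with ocular hypertension.",
        f"History of diabetes: {'yes' if subject['diabetes'] else 'no'}.",
        eye_sentence("Right", subject["od"]),
        eye_sentence("Left", subject["os"]),
    ]
    return " ".join(parts)


def build_prompt(subject: Mapping[str, Any]) -> str:
    """Append a question that restricts the reply to two fixed answers."""
    return (
        case_report(subject)
        + " Based on this case report, what is the diagnosis?"
        + " Reply with exactly 'glaucoma' or 'no glaucoma'."
    )


def parse_answer(reply: str) -> Optional[int]:
    """Map a reply to 1 (glaucoma), 0 (no glaucoma), or None if off-format."""
    text = reply.strip().lower().rstrip(".")
    if text == "glaucoma":
        return 1
    if text == "no glaucoma":
        return 0
    return None  # flag for manual review instead of guessing a label


if __name__ == "__main__":
    example = {  # made-up values for illustration only, not OHTS data
        "age": 58,
        "sex": "female",
        "diabetes": False,
        "od": {"iop_mmhg": 26, "cct_um": 540, "vf_md_db": -1.2, "vcdr": 0.5},
        "os": {"iop_mmhg": 25, "cct_um": 545, "vf_md_db": -0.8, "vcdr": 0.45},
    }
    print(build_prompt(example))
```

Constraining the reply to two fixed strings keeps parsing unambiguous; replies that match neither return `None`, so they can be counted or reviewed rather than silently mislabeled.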

Main outcome measures

Area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, and F1 score.
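All five outcome measures can be computed from the model's answers and the reference diagnoses. The plain-Python sketch below is an illustration, not the study's evaluation code: it assumes each reply has already been parsed to 1 (glaucoma) or 0 (no glaucoma) and treats glaucoma as the positive class. The abstract does not say how scores for the AUC were obtained, so the hypothetical `auc_rank` helper accepts either graded scores or the hard 0/1 labels themselves and uses the rank-based (Mann-Whitney) definition of AUC.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN) with 1 = glaucoma as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from hard labels.

    Assumes both classes occur and there is at least one true positive;
    otherwise one of the denominators below is zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # recall on glaucoma cases
    specificity = tn / (tn + fp)  # recall on non-glaucoma cases
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of positive/negative pairs where the positive
    case scores higher, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)) pairwise comparison
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic but adequate at the scale of this study (1585 subjects); for much larger sets a sort-based rank computation would be the usual choice.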

Results

Chat Generative Pre-Trained Transformer 3.5 achieved AUC of 0.74, accuracy of 66%, specificity of 64%, sensitivity of 85%, and F1 score of 0.72. Chat Generative Pre-Trained Transformer 4.0 obtained AUC of 0.76, accuracy of 87%, specificity of 90%, sensitivity of 61%, and F1 score of 0.92.
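Two standard identities help relate these figures, under assumptions the abstract does not state: glaucoma is the positive class, all metrics are computed on the same evaluated set of N cases (P of them with glaucoma), and the AUC comes from hard yes/no answers at a single operating point. Writing π = P/N for the fraction of cases with glaucoma, accuracy is a prevalence-weighted average of sensitivity and specificity, and a single-point AUC is their mean:

```latex
\begin{aligned}
\mathrm{Acc} &= \frac{TP+TN}{N}
  = \frac{TP}{P}\cdot\frac{P}{N} + \frac{TN}{N-P}\cdot\frac{N-P}{N}
  = \pi\,\mathrm{Sens} + (1-\pi)\,\mathrm{Spec},
  \qquad \pi = \frac{P}{N} \\
\Rightarrow\; \pi &= \frac{\mathrm{Spec}-\mathrm{Acc}}{\mathrm{Spec}-\mathrm{Sens}} \\
\mathrm{AUC}_{\text{hard labels}} &= \tfrac{1}{2}\,(\mathrm{Sens}+\mathrm{Spec}) \\[6pt]
\text{ChatGPT 3.5:}\;\; \pi &= \frac{0.64-0.66}{0.64-0.85} = \frac{0.02}{0.21} \approx 0.10,
  \qquad \tfrac{1}{2}(0.85+0.64) = 0.745 \\
\text{ChatGPT 4.0:}\;\; \pi &= \frac{0.90-0.87}{0.90-0.61} = \frac{0.03}{0.29} \approx 0.10,
  \qquad \tfrac{1}{2}(0.61+0.90) = 0.755
\end{aligned}
```

On these assumptions both versions imply a glaucoma fraction of roughly 10% (approximate, because the inputs are rounded to two digits), so accuracy is weighted mostly toward specificity. That would account for ChatGPT 4.0 scoring higher on accuracy despite its lower sensitivity. The implied AUCs, 0.745 and 0.755, agree with the reported 0.74 and 0.76 to within rounding.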

Conclusions

The accuracy of ChatGPT 4.0 in diagnosing glaucoma based on input data from OHTS was promising. The overall accuracy of ChatGPT 4.0 was higher than that of ChatGPT 3.5. However, ChatGPT 3.5 was more sensitive than ChatGPT 4.0. In its current form, ChatGPT may serve as a useful tool for exploring the disease status of ocular hypertensive eyes when specific data are available for analysis. In the future, LLMs with multimodal capabilities, which allow imaging and diagnostic testing to be integrated into the analyses, could further enhance diagnostic accuracy.

Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Shields Classification

Intro: An Overview of Glaucoma

Ch. 10: The Glaucoma Suspect: When to Treat?

Ch. 27: Management of the Glaucoma Patient

Key Concepts (5)

Chat Generative Pre-Trained Transformer 3.5 (ChatGPT 3.5) achieved an area under the receiver operating characteristic curve (AUC) of 0.74, accuracy of 66%, specificity of 64%, sensitivity of 85%, and F1 score of 0.72 for diagnosing glaucoma based on the Ocular Hypertension Treatment Study (OHTS) dataset.

Diagnosis · Cohort · Prospective data collection study · n=3170 eyes of 1585 subjects from the OHTS · Ch1, Ch11, Ch28

Chat Generative Pre-Trained Transformer 4.0 (ChatGPT 4.0) obtained an area under the receiver operating characteristic curve (AUC) of 0.76, accuracy of 87%, specificity of 90%, sensitivity of 61%, and F1 score of 0.92 for diagnosing glaucoma based on the Ocular Hypertension Treatment Study (OHTS) dataset.

Diagnosis · Cohort · Prospective data collection study · n=3170 eyes of 1585 subjects from the OHTS · Ch1, Ch11, Ch28

The overall accuracy of Chat Generative Pre-Trained Transformer 4.0 (ChatGPT 4.0) was higher than that of ChatGPT 3.5 for diagnosing glaucoma based on input data from the Ocular Hypertension Treatment Study (OHTS) dataset (87% vs 66%).

Comparative Effectiveness · Cohort · Prospective data collection study · n=3170 eyes of 1585 subjects from the OHTS · Ch1, Ch11, Ch28

Chat Generative Pre-Trained Transformer 3.5 (ChatGPT 3.5) was found to be more sensitive (85%) than Chat Generative Pre-Trained Transformer 4.0 (ChatGPT 4.0, 61%) for diagnosing glaucoma based on input data from the Ocular Hypertension Treatment Study (OHTS) dataset.

Comparative Effectiveness · Cohort · Prospective data collection study · n=3170 eyes of 1585 subjects from the OHTS · Ch1, Ch11, Ch28

In its current forms, Chat Generative Pre-Trained Transformer (ChatGPT) may serve as a useful tool in exploring the disease status of ocular hypertensive eyes when specific data are available for analysis.

Diagnosis · Expert Opinion · Prospective data collection study · n=3170 eyes of 1585 subjects from the OHTS · Ch1, Ch11, Ch28
