Chat GPT vs an experienced ophthalmologist: evaluating chatbot writing performance in ophthalmology.
Gabriel Katz, Ofira Zloto, Avner Hostovsky, Ruth Huna-Baron, Mizrachi Iris Ben-Bassat, Zvia Burgansky, Alon Skaat, Vicktoria Vishnevskia-Dai, Ido Didi Fabian, Oded Sagiv, Ayelet Priel, Benjamin S Glicksberg, Eyal Klang
Summary
ChatGPT represents a significant advancement in facilitating the creation of original scientific papers in ophthalmology.
Abstract
PURPOSE
To examine the ability of ChatGPT to write introductions for scientific ophthalmology papers and to compare it with that of experienced ophthalmologists.
METHODS
The OpenAI web interface was used to prompt ChatGPT-4 to generate introductions for the selected papers. As a result, each paper had two introductions: one drafted by ChatGPT and the other by the original author. Ten ophthalmology specialists, each with more than 15 years of experience and representing distinct subspecialties (retina, neuro-ophthalmology, oculoplastics, glaucoma, and ocular oncology), were given both sets of introductions without being told their origin (ChatGPT or human author) and were asked to evaluate them.
RESULTS
For each type of introduction, specialists correctly identified the source in 26 of 45 instances (57.8%) and misidentified it in 19 (42.2%). The misclassification rate was 25% when experts evaluated introductions from their own subspecialty and 44.4% when they assessed introductions outside it. In the comparative evaluation of introductions written by ChatGPT and by human authors, no significant difference was found in any of the assessed metrics (language, data arrangement, factual accuracy, originality, and data currency). The misclassification rate (the frequency with which reviewers incorrectly identified the authorship) was highest in oculoplastics (66.7%) and lowest in retina (11.1%).
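The headline proportions follow directly from the reported counts. The minimal Python sketch below uses only the counts stated in the results (26 correct and 19 incorrect identifications out of 45) and rounds to one decimal place, which gives 57.8% and 42.2%. The helper name `rate` is purely illustrative and is not part of the study's analysis.

```python
# Counts reported in the results: 45 source-identification judgments in total.
TOTAL = 45
CORRECT = 26
MISCLASSIFIED = 19

def rate(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

# Sanity check: every judgment is either correct or misclassified.
assert CORRECT + MISCLASSIFIED == TOTAL

print(f"correct: {rate(CORRECT, TOTAL)}%")              # correct: 57.8%
print(f"misclassified: {rate(MISCLASSIFIED, TOTAL)}%")  # misclassified: 42.2%
```

Rounding each share separately keeps the two percentages summing to 100.0%, whereas truncating 57.77...% to 57.7% would leave them summing to 99.9%.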
CONCLUSIONS
ChatGPT represents a significant advancement in facilitating the creation of original scientific papers in ophthalmology. The introductions generated by ChatGPT showed no statistically significant difference from those written by experts in language, data organization, factual accuracy, originality, or currency of information. In addition, nearly half of them were indistinguishable from the originals. Future research should explore ChatGPT-4's utility in composing other sections of research papers and examine the associated ethical considerations.