J Glaucoma
March 2026, Journal Article

Comparing the Accuracy of Four Artificial Intelligence Models in PubMed Citation Generation for Glaucoma Research.

Artificial Intelligence

Summary

DeepSeek, a biomedically enriched AI model, generated the most accurate PubMed citations for glaucoma research, outperforming general-purpose models. However, all AI models produced errors, so human verification remains essential to prevent citation inaccuracies.

Abstract

PRÉCIS

DeepSeek, a biomedically enriched AI model, achieved the highest accuracy in generating PubMed citations for glaucoma research, outperforming general-purpose models. Because every model still made errors, human oversight remains necessary to mitigate AI-related citation errors.

PURPOSE

This study evaluated the accuracy and reliability of four artificial intelligence (AI) models in generating PubMed citations for glaucoma research: ChatGPT (OpenAI GPT-3.5), Copilot (GitHub/Microsoft), DeepSeek (DeepSeek AI), and Gemini (Google AI). It aimed to assess the potential of AI tools for academic reference generation and to identify their limitations, particularly in specialized ophthalmology fields.

METHODS

Thirty-five standardized clinical paragraphs from The Review of Ophthalmology (4th edition) were used to test citation accuracy. Each model was instructed to generate AMA 11-style PubMed citations. Citations were evaluated for accuracy, DOI matching, and clinical relevance. An expert review validated the outputs and classified them as "Fully Cited," "Partially Cited," or "Not Cited."
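
The three-way classification described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not state the exact criteria, so the matching rules below (all of DOI, title, journal, and year must agree for "Fully Cited"; a DOI or title match alone counts as "Partially Cited"), the `Citation` fields, and the definition of accuracy as the share of "Fully Cited" outputs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal citation record; field choice is an assumption."""
    title: str
    journal: str
    year: int
    doi: str


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver/prefix forms before comparing."""
    d = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if d.startswith(prefix):
            d = d[len(prefix):]
            break
    return d.strip()


def normalize_text(s: str) -> str:
    """Case-fold, collapse whitespace, and drop a trailing period."""
    return " ".join(s.lower().split()).rstrip(".")


def classify(generated: Optional[Citation], reference: Citation) -> str:
    """Assumed rule set, not the study's: compare a model's citation to a verified one."""
    if generated is None:
        return "Not Cited"
    doi_ok = normalize_doi(generated.doi) == normalize_doi(reference.doi)
    title_ok = normalize_text(generated.title) == normalize_text(reference.title)
    journal_ok = normalize_text(generated.journal) == normalize_text(reference.journal)
    year_ok = generated.year == reference.year
    if doi_ok and title_ok and journal_ok and year_ok:
        return "Fully Cited"
    if doi_ok or title_ok:
        # The right paper was found, but some metadata is wrong.
        return "Partially Cited"
    return "Not Cited"


def accuracy(labels: List[str]) -> float:
    """Share of outputs labeled "Fully Cited" (an assumed definition of accuracy)."""
    if not labels:
        return 0.0
    return sum(label == "Fully Cited" for label in labels) / len(labels)
```

A rule-based check like this can only flag mismatches against a reference record that a person has already verified. Judging clinical relevance, as the expert review did, is outside its scope.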

RESULTS

DeepSeek, a biomedically enriched model, outperformed the others with an accuracy of 92.0%. Copilot reached 66.7%, Gemini 25.8%, and ChatGPT the lowest citation accuracy at 19.4%. We interpret DeepSeek's apparent advantage cautiously, as model details, updates, and changes in underlying data may influence performance. Frequent errors included DOI mismatches, incorrect journal names, and irrelevant references. Expert review confirmed that even the best model produced citation errors, emphasizing the need for human oversight.

CONCLUSION

AI models, particularly biomedically enriched tools such as DeepSeek, can accelerate citation drafting, but citation hallucinations and metadata errors remain common. AI should serve as a decision support tool for reference retrieval and formatting, not as a substitute for rigorous manual verification before submission.

Keywords

Artificial Intelligence; Citation Accuracy; Glaucoma Disease; Ophthalmology; PubMed Citations
