Ophthalmol Sci | January 2025 | 1 citation

Enhanced Phenotype Identification of Common Ocular Diseases in Real-World Datasets.

Stein Joshua D, An Hong Su, Andrews Chris A, Pershing Suzann, Mungle Tushar, Bicket Amanda K, Rosenthal Julie M, Zhang Amy D, Lee Wen-Shin, Ludwig Cassie

DOI: 10.1016/j.xops.2025.100717

AI Summary

This study developed enhanced algorithms using comprehensive EHR data to accurately identify glaucoma, DR, and AMD patients, outperforming ICD codes. This improves real-world research and patient management.

Abstract

Objective

For studies using real-world data, accurately identifying patients with phenotypes of interest is challenging. To identify cohorts of interest, most studies exclusively use the International Classification of Diseases (ICD) billing codes, which can be limiting. We developed a method to accurately identify the presence or absence of 3 common ocular diseases (diabetic retinopathy [DR], age-related macular degeneration [AMD], and glaucoma) using electronic health record (EHR) data.

Design

Database study.

Participants

Three thousand nine hundred fourteen eyes from 1957 patients at 2 Sight OUtcomes Research CollaborativE (SOURCE) Ophthalmology Data Repository sites.

Methods

We developed enhanced phenotype identification (EPI) algorithms that search EHR fields, including eye examination findings, orders, charges, medication prescriptions, and surgery data, for evidence that a patient has glaucoma, DR, or AMD. We trained our EPI models using gold standard assessments of the EHR by ophthalmologists for the presence/absence of these conditions, compared the performance of our EPI models to models developed using ICD codes alone, and validated the performance of the models using data from another SOURCE site.
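As a rough illustration of the idea in the Methods, the sketch below turns one patient's EHR record into a count vector over several field types and, for comparison, over billing codes only. It is not the authors' implementation: the field names, the `build_features` and `icd_only_features` helpers, and the token format are all hypothetical, and the abstract does not say which model consumes such features.

```python
from collections import Counter

# Hypothetical field names, loosely following the EHR fields listed in the Methods.
EHR_FIELDS = ("icd_codes", "exam_findings", "orders", "charges", "medications", "surgeries")


def build_features(record, vocabulary):
    """Encode one patient's record as a fixed-length count vector.

    record: dict mapping a field name to a list of string tokens (assumed format).
    vocabulary: ordered list of (field, token) pairs used as model inputs.
    """
    counts = Counter(
        (field, token)
        for field in EHR_FIELDS
        for token in record.get(field, [])
    )
    return [counts[key] for key in vocabulary]


def icd_only_features(record, vocabulary):
    """Same encoding, restricted to billing codes (an ICD-only comparator)."""
    icd_record = {"icd_codes": record.get("icd_codes", [])}
    return build_features(icd_record, vocabulary)


# Illustrative tokens only; a real vocabulary would come from the data.
vocabulary = [
    ("icd_codes", "H40.11"),
    ("medications", "latanoprost"),
    ("surgeries", "trabeculectomy"),
    ("exam_findings", "cup_disc_ratio_high"),
]
patient = {
    "icd_codes": [],
    "medications": ["latanoprost", "latanoprost"],
    "exam_findings": ["cup_disc_ratio_high"],
}
full_vector = build_features(patient, vocabulary)
icd_vector = icd_only_features(patient, vocabulary)
```

In this made-up record the patient has no billing code but does have a prescription and an examination finding, so the ICD-only vector is all zeros while the full vector is not. Either vector could then be passed to any supervised classifier trained against the ophthalmologists' gold standard labels.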

Main outcome measures

Area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), and model calibration.
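For readers less familiar with these three measures, the following is a minimal sketch of how each can be computed from gold standard labels and predicted probabilities. It assumes binary labels (1 = condition present) and scores in [0, 1]; the function names are illustrative, AUPRC is approximated by step-wise average precision (ties are broken arbitrarily), and calibration is summarized with simple equal-width bins. The study may have used different estimators.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    total_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos


def calibration_bins(y_true, y_score, n_bins=5):
    """Per-bin (mean predicted probability, observed event rate, count).

    Empty bins are skipped. A well-calibrated model has the first two
    values close to each other in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)
        bins[idx].append((s, y))
    return [
        (sum(s for s, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
        for b in bins
        if b
    ]
```

AUC depends only on how cases are ranked, whereas average precision also depends on how rare the condition is and the binned summary depends on the predicted probabilities themselves. A model can therefore rank patients well and still be poorly calibrated.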

Results

The AUCs of our EPI models were better than those of ICD-only models for glaucoma (0.97 vs. 0.90), DR (0.997 vs. 0.98), and AMD (0.99 vs. 0.95). The AUPRCs of our EPI models were also much better than those of ICD-only models for glaucoma (0.79 vs. 0.32), DR (0.96 vs. 0.84), and AMD (0.74 vs. 0.55). When testing on patients from a second SOURCE site, the AUC and AUPRC for glaucoma (0.93, 0.74), DR (0.98, 0.77), and AMD (0.96, 0.64) were slightly worse than at the primary site but still quite high. However, for all 3 conditions, model calibration was worse at the second site.

Conclusions

Leveraging machine learning, we developed EPI models to accurately identify most patients with glaucoma, DR, and AMD in real-world datasets. The EPI models significantly outperform ICD-only models in identifying patients confirmed to have these conditions. These findings underscore the potential of using comprehensive EHR data combined with advanced machine learning techniques to improve the accuracy of patient phenotype identification, leading to better patient management and clinical outcomes.

Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Shields Classification

Ch. 9: Clinical Epidemiology of Glaucoma

Ch. 27: Management of the Glaucoma Patient

Key Concepts (5)

The area under the receiver operating characteristic curve (AUC) for enhanced phenotype identification (EPI) models was better than that of ICD-only models for glaucoma (0.97 vs. 0.90), diabetic retinopathy (DR) (0.997 vs. 0.98), and age-related macular degeneration (AMD) (0.99 vs. 0.95).

Comparative Effectiveness | Cohort | Database study | n=3914 eyes from 1957 patients at 2 Sig… | Ch10 | Ch28

The area under the precision-recall curve (AUPRC) for enhanced phenotype identification (EPI) models was much better than ICD-only models for glaucoma (0.79 vs. 0.32), diabetic retinopathy (DR) (0.96 vs. 0.84), and age-related macular degeneration (AMD) (0.74 vs. 0.55).

Comparative Effectiveness | Cohort | Database study | n=3914 eyes from 1957 patients at 2 Sig… | Ch10 | Ch28

When testing on patients from a second SOURCE site, the AUC and AUPRC for glaucoma (0.93, 0.74), diabetic retinopathy (DR) (0.98, 0.77), and age-related macular degeneration (AMD) (0.96, 0.64) using enhanced phenotype identification (EPI) models were slightly worse than at the primary site but still quite high.

Comparative Effectiveness | Cohort | Database study | n=3914 eyes from 1957 patients at 2 Sig… | Ch10 | Ch28

Enhanced phenotype identification (EPI) models, leveraging machine learning and comprehensive electronic health record (EHR) data, accurately identify most patients with glaucoma, diabetic retinopathy (DR), and age-related macular degeneration (AMD) in real-world datasets.

Diagnosis | Cohort | Database study | n=3914 eyes from 1957 patients at 2 Sig… | Ch10 | Ch28

Enhanced phenotype identification (EPI) models for glaucoma, diabetic retinopathy (DR), and age-related macular degeneration (AMD) were developed by searching electronic health record (EHR) fields, including eye examination findings, orders, charges, medication prescriptions, and surgery data.

Methodology | Cohort | Database study | n=3914 eyes from 1957 patients at 2 Sig… | Ch28
