Organization: UPMC Enterprises
Presenters: Rebecca Jacobson, MS, MD, FACMI
Importance: Financial incentives are increasingly tied to healthcare quality improvement, with examples including value-based payments and accountable care organizations. Health systems, providers, and payers calculate and report measure compliance to state and federal agencies. Quality programs frequently require manual record review, which is inefficient, expensive, and does not scale to large populations.
Objective: To increase the efficiency of quality measurement across multiple payer and provider programs by developing, validating, and field testing a set of natural language processing (NLP) algorithms, software services, and user-facing applications for semi-automated and fully automated NLP-based quality measurement.
Methods: UPMC developed an NLP-based analytics technology to (1) increase the speed of manual record review, and/or (2) automatically calculate the metric from EHR text. The software was implemented and field-tested in three settings across four metrics related to colorectal cancer screening/surveillance. Settings included UPMC Health Plan, the Wolff Center at UPMC, and the UPMC GI Service Line. Internal validation outcomes included precision, recall, and F1 for each extracted variable. Field testing outcomes included abstraction times and/or volumes, human-to-system percent agreement, and kappa. System-human disagreements were analyzed and categorized for root cause.
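As an illustration of the internal validation outcomes, the following minimal sketch computes precision, recall, and F1 for each extracted variable and then averages them across variables. It assumes that each variable is scored from counts of true positives, false positives, and false negatives against a human-annotated reference, and that the average is an unweighted (macro) mean; the abstract does not specify either detail. The variable names and counts are hypothetical and are not study data.

```python
from statistics import mean


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts for one variable."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(per_variable_counts):
    """Unweighted mean of precision, recall, and F1 across variables.

    Assumption: the reported averages are macro averages; the source
    does not state the averaging method.
    """
    scores = [precision_recall_f1(*counts) for counts in per_variable_counts]
    return tuple(mean(column) for column in zip(*scores))


# Hypothetical (tp, fp, fn) counts for two illustrative variables.
example_counts = {
    "variable_a": (93, 7, 7),
    "variable_b": (45, 5, 3),
}

for name, counts in example_counts.items():
    p, r, f = precision_recall_f1(*counts)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

avg_p, avg_r, avg_f = macro_average(example_counts.values())
print(f"macro average: precision={avg_p:.2f} recall={avg_r:.2f} f1={avg_f:.2f}")
```

A macro average weights every variable equally regardless of how often it occurs; a micro average, which pools the counts before computing the scores, would weight frequent variables more heavily.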
Findings: Across four metrics comprising 107 extracted variables, average precision, recall, and F1 were all 0.93. At UPMC Health Plan, abstraction volumes doubled when using the software. At UPMC Wolff Quality Center, median abstraction times decreased by 30%; variance was high. Human-to-system percent agreement and kappa were 84% (κ = 0.7) for OP-29 and 80% (κ = 0.25) for OP-30. For these measures, human error was more commonly the root cause of disagreement for OP-29 (86% human vs. 14% system), and system error (primarily missing data) was more commonly the root cause for OP-30 (65% system vs. 35% human).
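Percent agreement and kappa can diverge, as they do for the two measures above, because Cohen's kappa discounts the agreement expected by chance. The sketch below computes both from a 2x2 table of human versus system determinations. It assumes a binary outcome per case (measure met or not met) and uses hypothetical counts chosen only to show the effect; the abstract does not report the underlying tables or say why kappa was low for OP-30.

```python
def agreement_and_kappa(both_yes, human_only, system_only, both_no):
    """Return (percent agreement, Cohen's kappa) for a 2x2 table.

    both_yes:    human and system both say "met"
    human_only:  human says "met", system says "not met"
    system_only: system says "met", human says "not met"
    both_no:     human and system both say "not met"
    """
    n = both_yes + human_only + system_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_yes + both_no) / n
    # Chance agreement from each rater's marginal "met" / "not met" rates.
    human_yes = both_yes + human_only
    system_yes = both_yes + system_only
    expected = (human_yes * system_yes + (n - human_yes) * (n - system_yes)) / n**2
    if expected == 1.0:
        # Both raters gave the same single label to every case;
        # kappa is undefined, so report 0.0 by convention here.
        return observed, 0.0
    return observed, (observed - expected) / (1 - expected)


# Hypothetical tables, both with 80% agreement (not study data).
balanced = agreement_and_kappa(40, 10, 10, 40)  # "met" in about half of cases
skewed = agreement_and_kappa(76, 10, 10, 4)     # "met" in most cases

print(f"balanced: agreement={balanced[0]:.0%} kappa={balanced[1]:.2f}")
print(f"skewed:   agreement={skewed[0]:.0%} kappa={skewed[1]:.2f}")
```

In this illustration both tables show 80% agreement, but kappa is about 0.60 for the balanced table and about 0.17 for the skewed one, because most of the agreement in the skewed table would be expected by chance.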
Conclusions and Relevance: NLP can increase the speed of manual medical record review and potentially be used to calculate measures across large populations.