The Machine Learning Marketplace

This breakout will address key questions preventing health systems from getting started.

Healthcare is buzzing with stories about the benefits of leveraging data science to improve patient care and the overall patient experience. Organizations have questions about how to get started, where the greatest opportunities lie, and how to apply machine learning (ML) and artificial intelligence (AI).

This unique session provides attendees with access to 10 presenters in a walkabout format where participants can visit the stations they’re interested in for more in-depth information, including results and key lessons learned. Abstracts of each presenter’s work will be shared before the event, allowing participants to choose the topics and opportunities that are most meaningful to them. The ML Marketplace session is one hour long. Attendees can choose to attend the ML Marketplace during either Wave 4 or Wave 5, allowing more attendees to select this session as one of their afternoon options.

Machine Learning Models in Primary Care Decision Support

Organization: Acuitas Health

Presenters: Francesa Romano, MS, Dan Loman, MS

Importance: Several patient risk assessments are documented in the EHR, including readmission risk and fall risk. These assessments require a face-to-face visit, only consider a subset of important risk factors, and are highly subjective. Machine learning can augment these existing models for improved decision-making.

Objective: The objective of the 30-day readmission model is to assist transfer-of-care nurses and care managers in prioritizing patients for post-discharge follow-up, while the objective of the fall risk model is to identify patients at risk for falls resulting in injury. The implementation goal for both models is to deliver interpretable risk details to providers to support decision-making, reducing the occurrence of readmissions and falls.

Methods: The readmission model is a logistic regression, and the fall risk model is an XGBoost decision tree. Both models are trained on features from the EHR and NLP applied to hospital activity alerts. Both were interpreted using SHAP and LIME to assess feature importance.
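As a rough illustration of the "interpretable risk details" goal, the sketch below scores a patient with a logistic regression and reports each feature's contribution to the logit. The feature names, coefficients, and intercept are invented for the example; this is not the Acuitas model.

```python
import math

# Hypothetical coefficients for a readmission-style logistic regression;
# the real model's features and weights are not published here.
COEFS = {"prior_admissions": 0.45, "age_over_75": 0.30, "lives_alone": 0.20}
INTERCEPT = -2.0

def risk_with_drivers(patient):
    """Return (probability, per-feature logit contributions) so the score
    can be presented to providers alongside its drivers."""
    contributions = {f: COEFS[f] * patient.get(f, 0.0) for f in COEFS}
    logit = INTERCEPT + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, contributions

prob, drivers = risk_with_drivers(
    {"prior_admissions": 3, "age_over_75": 1, "lives_alone": 1})
```

For a linear model these coefficient-times-value contributions coincide with what SHAP would report (up to centering); for the XGBoost fall model a library such as SHAP would be needed instead.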

Findings: A Monte Carlo simulation showed the readmission model to be a more effective tool for prioritizing outreach to at-risk patients than existing models; the top 20% of patients from the readmission model were over 10% more likely to readmit than the top 20% from the existing provider risk model. The fall risk model achieved an AUC of 0.754; of the patients predicted to be at risk for a fall, over 66% had either no fall risk assessment or a low-risk assessment score.

Conclusions and Relevance: The outputs for both models were implemented in applications, and reports are available for providers and staff to access. The interpretability of the models was reviewed with providers and staff in various forums.

ARUP Test Matching

Organization: ARUP Labs

Presenters: Jason Lloyd, Keith Henrie

Importance: ARUP Laboratories customers submit tabular data containing the laboratory test names and related data they wish to order. ARUP matches the requested items with the ARUP test menu and provides pricing. Naming conventions among laboratory tests are not standardized, and technical limitations and data management practices create an environment where exact equivalence is unlikely.

Objective: To meet or exceed human expert performance at matching generic descriptions of services to ARUP’s official product menu, while reducing turnaround time and effort per opportunity.

Methods: ARUP devised a web-based system to aid in this process by ingesting the provided file, proposing matches, and allowing expert corrections to feed back into the matching algorithm in real time. ARUP constructed a Naïve Bayes classifier to perform the matching and augmented the system with self-maintaining features that keep the model current.
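The feedback loop described above can be sketched as a tiny token-level Naïve Bayes matcher with Laplace smoothing, where each expert-confirmed match updates the counts in real time. The test names and product codes below are hypothetical, not ARUP's actual menu.

```python
import math
from collections import defaultdict

class TestMatcher:
    """Minimal multinomial Naive Bayes over description tokens, with an
    online update so expert corrections feed straight back into the model."""
    def __init__(self):
        self.token_counts = defaultdict(lambda: defaultdict(int))
        self.class_counts = defaultdict(int)
        self.vocab = set()

    def update(self, description, product):
        # An expert-confirmed (description, product) pair updates the counts.
        self.class_counts[product] += 1
        for tok in description.lower().split():
            self.token_counts[product][tok] += 1
            self.vocab.add(tok)

    def match(self, description):
        tokens = description.lower().split()
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for product, n in self.class_counts.items():
            lp = math.log(n / total)  # class prior
            denom = sum(self.token_counts[product].values()) + len(self.vocab)
            for tok in tokens:  # Laplace-smoothed token likelihoods
                lp += math.log((self.token_counts[product][tok] + 1) / denom)
            if lp > best_lp:
                best, best_lp = product, lp
        return best

m = TestMatcher()
m.update("vitamin d 25 hydroxy", "ARUP-VITD")
m.update("hemoglobin a1c", "ARUP-A1C")
```

In a production setting the class features would also include CPT code, volume, and incumbent vendor, as the Findings note.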

Findings: ARUP found that classification using test name, CPT, volume, and incumbent vendor was sufficient to achieve close to 80% accuracy, but was sensitive to poisoning by bad data. In the four years following implementation, ARUP observed a 60% increase in requests, a 50% increase in the number of line items, and time dedicated to this task decreased by half.

Conclusions and Relevance: The classification solution improves speed while maintaining a high level of accuracy when paired with expert review as part of the bidding process, but may never be accurate enough to be fully automated, partially due to the lack of a good ground truth reference. Many NLP techniques failed to produce improvements due to the lack of “natural” language in the data; however, some techniques such as stop-words were effective when used in an “off-label” manner.

Opening Black-Box Models for Better Clinical Intervention

Organization: Humana

Presenters: Molu Shi, PhD, Harpreet Singh, Yanting Dong

Importance: Current predictive models used to assess individuals’ risk for future non-emergent emergency department (ED) visits are developed using complex machine learning algorithms. Despite their high accuracy, these models make it difficult for clinicians to decide the best intervention for a particular patient, because interpreting patient-specific clinical root causes through such complex algorithms is non-trivial.

Objective: To develop a general algorithm that can derive patient-specific explanations (top drivers) for predictive model risk assessments, a metric to quantify the impact of each explanation (driver impact), and a method to evaluate top-driver output performance.

Methods: A general algorithm, individual feature permutation importance (IFPI), was developed to evaluate patient-level driver impact for predictive models. For each driver (model feature), its impact is derived from the model prediction’s sensitivity to randomly swapping the driver value with that of another patient. For each patient, insight into the root causes of the model prediction can thus be derived from the drivers with the highest impact ranking, i.e., the top drivers. To evaluate the performance of IFPI, driver impacts are averaged across all patients, and their ranking is compared against the population-level feature importance of the predictive model.
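The permutation step above can be sketched in a few lines. The toy linear model below stands in for the Random Forest, and its weights, features, and patients are invented; only the swap-and-measure mechanic reflects IFPI as described.

```python
import random

def ifpi(model, patients, feature, n_permutations=50, seed=0):
    """Individual feature permutation importance: for each patient, swap
    one feature's value with a randomly chosen other patient's value and
    record the average absolute shift in the model's prediction."""
    rng = random.Random(seed)
    impacts = []
    for p in patients:
        base = model(p)
        shift = 0.0
        for _ in range(n_permutations):
            donor = rng.choice(patients)
            perturbed = dict(p)
            perturbed[feature] = donor[feature]
            shift += abs(model(perturbed) - base)
        impacts.append(shift / n_permutations)
    return impacts

# Toy scoring function standing in for the Random Forest (invented weights).
def toy_model(p):
    return 0.6 * p["ed_visits"] + 0.1 * p["age"]

patients = [{"ed_visits": v, "age": 70} for v in (0, 1, 5)]
ed_impact = ifpi(toy_model, patients, "ed_visits")
age_impact = ifpi(toy_model, patients, "age")
```

Because age is identical across these toy patients, permuting it never moves the prediction, so its impact is zero, while the ED-visit feature shows a positive impact — the per-patient ranking IFPI uses to surface top drivers.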

Findings: The IFPI algorithm was tested on an existing predictive model for non-emergent ED utilization. The model was developed using Random Forest on historical claims data from approximately 3 million Medicare Advantage members. Driver impacts for the approximately 300 features used in the model were generated. In validation, a Spearman correlation of 0.88 was obtained between the averaged driver impact and the model’s global feature importance (Gini reduction). In production, personalized top drivers (e.g., high behavioral health claim count, prescription non-adherence, or recent ED utilization) are listed to provide insights for outreach to high-risk patients identified by the model.

Conclusions and Relevance: Using the top drivers obtained by IFPI allows organizations to better understand patient-specific clinical root causes for high predictive model scores and to match members’ needs with appropriate interventions to support their best health and potentially avoid unnecessary ED visits.

Using Natural Language Processing (NLP) and Probabilistic Matching to Improve Efficiency, Reduce Redundancies and Simplify Data Warehouse Structure

Organization: Intermountain Healthcare

Presenters: Phat Doan, MS, MBA, MPH, Gabriella Smith

Importance: Substantial analyst time is spent wrangling, rather than analyzing, data. Exponential data growth, current analyst practices, and enterprise data warehouse (EDW) designs are not scalable. Innovative approaches are required to address repetitive SQL query writing, silos across the enterprise that produce similar reporting, and inefficient EDW design.

Objective: To mine analyst queries to identify usage by tables/schemas, assess the similarity between underlying SQL queries, and detect communities of frequently joined tables in order to improve analyst efficiency, reduce duplicated work, and standardize the EDW.

Methods: 500,000 SQL queries were collected over a six-month period.

  1. NLP techniques were utilized to pre-process and parse SQL queries into their important syntactic elements. A distance algorithm was used to determine the similarity between SQL queries and was built into an analyst tool for finding relevant queries.
  2. Probabilistic matching and data lineage tracing algorithms were developed to link Tableau reports to their SQL queries and capture source data elements. Similarity scoring was used to identify comparable reports. A management report was created for business leaders to initiate discussion between departments with similar reports.
  3. A Neo4j graph database was deployed to project and mine table relationships with a community detection algorithm, providing a data-mart blueprint of frequently joined tables.
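The similarity idea in step 1 can be sketched as a token-set (Jaccard) comparison between queries. The crude regex tokenizer and example queries below are illustrative; they are not Intermountain's actual parser or distance algorithm.

```python
import re

def sql_tokens(query):
    """Crude pre-processing: lowercase the query and keep keyword and
    identifier tokens (a stand-in for real SQL parsing)."""
    return set(re.findall(r"[a-z_][a-z0-9_.]*", query.lower()))

def similarity(q1, q2):
    """Jaccard similarity over token sets: shared tokens divided by all
    tokens, giving 1.0 for identical queries and 0.0 for disjoint ones."""
    a, b = sql_tokens(q1), sql_tokens(q2)
    return len(a & b) / len(a | b) if a | b else 1.0

q1 = "SELECT patient_id, admit_date FROM encounters WHERE unit = 'ICU'"
q2 = "SELECT patient_id, discharge_date FROM encounters WHERE unit = 'ICU'"
```

Here the two queries share 7 of 9 distinct tokens, so their similarity is about 0.78 — the kind of score an analyst tool could use to surface "someone has already written nearly this query."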

Findings: Analysts provided positive feedback on the analyst-querying tool. Additional use cases emerged, such as identifying knowledgeable users of certain tables. Of 794 reports, 102 had a similarity score above 85%, suggesting ~13% potentially duplicated work. Given that building a report takes ~20 hours on average, this could save ~2,000 analyst-hours. From ~1,000 tables, 20 communities of frequently joined tables were identified. New data-marts were proposed with a potential reduction in computational cost and network traffic.

Conclusions and Relevance: SQL query analysis can be a sensitive subject. However, by framing the objective as an operational improvement from a multi-user perspective, the team crafted a narrative that encouraged collaboration among data analysts and provided insights into better EDW design and management practices.

Temporal Variation and Anomaly Detection in High-Dimensional Spaces (TVAD-HDS)

Organization: Mission Health

Importance: Monitoring the quality of data pipelines in a production environment is rarely done. In particular, validation of two-dimensional arrays (columns x rows) often neglects the time-dependent latent structures within batch-to-batch variations. In predictive models, this often results in poor feature quality and silent failures in the ETL process.

Objective: Develop a suite of tools to detect special cause variation in high-dimensional data structures. The use of parallel architecture and closed-form solutions is required to accommodate large data sizes and calculation speeds for near real-time data streams. A variety of visualization tools would be provided to explore data integrity, reliability, and time-based patterns of variation.

Methods: Monitoring special cause variation of high-dimensional data (e.g., images, videos, and audio streams) as batch processes is a common practice in manufacturing machine-vision systems. A popular solution is to treat the data as multilinear tensor objects and monitor the decomposed low-rank features in a multivariate control chart. The proposed framework utilizes the natural structure of tensor objects to index, map, and store snapshots of batch loads in their lower-rank representation. The multivariate control chart converts the compressed tensor object into a single, easy-to-track statistic.
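A heavily simplified version of the control-chart step, assuming the low-rank features have already been extracted: the sketch standardizes each feature against its historical batches and sums the squared distances — a diagonal-covariance stand-in for a full Hotelling T²-style statistic. All numbers are invented.

```python
import statistics

def t_squared(batch_features, history):
    """Squared standardized distance of a batch's low-rank features from
    their historical means, treating features as independent for brevity
    (a real multivariate chart would use the full covariance matrix)."""
    t2 = 0.0
    for i, x in enumerate(batch_features):
        series = [h[i] for h in history]
        mu = statistics.mean(series)
        sd = statistics.stdev(series)
        t2 += ((x - mu) / sd) ** 2
    return t2

# Five in-control historical batches, each summarized by two features.
history = [(10.0, 5.0), (10.2, 5.1), (9.8, 4.9), (10.1, 5.0), (9.9, 5.0)]
in_control = t_squared((10.0, 5.0), history)
out_of_control = t_squared((13.0, 7.0), history)
```

Plotting this statistic per batch against a control limit is what lets a pipeline flag a silently failing ETL load the day it happens.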

Findings: TBD

Conclusions and Relevance: The method of using compressed tensor objects in combination with multivariate control charts provided users the ability to capture and identify abnormalities in both temporal and spatial relationships between data points at a cellular level. When integrated into the ETL development process of a machine learning pipeline, these tools can help provide early identification of contaminating data points and silent failures of tables.

Machine Learning Predicts Next-Day Patient Discharges Optimizing Hospital Capacity Management

Organization: Massachusetts General Hospital/ Partners Healthcare

Presenters: Taghi Khaniyev, PhD, Bethany Daily, MS

Importance: For Massachusetts General Hospital (MGH), efficient capacity management is critical to its mission. MGH faces the daily task of matching high demand with unpredictable supply, with little transparency around discharges. Currently, hospitals ask clinical teams to identify who will be discharged each day and single out candidates for early discharge—a time-consuming, manual, complex process.

Objective: Create a tool that predicts which surgical patients will leave the hospital within 24 hours, ranks patients by likelihood of discharge that day, predicts the total number of patients who will be discharged, and provides a comprehensive list of discharge barriers for each patient.

Methods: The tool tracks patients’ clinical milestones and barriers to discharge using over 900 input variables from the hospital’s EMR. It was trained on 16,187 surgical inpatients between April 2016 and September 2017. The average out-of-sample AUC achieved was 0.840.
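For reference, the AUC reported above is the probability that a randomly chosen discharged patient is scored higher than a randomly chosen non-discharged one. A minimal computation of that definition (with invented labels and scores):

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise-comparison definition:
    the fraction of (positive, negative) pairs where the positive case
    gets the higher score, counting ties as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1]          # 1 = discharged within 24 hours
scores = [0.9, 0.7, 0.6, 0.2, 0.5]
```

On these five toy patients the model wins 5 of the 6 positive-negative comparisons, for an AUC of about 0.83 — close to the 0.840 the MGH tool reports on real data.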

Findings: In the prospective study, the tool generated predictions for 605 patients over a 102-day period from January through April 2018. Of those patients, 340 (56.2%) were discharged on the day predicted, compared with 136 (22.4%) who were not discharged. Causes of non-discharge included: clinical barriers (41 patients, 30.1%); variation in clinical practice (30 patients, 22.1%); and non-clinical reasons (65 patients, 47.8%). Based on this analysis, there were 128 bed-days during the study period that were occupied by a patient who remained in the hospital without a clinical barrier to discharge.

Conclusions and Relevance: The tool enables a timely, automatic, and transparent process for identifying and prioritizing patients for discharge that is available to all stakeholders, reducing unnecessary hospital days and increasing access to hospital services.

It’s Gonna Be a Rough Month: Calming Your Executives with Regression Forecasts and Bayesian Updates

Organization: Mission Health

Presenters: Kaitlyn Bankieris, PhD

Importance: In healthcare, leaders often summarize performance using month-end reports containing various outcome measures. Leaders commonly preview these statistics part-way through the month with the desire to predict month-end numbers and react accordingly. The issue with this approach, however, is that mid-period assessments are based on hidden mental models.

Objective: In the domain of readmissions, Mission Health sought to formalize and automate mid-period forecasts, rather than utilizing leaders’ limited time and mental energy to generate off-the-cuff predictions.

Methods: Utilizing a combination of historical and current admission, discharge, and readmission data at a 795-bed hospital, Mission constructed a GAM-LSS regression model to generate month-end readmission rate predictions on a daily basis. The forecast incorporates daily Bayesian updates, folding new information into the prediction as it becomes available.
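The daily-update idea can be sketched with a Beta-Binomial model — a deliberately simpler stand-in for the GAM-LSS — in which each day's discharges and readmissions tighten the posterior on the month-end rate. The prior and the counts below are invented.

```python
def month_end_forecast(readmits, discharges, remaining_discharges,
                       prior_a=2.0, prior_b=18.0):
    """Beta-Binomial sketch of a daily-updated month-end readmission
    forecast (hypothetical Beta(2, 18) prior, i.e. ~10% expected rate).
    Returns (expected month-end rate, posterior variance of the rate)."""
    a = prior_a + readmits
    b = prior_b + (discharges - readmits)
    mean = a / (a + b)                                 # posterior rate
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))       # posterior variance
    # Month-end forecast: observed so far plus the expected remainder.
    total = discharges + remaining_discharges
    expected_rate = (readmits + mean * remaining_discharges) / total
    return expected_rate, var

early = month_end_forecast(readmits=3, discharges=30, remaining_discharges=270)
late = month_end_forecast(readmits=28, discharges=270, remaining_discharges=30)
```

As the abstract notes, the posterior variance — and hence the uncertainty band leaders see — is wider early in the month and shrinks as discharges accumulate.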

Findings: Mission created a mathematical forecast for month-end readmissions that incorporates daily updates. This approach not only provides an up-to-date estimate for the end-of-period statistic but also conveys the appropriate amount of uncertainty surrounding that forecast given the current progress through the period (i.e., there is more uncertainty near the beginning of a period compared to near the end of a period). Accordingly, on any day of the month, leaders can easily determine the current readmission rate forecast as well as the certainty surrounding that estimate.

Conclusions and Relevance: Collaborating with leaders resulted in an automated process that surfaces helpful information for the organization and eliminates the need for self-generated forecasts with mental models. This method can provide period-end predictions for any repeated, accumulating process such as the number of weekly appointments, monthly readmission rates, or the number of daily blood units.

Up, Up, and Away! Getting Grassroots ML to Stick: A Case Study on Nurse Flight Risk

Organization: UnityPoint Health

Presenters: Rhiannon Harms, Ben Cleveland, MS

Importance: Nationwide, the majority of healthcare institutions face shortages in nursing staff and challenges in recruiting and retention. Nurse turnover simultaneously impacts financial performance, patient safety, patient satisfaction, operational efficiency, and employee engagement. UnityPoint Health recognized the opportunity to apply machine learning in this area based on its potential for transformational change for the organization, team members, and patients served.

Objective: To predict flight risk for nurses across the organization and, in doing so, to target at-risk areas, support intervention strategies that scale from the individual nurse to the department level, support workforce planning, provide a view of an individual nurse’s tenure and risk factors, and align HR outcome measures (manager effectiveness and employee engagement) to nurse subgroups and interventions.

Methods: The machine learning models that predict turnover at both the individual and department level use approximately 50 different variables that span domains covering employee demographics, benefits utilization, team and department recent turnover, work setting, position type, and payroll behavior such as the amount of overtime worked, PTO, etc.

Findings: The UnityPoint Health nurse flight risk model has an AUC of 0.75 with a sensitivity of about 70% at a 15% operational threshold. Analytic output was displayed in an intuitive graphical interface. Iterative testing demonstrated performance and operational utility sufficient for the pilot group.
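The "sensitivity at a 15% operational threshold" metric above can be computed by flagging the top 15% of nurses by predicted risk and asking what share of actual leavers fall inside that flagged group. The labels and scores below are invented.

```python
def sensitivity_at_threshold(labels, scores, fraction=0.15):
    """Share of true leavers captured when flagging only the top
    `fraction` of staff by predicted risk score."""
    n_flagged = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), reverse=True)
    flagged_positives = sum(label for _, label in ranked[:n_flagged])
    return flagged_positives / sum(labels)

labels = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]   # 1 = nurse left within the horizon
scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.1, 0.2, 0.4, 0.35, 0.05]
```

With ten toy nurses, the top 15% is two flagged individuals; both turn out to be leavers, so two of the three actual leavers are captured (sensitivity ≈ 0.67, in the neighborhood of the ~70% reported).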

Conclusions and Relevance: Through the pilot program of the nurse flight risk model, UnityPoint Health determined that machine learning can become a valuable technology in nurse retention strategies. However, in addition to technical development considerations, successful implementation requires deliberate stakeholder engagement, clear expectations in roles and responsibilities, and an honest consideration of ethical concerns.

Using SDOH Risk Analytics to Improve Population Health Among a Pediatric Population

Organization: UPMC

Presenters: Ashley Perry, MPH, CPH, Minette Vaccariello, MS

Importance: The significant impact that social determinants of health (SDOH) have on health outcomes, utilization patterns, and cost is well established in the literature. However, less is understood about how specific social risks impact vulnerable populations. Health systems and plans participating in value- and risk-based contracting seek to assess SDOH risks from a reliable, scalable, analytics-driven approach. Such an approach would enable informed resource deployment, intervention design, and partnership strategies to effectively mitigate the impact of SDOH risk, manage utilization and costs, and ultimately improve outcomes for patients.

Objective: To establish a robust analytic understanding of the SDOH risks faced by patients attributed to UPMC’s pediatric Clinically Integrated Network (CIN), the Pennsylvania Pediatric Health Network (PPHN); to use these actionable insights to identify and prioritize cohorts for which SDOH risks are driving excess utilization and cost; and then to design and implement targeted interventions.

Methods: Clinical data extracts from Health Catalyst were integrated into Socially Determined’s secure, cloud-based SDOH analytics platform. The individual-level patient data was integrated with community-level contextual data. SDOH drivers of excess utilization and cost were isolated. Subpopulations were then identified and prioritized for enhanced analysis using advanced machine learning models.

Findings: This analysis begins this summer, and we expect the findings to be finalized prior to this session. Our team will share lessons learned and results at that time.

Conclusions and Relevance: SDOH risks are known to drive excess utilization and costs for populations; analysis of these risks provides a new and powerful tool for providers seeking to better manage the care of vulnerable populations, such as pediatric Medicaid beneficiaries.

Using Natural Language Processing to Automate Quality Measurement

Organization: UPMC Enterprises

Presenters: Rebecca Jacobson, MS, MD, FACMI

Importance: Financial incentives are increasingly tied to healthcare quality improvement, with examples including value-based payments and accountable care organizations. Systems, providers, and payers calculate and report measure compliance to state and federal agencies. Quality programs frequently necessitate manual record review, which is inefficient, expensive, and does not scale to populations.

Objective: To increase the efficiency of quality measurement across multiple payer and provider programs, by developing, validating, and field testing a set of natural language processing (NLP) algorithms, software services, and user-facing applications for semi-automated and fully automated NLP-based quality measurement.

Methods: UPMC developed an NLP-based analytics technology to (1) increase the speed of manual record review, and/or (2) automatically calculate the metric from EHR text. The software was implemented and field-tested in three settings across four metrics related to colorectal cancer screening/surveillance. Settings included UPMC Health Plan, the Wolff Center at UPMC, and the UPMC GI Service Line. Internal validation outcomes included precision, recall, and F1 for each extracted variable. Field testing outcomes included abstraction times and/or volumes, human-to-system percent agreement, and kappa. System-human disagreements were analyzed and categorized for root cause.

Findings: Across four metrics comprising 107 extracted variables, average precision, recall, and F1 were all 0.93. At UPMC Health Plan, abstraction volumes doubled when using the software. At the UPMC Wolff Quality Center, median abstraction times decreased by 30%, though variance was high. Human-to-system percent agreement and kappa were 84% (κ = 0.7) for OP-29 and 80% (κ = 0.25) for OP-30. For these measures, human error was more commonly the root cause for OP-29 (86% human vs. 14% system), and system error (primarily missing data) was more commonly the root cause for OP-30 (65% system vs. 35% human).
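For reference, the kappa statistic reported above corrects raw percent agreement for the agreement two raters would reach by chance, which is why OP-30 can show 80% agreement yet a low kappa. A minimal sketch for two binary raters (invented ratings):

```python
def cohens_kappa(human, system):
    """Cohen's kappa for two binary raters: observed agreement minus
    chance agreement, scaled by the maximum possible improvement."""
    n = len(human)
    observed = sum(h == s for h, s in zip(human, system)) / n
    p_h = sum(human) / n            # rate of positive calls, rater 1
    p_s = sum(system) / n           # rate of positive calls, rater 2
    expected = p_h * p_s + (1 - p_h) * (1 - p_s)
    return (observed - expected) / (1 - expected)

human  = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
system = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
```

Here the raters agree on 8 of 10 records (80%), but because both call 60% of records positive, chance agreement is 52% and kappa lands near 0.58 — illustrating how skewed marginals can pull kappa well below the raw agreement.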

Conclusions and Relevance: NLP can increase the speed of manual medical record review and potentially be used to calculate measures across large populations.