Using Big Data to Predict Mortality in ICU Patients

Today, Monday, August 25, at the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining in New York, ISTC for Big Data researchers will present a paper entitled “Unfolding Physiological State: Mortality Modelling in Intensive Care Units.” The paper examines clinical notes (the observational, timely and knowledge-based information captured by nurses, doctors and other clinicians in the course of patient care) and the power of those notes to help predict mortality, better manage patient health, and save lives.

The paper’s authors are ISTC for Big Data researchers Tristan Naumann and Marzyeh Ghassemi, PhD candidates at MIT CSAIL; and Peter Szolovits of CSAIL’s Clinical Decision Making Group; with Nicole Brimmer and Rohit Joshi of MIT; Finale Doshi-Velez of Harvard; and Anna Rumshisky of University of Massachusetts Lowell.

Here is a summary of the paper.

The intensive care unit (ICU) is a particularly challenging hospital environment because care staff face a constant flood of data:

  • The severity of the patient’s illness is constantly evolving.
  • Multiple, independent biometric monitoring devices often produce conflicting (and even false) alarms.
  • Electronic health records contain an increasingly large amount of data, including signals from instrumentation, intermittent results from lab tests, and text from clinical notes.

These voluminous, often inconsistent data can make it difficult for care staff to identify the most relevant information for diagnosing a patient’s condition and predicting mortality. Therefore, systems that can reliably identify the most relevant information can improve the efficiency and quality of care.

The most descriptive and perhaps the most relevant information exists in unstructured, free-text formats: clinical notes containing patient histories, chief complaints, and nursing check-ins on each patient’s current condition.

These notes give doctors a quick glance into the most important aspects of a patient’s physiology. Combining features extracted from these notes with standard physiological measurements results in a more complete representation of the patient’s physiological state and can improve prediction.

Unfortunately, free-text data are often difficult to include in predictive models because they lack the structure required by most machine-learning methods. To overcome this problem, latent variable models such as topic models can be used to infer intermediate representations, which in turn serve as structured features for prediction.
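To make the idea concrete, here is a minimal sketch of turning free-text notes into fixed-length topic features. It assumes latent Dirichlet allocation (LDA) fitted with a small collapsed Gibbs sampler as one example of a topic model; this summary does not say that this is the team's exact method. The function name `fit_lda`, the hyperparameters, and the toy notes are all invented for illustration and are not real patient data.

```python
import random

def fit_lda(docs, num_topics, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA (illustrative, hypothetical helper).

    docs: list of tokenized notes (lists of words).
    Returns one topic-proportion vector per note.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # topic counts per note
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                              # total words per topic
    assign = []                                                 # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assign[d][i], word_id[w]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions: the structured features for each note.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return features

# Toy, made-up notes (already tokenized); not real clinical text.
toy_notes = [
    ["sedated", "ventilator", "intubated", "ventilator"],
    ["chest", "pain", "cardiac", "pain"],
    ["ventilator", "weaning", "sedated"],
    ["cardiac", "chest", "monitoring"],
]
features = fit_lda(toy_notes, num_topics=2)
```

Each row of `features` is a probability vector over topics, so it can be fed to any standard classifier alongside structured measurements.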

The research team examined the use of latent variable models to decompose free-text clinical notes into meaningful features, and the predictive power of these features for patient mortality.

The team found that latent topic-derived features were effective in predicting patient mortality over three time frames: in-hospital, 30-day post-discharge, and 1-year post-discharge. Their results demonstrated that the latent topic features important in predicting in-hospital mortality are very different from those important in predicting post-discharge mortality. In general, latent topic features were more predictive than structured features, and a combination of the two performed best.

Specifically, the team evaluated mortality prediction under three prediction regimes: (1) a baseline regime, which used structured data available on admission; (2) a time-varying regime, which used baseline features together with dynamically accumulated clinical text, drawn from increasingly large subsets of the patient’s narrative record; and (3) a retrospective regime, which used all clinical text generated during a hospital stay to supplement the baseline features. For all targeted outcomes, the team demonstrated that adding information from clinical notes improves predictions of mortality.
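The three regimes differ only in which inputs are visible at prediction time, which the following sketch tries to illustrate. It assumes each note has already been reduced to a topic vector and that notes are combined by simple averaging; both are assumptions made here for illustration, not details taken from the paper. The `Note` and `Stay` types, the field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Note:
    hours_since_admission: float   # when the note was written (hypothetical field)
    topic_vector: List[float]      # topic proportions inferred for this note

@dataclass
class Stay:
    baseline: List[float]          # structured data available on admission
    notes: List[Note]

def mean_topics(notes: List[Note], num_topics: int) -> List[float]:
    """Average the topic vectors of the given notes (zeros if there are none)."""
    if not notes:
        return [0.0] * num_topics
    return [sum(n.topic_vector[k] for n in notes) / len(notes)
            for k in range(num_topics)]

def baseline_features(stay: Stay) -> List[float]:
    """Regime 1: admission-time structured data only."""
    return list(stay.baseline)

def time_varying_features(stay: Stay, cutoff_hours: float, num_topics: int) -> List[float]:
    """Regime 2: baseline plus text accumulated up to a cutoff time."""
    seen = [n for n in stay.notes if n.hours_since_admission <= cutoff_hours]
    return list(stay.baseline) + mean_topics(seen, num_topics)

def retrospective_features(stay: Stay, num_topics: int) -> List[float]:
    """Regime 3: baseline plus all text from the hospital stay."""
    return list(stay.baseline) + mean_topics(stay.notes, num_topics)

# Made-up example stay: two baseline values and two notes with 2-topic vectors.
stay = Stay(
    baseline=[67.0, 1.0],
    notes=[Note(2.0, [0.8, 0.2]), Note(30.0, [0.2, 0.8])],
)
early = time_varying_features(stay, cutoff_hours=12.0, num_topics=2)
full = retrospective_features(stay, num_topics=2)
```

Calling `time_varying_features` with a growing `cutoff_hours` mimics the increasingly large subsets of the narrative record; once the cutoff passes the last note, the result matches the retrospective feature vector.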

The models and results explored in this work could ultimately be useful in building interpretable models of disease and mortality.

You can read more about the team’s work in this recent blog post and download the paper here.
