In critically ill patients admitted to an intensive care unit (ICU), timely and correct decisions are essential for survival. However, these decisions rest on high-dimensional observations from heterogeneous sources, such as laboratory tests, disease history, current comorbidities (multiple diagnoses), and prescribed medications. Given the cost of wrong decisions and the short timeframe available for decision making (often measured in seconds), doctors face highly challenging tasks with a high risk of error. Predictive analytics and machine learning have great potential to support informed and timely decisions, but a large gap remains between potential and actual data usage because several challenges hinder the development of highly accurate and interpretable data-driven models for healthcare.
The main goals of this study are (i) to develop accurate and interpretable models for mortality prediction as early as possible, (ii) to compare and interpret the predictive power of laboratory tests with respect to outcome, and (iii) to raise awareness of the potential impact of data mining in healthcare.
In this research, we addressed the problems of high dimensionality and sparsity by using Lasso-regularized logistic regression to develop interpretable models for early mortality risk prediction in a population of critically ill patients with chronic renal failure (CRF) admitted to the ICU. Lasso regularization performs embedded feature selection and therefore yields models that are interpretable in terms of both the number of features and the model coefficients. Building on the Lasso coefficients, we employed bi-clustering and several visualization techniques to inspect in detail the influence of laboratory tests on survival outcome. We built a separate model for each day of ICU stay.
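To make the embedded feature selection concrete, here is a minimal sketch of L1-penalized (Lasso) logistic regression fitted by proximal gradient descent (ISTA) in plain Python. It is an illustration under stated assumptions, not the study's implementation: the solver, the penalty strength `lam`, the step size, and the synthetic data (two informative features plus three noise features standing in for uninformative laboratory tests) are all assumptions. The soft-thresholding step drives coefficients exactly to zero, which is what keeps the selected feature set small and readable. In a per-day setup, one such model would be fitted for each ICU day.

```python
import math
import random


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam=0.1, step=0.5, n_iter=1000):
    """Minimise mean log-loss + lam * ||w||_1 with ISTA.

    The intercept b is not penalised. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # residual = predicted probability minus observed outcome
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        # plain gradient step for the intercept, proximal step for the weights
        b -= step * grad_b
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
    return w, b


# Hypothetical synthetic data (NOT MIMIC-III): features 0 and 1 drive the
# outcome, features 2-4 are pure noise.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x = [rng.gauss(0.0, 1.0) for _ in range(5)]
    y.append(1 if rng.random() < sigmoid(2.0 * x[0] - 1.5 * x[1]) else 0)
    X.append(x)

w, b = fit_lasso_logistic(X, y, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w])
print("selected features:", selected)
```

Features whose coefficients end up exactly 0.0 drop out of the model, and a larger `lam` removes more of them. In practice a library solver such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would typically be used instead, with the regularization strength chosen by cross-validation.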
The MIMIC-III database was used to extract daily-level laboratory test data (981 features) for each CRF patient admitted to the ICU (at most 1500 patients), together with age and gender. We obtained AUC values between 0.660 and 0.716 using no more than 87 features (the initial datasets had over 900).
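An AUC value can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who died than to a randomly chosen survivor, with ties counting one half; 0.5 is chance level and 1.0 is perfect ranking. A minimal pairwise sketch of that definition follows; the function name `auc` and the scores and labels in the usage lines are hypothetical toy values, not study data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = outcome of interest (e.g. death), 0 = otherwise.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy risk scores and outcomes.
print(auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; rank-based formulas or library routines compute the same quantity more efficiently.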
Some of the most interesting insights from this study are that (i) patient mortality can be predicted fairly accurately after the second day of ICU stay, (ii) many routine laboratory tests have no predictive power, and (iii) most results of the predictive methods agree with medical knowledge.