Localized sampling for hospital re-admission prediction with imbalanced sample distributions

Publication Type:
Conference Proceeding
Proceedings of the International Joint Conference on Neural Networks, 2017, 2017-May pp. 4571 - 4578
Issue Date:
Full metadata record
Files in This Item:
Filename Description Size
07966436.pdfPublished version677.52 kB
Adobe PDF
© 2017 IEEE. Hospital re-admission refers to special medical events that a patient previously discharged from the hospital is readmitted within a short period of time (say 30 days). A re-admission not only downgrades the quality of living of the patient, it also adds significant financial burdens to the health care systems. To date, many systems exist to use computational approaches to predict the likelihood of a patient being readmitted in the future for medical decision assistance. When building predictive models for hospital re-admission prediction, one essential challenge is that sample distributions in the data are severely imbalanced where, typically, less than 10% of patients are likely going to be readmitted in a near future. A predictive model, without considering sample imbalance, will unlikely generate accurate results for prediction. To date, no existing re-admission model has explicitly addressed such data imbalance issues in their systems. In this paper, we consider hospital re-admission prediction with imbalanced sample distributions, and propose to use localized sampling approach to help build accurate predictive models. For localized sampling, we emphasize on samples which are difficult to classify, and allow the sampling process to bias to such instances. Because finding instances difficult to classify requires calculation of distance between instances, and the high dimensionality of Electronic Health Records (EHR) data makes the distance calculation highly ineffective, we propose to use latent topic embedding to reduce the sample from high dimensionality to a handful of low dimensional topic space for effective and accurate calculation of the distance between instances. By using localized sampling to build multiple versions of balanced datasets, we are able to train multiple predictive models and combine their results for prediction. Experiments and comparisons on data collected from several South Florida regional hospitals demonstrate the performance of our method.
Please use this identifier to cite or link to this item: