A Predictive Analysis of Electronic Healthcare Records for Stroke Symptoms

Cerebrovascular symptoms, commonly known as stroke, can affect different parts of the human body depending on the area of the brain affected. Patients who survive often have a poor quality of life owing to serious illness and long-term disability, and they can become a burden to their families and to the health care system. There is a strong demand for management focused on prevention and early treatment of the disease through the analysis of different risk factors. However, the high volume, heterogeneity, and complexity of medical data have become the biggest challenges in stroke symptom prediction. Highly accurate algorithms are therefore vital for medical diagnosis, yet the development of such algorithms remains poorly understood despite its importance and necessity for healthcare. Electronic Healthcare Records (EHRs) describe a patient's physical and mental health, diagnoses, lab results, treatments, care plans and so forth. The huge amount of information in these records provides insights into the diagnosis and prediction of various diseases. Currently, International Classification of Diseases, 10th Revision (ICD-10) codes are used to represent each patient record, and various machine learning techniques are used to analyse the data derived from these records. Predictive techniques have been widely applied in clinical decision making, for example to predict the occurrence of a disease or a diagnosis, to evaluate the prognosis or outcome of a disease, and to help clinicians recommend treatment. However, conventional predictive models are still not effective enough at capturing the underlying knowledge, because they cannot simulate the complexity of feature representation in medical problem domains.
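As an illustration of how ICD-10 codes might represent a patient record for the machine learning techniques mentioned above, the sketch below turns each record into a multi-hot vector with one column per three-character ICD-10 category. The truncation to categories, the helper names (`icd10_category`, `build_vocabulary`, `multi_hot`) and the toy records are assumptions made for illustration; they are not the encoding used in this research or data from the Thai EHR files.

```python
from typing import Dict, List, Sequence


def icd10_category(code: str) -> str:
    """Truncate a full ICD-10 code such as 'I63.9' to its three-character category 'I63'."""
    return code.strip().upper().replace(".", "")[:3]


def build_vocabulary(records: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign a column index to every ICD-10 category seen in the given records."""
    categories = sorted({icd10_category(c) for record in records for c in record})
    return {cat: i for i, cat in enumerate(categories)}


def multi_hot(record: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Encode one patient record as a 0/1 vector; categories absent from vocab are ignored."""
    vector = [0] * len(vocab)
    for code in record:
        idx = vocab.get(icd10_category(code))
        if idx is not None:
            vector[idx] = 1
    return vector


if __name__ == "__main__":
    # Hypothetical toy records (not real patients).
    records = [
        ["I10", "E11.9"],           # essential hypertension, type 2 diabetes
        ["I48.0", "I10", "I63.9"],  # atrial fibrillation, hypertension, cerebral infarction
        ["E11.9"],
    ]
    vocab = build_vocabulary(records)
    print(vocab)  # {'E11': 0, 'I10': 1, 'I48': 2, 'I63': 3}
    for r in records:
        print(multi_hot(r, vocab))  # [1, 1, 0, 0], then [0, 1, 1, 1], then [1, 0, 0, 0]
```

Working at the category level keeps the vector short; a real pipeline might keep full codes or add non-ICD risk factors (age, sex, lab values) as extra columns.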
This research used aggregated files of Electronic Healthcare Records (EHRs) from the Department of Medical Services, Ministry of Public Health of Thailand, between 2015 and 2016. The empirical research evaluates the ability of machine learning and deep learning to recognise patterns in multi-label classification of stroke. It investigates five techniques: Support Vector Machine (SVM); k-Nearest Neighbours (k-NN); Backpropagation; Recurrent Neural Network (RNN); and Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). These are powerful and widely used techniques in machine learning and bioinformatics. First, we decoded the ICD-10 codes in the health records, together with other potential risk factors within the EHRs, into patterns for the prediction model. Second, we proposed a conceptual Case-Based Reasoning (CBR) framework for stroke prediction that draws on knowledge from previous cases. The framework predicts from patients' health risk factors, recognises cases that are likely to develop stroke, and helps prepare or warn patients so that they can handle the burden of the disease. This research describes the design, implementation and evaluation of a novel system to facilitate stroke prediction that relies on data collected from EHRs. Finally, the effectiveness of Backpropagation, RNN and LSTM-RNN for predicting stroke from healthcare records is evaluated. The results establish several strong baselines, measured by accuracy, recall and F1 score. In addition, deep learning can disclose some unknown or unexpressed knowledge during the prediction procedure, which is beneficial for decision-making in medical practice and provides useful suggestions and warnings to patients about unpredictable stroke.
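The retrieve, reuse and retain steps of a case-based reasoning cycle can be sketched with a k-NN vote over stored cases. This is a minimal sketch under stated assumptions: each case is a 0/1 risk-factor vector, similarity is the Jaccard index, and the outcome is a single stroke/no-stroke label. The class and method names (`Case`, `CaseBase`, `retrieve`, `reuse`, `retain`) and the toy cases are hypothetical; the CBR framework proposed in this research may use different similarity measures, a revision step, and multi-label outcomes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Case:
    features: Tuple[int, ...]  # 0/1 risk-factor vector of a past patient (assumed format)
    stroke: bool               # recorded outcome for that patient


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Similarity of two 0/1 vectors: |a AND b| / |a OR b| (1.0 when both are all zero)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 if union == 0 else inter / union


@dataclass
class CaseBase:
    cases: List[Case] = field(default_factory=list)

    def retrieve(self, query: Sequence[int], k: int = 3) -> List[Case]:
        """Retrieve: the k stored cases most similar to the query (ties keep insertion order)."""
        ranked = sorted(self.cases, key=lambda c: jaccard(query, c.features), reverse=True)
        return ranked[:k]

    def reuse(self, query: Sequence[int], k: int = 3) -> bool:
        """Reuse: majority outcome of the retrieved neighbours; a tied vote returns False."""
        votes = Counter(c.stroke for c in self.retrieve(query, k))
        return votes[True] > votes[False]

    def retain(self, features: Sequence[int], stroke: bool) -> None:
        """Retain: store a confirmed case so that later queries can reuse it."""
        self.cases.append(Case(tuple(features), stroke))


if __name__ == "__main__":
    # Hypothetical columns: [hypertension, diabetes, atrial_fibrillation, smoker]
    base = CaseBase()
    base.retain([1, 0, 1, 1], True)
    base.retain([1, 1, 1, 0], True)
    base.retain([0, 1, 0, 0], False)
    base.retain([0, 0, 0, 1], False)
    base.retain([1, 0, 0, 0], False)
    print(base.reuse([1, 0, 1, 0], k=3))  # True
```

Choosing an odd `k` avoids tied votes for a binary outcome; the explicit tie rule in `reuse` only matters when fewer than `k` cases are stored or `k` is even.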
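Because the task is multi-label, the reported accuracy, recall and F1 depend on how scores are aggregated across labels. The abstract does not state the averaging scheme, so the sketch below assumes one common choice: subset (exact-match) accuracy together with micro-averaged precision, recall and F1 over 0/1 label matrices. The function name `multilabel_scores` and the toy matrices are hypothetical.

```python
from typing import Dict, Sequence


def multilabel_scores(y_true: Sequence[Sequence[int]],
                      y_pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Subset accuracy plus micro-averaged precision, recall and F1 for 0/1 label matrices."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    tp = fp = fn = 0
    exact = 0
    for t_row, p_row in zip(y_true, y_pred):
        if list(t_row) == list(p_row):
            exact += 1  # every label of this record predicted correctly
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1  # label present and predicted
            elif p:
                fp += 1  # label predicted but absent
            elif t:
                fn += 1  # label present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "subset_accuracy": exact / len(y_true) if y_true else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical labels for three records and three stroke-related classes.
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
    scores = multilabel_scores(y_true, y_pred)
    print({k: round(v, 3) for k, v in scores.items()})
    # {'subset_accuracy': 0.333, 'precision': 0.8, 'recall': 0.8, 'f1': 0.8}
```

Micro-averaging pools counts over all labels, so frequent labels dominate; macro-averaging (scoring each label separately, then averaging) would weight rare stroke classes more heavily.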