Healthcare data has become a major concern in both academia and industry. Deploying electronic health records (EHRs) and healthcare-related services on cloud platforms can reduce the cost and complexity of handling and integrating medical records while improving efficiency and accuracy. To make effective use of advanced features of Cloud services such as high availability, reliability, and scalability, EHRs have to be stored in the cloud. Exposing EHRs in an outsourced environment, however, naturally raises a number of serious issues related to data security and privacy, distribution, and processing, such as loss of control over the data, differing data formats and sizes, leakage of sensitive information during processing, and delay-sensitive requirements. Many attempts have been made to address these concerns, but most tackle only some aspects of the problem. Encryption mechanisms can satisfy data security and privacy requirements but introduce intensive computing overheads as well as complexity in key distribution. Data is not guaranteed to remain protected when it is moved from one cloud to another, because clouds may not use equivalent protection schemes. Sensitive data is processed only in private clouds, which may lack sufficient resources. Consequently, Cloud computing has not been widely adopted by healthcare providers and users. Protecting and managing health data efficiently across all of these aspects remains an open research question.
In this dissertation, we investigate data security and efficient management of big health data in cloud environments. Regarding data security, we establish an active data protection framework, investigate a new approach to data mobility, and propose a trust evaluation for cloud resources that process sensitive data. For efficient management, we investigate novel schemes and models in both Cloud computing and Fog computing for data distribution and data processing, in order to handle the rapid growth of data, the demand for higher security, and delay requirements.
The novelty of this work lies in the data mobility management model for data protection, the efficient distribution scheme for large-scale EHRs, and the trust-based scheme for security and processing. The contributions of this thesis can be summarized under data security and efficient data management.
On data security, we propose a data mobility management model to protect data when it is stored and moved in clouds. We also propose a trust-based scheduling scheme for big data processing with MapReduce to meet both privacy and performance requirements in a cloud environment.
• The data mobility management model introduces a new location data structure into an active data framework, a Location Registration Database (LRD), protocols for establishing a clone supervisor and a Mobility Service (MS) to handle security and privacy requirements effectively. The model proposes a novel security approach for data mobility and leads to the introduction of a new cloud service, Data Mobility as a Service (DMaaS).
• The trust-based scheduling scheme investigates a novel composite trust metric and a real-time trust evaluation for cloud resources to provide the highest-trust execution on sensitive data. The proposed scheme introduces a new approach for big data processing that meets high security requirements.
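To illustrate how a composite trust metric could drive the choice of a resource, the following is a minimal sketch only. The abstract does not specify the metric's components, so the attribute names, the weights, and the weighted-average form used here are all assumptions made for illustration and are not the metric proposed in this thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    name: str
    attributes: Dict[str, float]  # hypothetical trust attributes, each in [0, 1]
    free_slots: int               # hypothetical count of free task slots


def composite_trust(attributes: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of trust attributes (an assumed form of the metric)."""
    total = sum(weights.values())
    return sum(w * attributes.get(k, 0.0) for k, w in weights.items()) / total


def pick_most_trusted(resources: List[Resource],
                      weights: Dict[str, float]) -> Optional[Resource]:
    """Return the available resource with the highest composite trust, if any."""
    candidates = [r for r in resources if r.free_slots > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda r: composite_trust(r.attributes, weights))
```

In this sketch a sensitive task would be assigned to the resource returned by `pick_most_trusted`, and re-evaluating the attributes before each assignment would approximate a real-time trust evaluation.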
On efficient data management, we propose a novel Hash-Based File Clustering (HBFC) scheme and a data replication management model to distribute, store, and retrieve EHRs efficiently. We also propose a data protection model and a Region-based task scheduling scheme for Fog and Cloud to address security and local performance issues.
• The HBFC scheme uses hash functions to group files into predefined clusters, so that data can be stored and retrieved quickly while the workload remains balanced. The scheme introduces a new clustering mechanism for managing large-scale EHRs and delivering healthcare services effectively in the cloud environment.
• The trust-based scheduling model uses the proposed trust metric for task scheduling with MapReduce. It not only provides maximum-trust execution but also significantly increases resource utilization. The model suggests a new trust-oriented mechanism for matching tasks to resources in MapReduce.
• We introduce a novel concept, the “Region”, in Fog computing. The resulting Fog-based Region model handles data security and local performance requirements effectively.
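The idea behind hash-based file clustering can be sketched as follows. This is an illustrative sketch under stated assumptions, not the HBFC scheme itself: the choice of SHA-256, the modulo mapping, the in-memory store, and the names `cluster_for` and `ClusteredStore` are all hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def cluster_for(record_id: str, num_clusters: int) -> int:
    """Map a record identifier to a cluster index in [0, num_clusters)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_clusters


class ClusteredStore:
    """Toy in-memory store that places each record in its hashed cluster."""

    def __init__(self, num_clusters: int) -> None:
        self.num_clusters = num_clusters
        self.clusters: Dict[int, Dict[str, bytes]] = defaultdict(dict)

    def put(self, record_id: str, payload: bytes) -> int:
        idx = cluster_for(record_id, self.num_clusters)
        self.clusters[idx][record_id] = payload
        return idx

    def get(self, record_id: str) -> bytes:
        # The cluster is recomputed from the identifier, so no global index
        # has to be searched to locate a record.
        idx = cluster_for(record_id, self.num_clusters)
        return self.clusters[idx][record_id]

    def load(self) -> List[int]:
        """Number of records held by each cluster."""
        return [len(self.clusters[i]) for i in range(self.num_clusters)]
```

Because the cluster index is derived from the identifier alone, a lookup touches exactly one cluster, and a well-mixed hash tends to spread records roughly evenly, which is the workload-balancing property described above.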
We implement and evaluate the proposed models and schemes extensively on both real infrastructure and simulators. The results demonstrate the feasibility and efficiency of the proposed approaches. By proposing innovative concepts, metrics, algorithms, models, and services, this thesis helps both healthcare providers and users to adopt cloud services more widely and enables significant improvements in the delivery of healthcare services.