A MapReduce Based Approach of Scalable Multidimensional Anonymization for Big Data Privacy Preservation on Cloud

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
2013 Third International Conference on Cloud and Green Computing, 2013, pp. 105 - 112
Issue Date:
2013-01
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2013004167OK.pdf462.12 kB
Adobe PDF
The massive increase in computing power and data storage capacity provisioned by cloud computing as well as advances in big data mining and analytics have expanded the scope of information available to businesses, government, and individuals by orders of magnitude. Meanwhile, privacy protection is one of most concerned issues in big data and cloud applications, thereby requiring strong preservation of customer privacy and attracting considerable attention from both IT industry and academia. Data anonymization provides an effective way for data privacy preservation, and multidimensional anonymization scheme is a widely-adopted one among existing anonymization schemes. However, existing multidimensional anonymization approaches suffer from severe scalability or IT cost issues when handling big data due to their incapability of fully leveraging cloud resources or being cost-effectively adapted to cloud environments. As such, we propose a scalable multidimensional anonymization approach for big data privacy preservation using Map Reduce on cloud. In the approach, a highly scalable median-finding algorithm combining the idea of the median of medians and histogram technique is proposed and the recursion granularity is controlled to achieve cost-effectiveness. Corresponding MapReduce jobs are dedicatedly designed, and the experiment evaluations demonstrate that with our approach, the scalability and cost-effectiveness of multidimensional scheme can be improved significantly over existing approaches.
Please use this identifier to cite or link to this item: