Cross-Cloud MapReduce for Big Data
- Publisher:
- Institute of Electrical and Electronics Engineers (IEEE)
- Publication Type:
- Journal Article
- Citation:
- IEEE Transactions on Cloud Computing, 2020, 8, (2), pp. 375-386
- Issue Date:
- 2020-04-01
Closed Access
Filename | Description | Size | |||
---|---|---|---|---|---|
07229313.pdf | Published version | 1.24 MB |
Copyright Clearance Process
- Recently Added
- In Progress
- Closed Access
This item is closed access and not available.
© 2013 IEEE. MapReduce plays a critical role as a leading framework for big data analytics. In this paper, we consider a geo-distributed cloud architecture that provides MapReduce services based on the big data collected from end users all over the world. Existing work handles MapReduce jobs by a traditional computation-centric approach that all input data distributed in multiple clouds are aggregated to a virtual cluster that resides in a single cloud. Its poor efficiency and high cost for big data support motivate us to propose a novel data-centric architecture with three key techniques, namely, cross-cloud virtual cluster, data-centric job placement, and network coding based traffic routing. Our design leads to an optimization framework with the objective of minimizing both computation and transmission cost for running a set of MapReduce jobs in geo-distributed clouds. We further design a parallel algorithm by decomposing the original large-scale problem into several distributively solvable subproblems that are coordinated by a high-level master problem. Finally, we conduct real-world experiments and extensive simulations to show that our proposal significantly outperforms the existing works.
Please use this identifier to cite or link to this item: