Cross-Cloud MapReduce for Big Data

Li, P; Guo, S; Yu, S; Zhuang, W

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Cloud Computing, 2020, 8, (2), pp. 375-386
Issue Date: 2020-04-01

Closed Access

	Filename	Description	Size	Format
	07229313.pdf	Published version	1.24 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, P
dc.contributor.author	Guo, S
dc.contributor.author	Yu, S https://orcid.org/0000-0003-4485-6743
dc.contributor.author	Zhuang, W
dc.date.accessioned	2020-11-22T22:01:48Z
dc.date.available	2020-11-22T22:01:48Z
dc.date.issued	2020-04-01
dc.identifier.citation	IEEE Transactions on Cloud Computing, 2020, 8, (2), pp. 375-386
dc.identifier.issn	2168-7161
dc.identifier.uri	http://hdl.handle.net/10453/144234
dc.description.abstract	MapReduce plays a critical role as a leading framework for big data analytics. In this paper, we consider a geo-distributed cloud architecture that provides MapReduce services based on big data collected from end users all over the world. Existing work handles MapReduce jobs with a traditional computation-centric approach, in which all input data distributed across multiple clouds are aggregated into a virtual cluster residing in a single cloud. The poor efficiency and high cost of this approach for big data motivate us to propose a novel data-centric architecture with three key techniques: cross-cloud virtual cluster, data-centric job placement, and network-coding-based traffic routing. Our design leads to an optimization framework whose objective is to minimize both computation and transmission cost for running a set of MapReduce jobs in geo-distributed clouds. We further design a parallel algorithm that decomposes the original large-scale problem into several distributively solvable subproblems coordinated by a high-level master problem. Finally, we conduct real-world experiments and extensive simulations showing that our proposal significantly outperforms existing work.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Cloud Computing
dc.relation.isbasedon	10.1109/TCC.2015.2474385
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.subject	0805 Distributed Computing, 0806 Information Systems
dc.title	Cross-Cloud MapReduce for Big Data
dc.type	Journal Article
utslib.citation.volume	8
utslib.for	0805 Distributed Computing
utslib.for	0806 Information Systems
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2020-11-22T22:01:43Z
pubs.issue	2
pubs.publication-status	Published
pubs.volume	8
utslib.citation.issue	2

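
To make the abstract's contrast between computation-centric and data-centric processing concrete, here is a toy cost model. It is entirely hypothetical and far simpler than the paper's optimization framework, which also covers cross-cloud virtual clusters, network-coding-based routing, and multiple jobs. The sketch assumes that map work has a fixed price per GB in each cloud, that inter-cloud transfer has a per-GB price, that map output is a fixed fraction (`shuffle_ratio`) of map input, and that reduce-side computation can be ignored. The function names, parameters, and numbers are illustrative assumptions, not values or methods from the paper.

```python
from typing import List, Tuple

# Hypothetical toy model (NOT the paper's formulation):
#   data[i]         GB of input stored in cloud i
#   transfer[i][j]  price per GB moved from cloud i to cloud j (transfer[i][i] == 0)
#   compute[i]      price per GB of input processed by map tasks in cloud i
#   shuffle_ratio   map output size as a fraction of map input
# Reduce-side computation is ignored in both strategies.


def computation_centric(data: List[float], transfer: List[List[float]],
                        compute: List[float]) -> Tuple[float, int]:
    """Aggregate all input into one cloud k and run the whole job there.

    Returns (minimum total cost, chosen cloud index)."""
    n = len(data)
    options = []
    for k in range(n):
        move = sum(data[i] * transfer[i][k] for i in range(n))  # ship raw input
        work = sum(data) * compute[k]                           # map everything in k
        options.append((move + work, k))
    return min(options)


def data_centric(data: List[float], transfer: List[List[float]],
                 compute: List[float], shuffle_ratio: float) -> Tuple[float, int]:
    """Run map tasks where the data lives; ship only map output to reducer cloud k.

    Returns (minimum total cost, chosen reducer cloud index)."""
    n = len(data)
    local_map = sum(data[i] * compute[i] for i in range(n))  # no raw-input transfer
    options = []
    for k in range(n):
        shuffle = sum(shuffle_ratio * data[i] * transfer[i][k] for i in range(n))
        options.append((local_map + shuffle, k))
    return min(options)


if __name__ == "__main__":
    data = [100.0, 50.0, 20.0]
    transfer = [[0.0, 0.09, 0.12],
                [0.09, 0.0, 0.10],
                [0.12, 0.10, 0.0]]
    compute = [0.05, 0.04, 0.06]
    cc_cost, cc_cloud = computation_centric(data, transfer, compute)
    dc_cost, dc_cloud = data_centric(data, transfer, compute, shuffle_ratio=0.3)
    print(f"computation-centric: {cc_cost:.2f} (cloud {cc_cloud})")
    print(f"data-centric:        {dc_cost:.2f} (reducer in cloud {dc_cloud})")
```

With these made-up numbers the single-cloud plan costs 15.40 and the data-centric plan 10.27, mainly because only 30% of the bytes cross cloud boundaries. With a larger shuffle ratio or very uneven compute prices the single-cloud plan can win, which is why placement is better treated as an optimization than as a fixed rule.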

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/144234