A clustering based coscheduling strategy for efficient scientific workflow execution in cloud computing

Deng, K; Ren, K; Song, J; Yuan, D; Xiang, Y; Chen, J

Publication Type: Journal Article
Citation: Concurrency Computation Practice and Experience, 2013, 25 (18), pp. 2523-2539
Issue Date: 2013-12-25

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Deng, K	en_US
dc.contributor.author	Ren, K	en_US
dc.contributor.author	Song, J	en_US
dc.contributor.author	Yuan, D	en_US
dc.contributor.author	Xiang, Y	en_US
dc.contributor.author	Chen, J	en_US
dc.date.issued	2013-12-25	en_US
dc.identifier.citation	Concurrency Computation Practice and Experience, 2013, 25 (18), pp. 2523 - 2539	en_US
dc.identifier.issn	1532-0626	en_US
dc.identifier.uri	http://hdl.handle.net/10453/26571
dc.description.abstract	Due to its cost-effectiveness, on-demand provisioning and ease of sharing, cloud computing has grown in popularity with the research community for deploying scientific applications such as workflows. As this interest continues to grow and scientific workflows are widely deployed in collaborative cloud environments that consist of a number of data centers, there is an urgent need for strategies that place application datasets across globally distributed data centers and schedule tasks according to the data layout, reducing both latency and makespan for workflow execution. In this paper, by utilizing dependencies among datasets and tasks, we propose an efficient data and task coscheduling strategy that places input datasets in a load-balanced way while grouping the most closely related datasets and tasks together. Moreover, data staging is used to overlap task execution with data transmission so that tasks can start earlier. We build a simulation environment on the Tianhe supercomputer to evaluate the proposed strategy and run simulations with random and realistic workflows. The results demonstrate that the proposed strategy can effectively improve scheduling performance while reducing the total volume of data transferred across data centers. Concurrency and Computation: Practice and Experience, 2013. © 2013 Wiley Periodicals, Inc.	en_US
dc.relation.ispartof	Concurrency Computation Practice and Experience	en_US
dc.relation.isbasedon	10.1002/cpe.3084	en_US
dc.subject.classification	Distributed Computing	en_US
dc.title	A clustering based coscheduling strategy for efficient scientific workflow execution in cloud computing	en_US
dc.type	Journal Article
utslib.citation.volume	18	en_US
utslib.citation.volume	25	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0803 Computer Software	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
utslib.copyright.status	closed_access
pubs.issue	18	en_US
pubs.publication-status	Published	en_US
pubs.volume	25	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/26571