A highly practical approach toward achieving minimum data sets storage cost in the cloud

Yuan, D; Yang, Y; Liu, X; Li, W; Cui, L; Xu, M; Chen, J

Publication Type: Journal Article
Citation: IEEE Transactions on Parallel and Distributed Systems, 2013, 24 (6), pp. 1234-1244
Issue Date: 2013-05-20

Closed Access

Filename	Description	Size	Format
06410317.pdf	Published Version	1.34 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yuan, D	en_US
dc.contributor.author	Yang, Y	en_US
dc.contributor.author	Liu, X	en_US
dc.contributor.author	Li, W	en_US
dc.contributor.author	Cui, L	en_US
dc.contributor.author	Xu, M	en_US
dc.contributor.author	Chen, J	en_US
dc.date.issued	2013-05-20	en_US
dc.identifier.citation	IEEE Transactions on Parallel and Distributed Systems, 2013, 24 (6), pp. 1234 - 1244	en_US
dc.identifier.issn	1045-9219	en_US
dc.identifier.uri	http://hdl.handle.net/10453/32676
dc.description.abstract	The massive computation power and storage capacity of cloud computing systems allow scientists to deploy computation- and data-intensive applications without infrastructure investment, where large application data sets can be stored in the cloud. Based on the pay-as-you-go model, storage strategies and benchmarking approaches have been developed for cost-effectively storing large volumes of generated application data sets in the cloud. However, they are either insufficiently cost-effective for the storage or impractical to use at runtime. In this paper, toward achieving the minimum cost benchmark, we propose a novel, highly cost-effective and practical storage strategy that can automatically decide at runtime whether a generated data set should be stored in the cloud. The main focus of this strategy is the local optimization of the tradeoff between computation and storage, while secondarily also taking users' (optional) preferences on storage into consideration. Both theoretical analysis and simulations conducted on general (random) data sets as well as specific real-world applications with Amazon's cost model show that the cost-effectiveness of our strategy is close to or even the same as the minimum cost benchmark, and the efficiency is very high for practical runtime utilization in the cloud. © 1990-2012 IEEE.	en_US
dc.relation.ispartof	IEEE Transactions on Parallel and Distributed Systems	en_US
dc.relation.isbasedon	10.1109/TPDS.2013.20	en_US
dc.subject.classification	Distributed Computing	en_US
dc.title	A highly practical approach toward achieving minimum data sets storage cost in the cloud	en_US
dc.type	Journal Article
utslib.citation.volume	6	en_US
utslib.citation.volume	24	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0803 Computer Software	en_US
utslib.for	1005 Communications Technologies	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
utslib.copyright.status	closed_access
pubs.issue	6	en_US
pubs.publication-status	Published	en_US
pubs.volume	24	en_US

Abstract:

The massive computation power and storage capacity of cloud computing systems allow scientists to deploy computation- and data-intensive applications without infrastructure investment, where large application data sets can be stored in the cloud. Based on the pay-as-you-go model, storage strategies and benchmarking approaches have been developed for cost-effectively storing large volumes of generated application data sets in the cloud. However, they are either insufficiently cost-effective for the storage or impractical to use at runtime. In this paper, toward achieving the minimum cost benchmark, we propose a novel, highly cost-effective and practical storage strategy that can automatically decide at runtime whether a generated data set should be stored in the cloud. The main focus of this strategy is the local optimization of the tradeoff between computation and storage, while secondarily also taking users' (optional) preferences on storage into consideration. Both theoretical analysis and simulations conducted on general (random) data sets as well as specific real-world applications with Amazon's cost model show that the cost-effectiveness of our strategy is close to or even the same as the minimum cost benchmark, and the efficiency is very high for practical runtime utilization in the cloud. © 1990-2012 IEEE.
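
The store-or-regenerate tradeoff described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only compares, for a single data set in isolation, the recurring cost of keeping it stored against the expected cost of regenerating it on each use, with an optional user preference that overrides the comparison. The class name, field names, function names, and the price of 0.10 per GB per month are all hypothetical values chosen for illustration, and dependencies between data sets are ignored.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    # All fields are illustrative assumptions, not taken from the paper.
    name: str
    size_gb: float         # size of the generated data set
    regen_cost: float      # computation cost to regenerate it once
    uses_per_month: float  # how often it is expected to be reused


def monthly_cost_if_stored(ds: DataSet, price_per_gb_month: float) -> float:
    """Recurring storage charge under a pay-as-you-go price."""
    return ds.size_gb * price_per_gb_month


def monthly_cost_if_deleted(ds: DataSet) -> float:
    """Expected regeneration charge if the data set is recomputed on every use."""
    return ds.regen_cost * ds.uses_per_month


def should_store(ds: DataSet, price_per_gb_month: float,
                 user_wants_stored: bool = False) -> bool:
    """Store when storing is no more expensive than regenerating,
    or when the (optional) user preference asks for it."""
    if user_wants_stored:
        return True
    return monthly_cost_if_stored(ds, price_per_gb_month) <= monthly_cost_if_deleted(ds)


PRICE = 0.10  # hypothetical price per GB per month

large_rarely_used = DataSet("large_rarely_used", size_gb=500, regen_cost=20, uses_per_month=1)
small_often_used = DataSet("small_often_used", size_gb=10, regen_cost=5, uses_per_month=2)

for ds in (large_rarely_used, small_often_used):
    decision = "store" if should_store(ds, PRICE) else "delete and regenerate"
    print(f"{ds.name}: {decision}")
```

Under these assumed numbers, the large, rarely used data set is cheaper to regenerate than to keep, while the small, frequently used one is cheaper to keep. A strategy of the kind the abstract describes would additionally have to account for how data sets depend on one another, which this sketch leaves out.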

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/32676