Computation and Storage Trade-Off for Cost-Effectively Storing Scientific Datasets in the Cloud

Yuan, D; Yang, Y; Liu, X; Chen, J

Publisher: Springer
Publication Type: Chapter
Citation: Handbook of Data Intensive Computing, 2011, 1st edition, pp. 129 - 153
Issue Date: 2011-01

Closed Access

Filename	Description	Size	Format
2012001278OK_Yuan.pdf		757.96 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yuan, D	en_US
dc.contributor.author	Yang, Y	en_US
dc.contributor.author	Liu, X	en_US
dc.contributor.author	Chen, J	en_US
dc.contributor.editor	Furht, B	en_US
dc.contributor.editor	Escalante, A	en_US
dc.date.issued	2011-01	en_US
dc.identifier.citation	Handbook of Data Intensive Computing, 2011, 1st edition, pp. 129 - 153	en_US
dc.identifier.isbn	9781461414148	en_US
dc.identifier.uri	http://hdl.handle.net/10453/26296
dc.description.abstract	Scientific applications are usually data intensive [1, 2], and the datasets they generate are often terabytes or even petabytes in size. As reported by Szalay and Gray in [3], science is in an exponential world, and the amount of scientific data will double every year over the next decade and beyond. Producing scientific datasets involves a large number of computation-intensive tasks, e.g., with scientific workflows [4], and hence takes a long time to execute. These generated datasets contain important intermediate or final results of the computation and need to be stored as valuable resources. There are two reasons: (1) data can be reused, as scientists may need to re-analyze the results or apply new analyses to the existing datasets [5]; (2) data can be shared, as computation results may be shared for collaboration, so the datasets are used by scientists from different institutions [6]. Storing valuable generated application datasets can save their regeneration cost when they are reused, not to mention the waiting time caused by regeneration. However, the large size of scientific datasets is a big challenge for their storage.	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	Handbook of Data Intensive Computing	en_US
dc.relation.isbasedon	10.1007/978-1-4614-1415-5_5	en_US
dc.title	Computation and Storage Trade-Off for Cost-Effectively Storing Scientific Datasets in the Cloud	en_US
dc.type	Chapter
utslib.location	Newark, NJ 07101-3301, USA	en_US
utslib.for	080609 Information Systems Management	en_US
utslib.for	080501 Distributed and Grid Systems	en_US
utslib.citation.edition	1st edition	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
utslib.copyright.status	closed_access
pubs.consider-herdc	false	en_US
pubs.edition	1st edition	en_US
pubs.place-of-publication	Newark, NJ 07101-3301, USA	en_US

Abstract:

Scientific applications are usually data intensive [1, 2], and the datasets they generate are often terabytes or even petabytes in size. As reported by Szalay and Gray in [3], science is in an exponential world, and the amount of scientific data will double every year over the next decade and beyond. Producing scientific datasets involves a large number of computation-intensive tasks, e.g., with scientific workflows [4], and hence takes a long time to execute. These generated datasets contain important intermediate or final results of the computation and need to be stored as valuable resources. There are two reasons: (1) data can be reused, as scientists may need to re-analyze the results or apply new analyses to the existing datasets [5]; (2) data can be shared, as computation results may be shared for collaboration, so the datasets are used by scientists from different institutions [6]. Storing valuable generated application datasets can save their regeneration cost when they are reused, not to mention the waiting time caused by regeneration. However, the large size of scientific datasets is a big challenge for their storage.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/26296