Parallelization and optimization of spatial analysis for large scale environmental model data assembly

Zhao, G; Bryan, BA; King, D; Song, X; Yu, Q

Publication Type: Journal Article
Citation: Computers and Electronics in Agriculture, 2012, 89, pp. 94-99
Issue Date: 2012-11-01

Closed Access

Filename	Description	Size	Format
2012001461OK.pdf		1.32 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, G	en_US
dc.contributor.author	Bryan, BA	en_US
dc.contributor.author	King, D	en_US
dc.contributor.author	Song, X	en_US
dc.contributor.author	Yu, Q https://orcid.org/0000-0001-6950-1821	en_US
dc.date.issued	2012-11-01	en_US
dc.identifier.citation	Computers and Electronics in Agriculture, 2012, 89 pp. 94 - 99	en_US
dc.identifier.issn	0168-1699	en_US
dc.identifier.uri	http://hdl.handle.net/10453/29551
dc.description.abstract	Spatial-temporal modelling of environmental systems such as agriculture, forestry, and water resources requires high-resolution input data. Assembling and summarizing these data in the appropriate format for model input often requires a series of spatial analyses which can be extremely time-consuming, especially when many large data sets are involved. In this paper we investigated the ability of high-performance computing techniques to improve the efficiency of spatial analysis for model data assembly. We implemented an array-based algorithm to calculate summary statistics for long time-series daily grid climate data sets for 11,575 climate-soil zones across the Australian wheat-growing regions for input into a crop simulation model. We developed a zonal statistics algorithm using Python's NumPy module, then parallelized it and ran it on a shared-memory, multi-processor system. We assessed algorithm performance with a varying number of CPU cores, and assessed the influence of load balancing on the efficiency of parallel processing. Compared with traditional desktop GIS software, the serial and parallel (32-core) implementations achieved speed-ups of about 180 and 1440 times, respectively. We also found that the most efficient computation occurred when not all of the available CPU cores were used, and that the chunk size of jobs had an important influence on computing efficiency. The algorithm and the parallel processing scheme provide a useful approach to address computing challenges posed by spatial analysis of numerous large data sets for large-scale environmental modelling. © 2012 Elsevier B.V.	en_US
dc.relation.ispartof	Computers and Electronics in Agriculture	en_US
dc.relation.isbasedon	10.1016/j.compag.2012.08.007	en_US
dc.subject.classification	Agronomy & Agriculture	en_US
dc.title	Parallelization and optimization of spatial analysis for large scale environmental model data assembly	en_US
dc.type	Journal Article
utslib.citation.volume	89	en_US
utslib.for	0701 Agriculture, Land and Farm Management	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	07 Agricultural and Veterinary Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Life Sciences
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	89	en_US

Abstract:

Spatial-temporal modelling of environmental systems such as agriculture, forestry, and water resources requires high-resolution input data. Assembling and summarizing these data in the appropriate format for model input often requires a series of spatial analyses which can be extremely time-consuming, especially when many large data sets are involved. In this paper we investigated the ability of high-performance computing techniques to improve the efficiency of spatial analysis for model data assembly. We implemented an array-based algorithm to calculate summary statistics for long time-series daily grid climate data sets for 11,575 climate-soil zones across the Australian wheat-growing regions for input into a crop simulation model. We developed a zonal statistics algorithm using Python's NumPy module, then parallelized it and ran it on a shared-memory, multi-processor system. We assessed algorithm performance with a varying number of CPU cores, and assessed the influence of load balancing on the efficiency of parallel processing. Compared with traditional desktop GIS software, the serial and parallel (32-core) implementations achieved speed-ups of about 180 and 1440 times, respectively. We also found that the most efficient computation occurred when not all of the available CPU cores were used, and that the chunk size of jobs had an important influence on computing efficiency. The algorithm and the parallel processing scheme provide a useful approach to address computing challenges posed by spatial analysis of numerous large data sets for large-scale environmental modelling. © 2012 Elsevier B.V.
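
The full text is closed access, so the authors' actual code is not available here. The following is only a minimal sketch of what an array-based zonal statistic and its parallelization over daily grids might look like. Everything in it is an assumption made for illustration: the function names (`zonal_means`, `zonal_means_series`), the use of `np.bincount`, the use of `multiprocessing.Pool`, zones labelled with integers from 0 to `n_zones - 1`, and NaN as the no-data value.

```python
import numpy as np
from multiprocessing import Pool


def zonal_means(zones, grid, n_zones):
    """Mean of `grid` within each zone (hypothetical helper).

    zones : integer array of zone labels in 0..n_zones-1, same shape as grid
    grid  : float array of cell values; NaN marks no-data cells
    """
    flat_zones = zones.ravel()
    flat_vals = grid.ravel()
    valid = ~np.isnan(flat_vals)  # ignore no-data cells
    # Per-zone sum and cell count, each computed in one vectorized pass.
    sums = np.bincount(flat_zones[valid], weights=flat_vals[valid],
                       minlength=n_zones)
    counts = np.bincount(flat_zones[valid], minlength=n_zones)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Zones with no valid cells get NaN instead of a division warning.
        return np.where(counts > 0, sums / counts, np.nan)


def _day_task(args):
    # Module-level wrapper so the task can be pickled for worker processes.
    zones, grid, n_zones = args
    return zonal_means(zones, grid, n_zones)


def zonal_means_series(zones, daily_grids, n_zones, n_workers=1, chunksize=1):
    """Zonal means for a stack of daily grids of shape (days, rows, cols).

    Returns an array of shape (days, n_zones). With n_workers == 1 the days
    are processed serially; otherwise they are distributed over a process
    pool, `chunksize` days at a time.
    """
    # Sketch only: the zone grid is re-sent with every task, which a real
    # implementation would probably avoid (e.g. via a pool initializer).
    tasks = [(zones, grid, n_zones) for grid in daily_grids]
    if n_workers == 1:
        results = [_day_task(t) for t in tasks]  # serial baseline
    else:
        with Pool(processes=n_workers) as pool:
            results = pool.map(_day_task, tasks, chunksize=chunksize)
    return np.vstack(results)
```

In this sketch, `n_workers` and `chunksize` stand in for the two factors the abstract says were assessed: the number of CPU cores used and the chunk size of jobs. On platforms that spawn worker processes, a call with `n_workers > 1` should sit under an `if __name__ == "__main__":` guard. Taken at face value, the reported speed-ups of about 180 and 1440 times over desktop GIS software imply that the 32-core run was roughly 1440 / 180 = 8 times faster than the serial implementation.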

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/29551