Aggregation in Regression Analysis of Very Large Time Series Datasets

Malecki, Alan Andrew

Publication Type: Thesis
Issue Date: 2020

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Malecki, Alan Andrew
dc.date.accessioned	2020-08-24T04:36:03Z
dc.date.available	2020-08-24T04:36:03Z
dc.date.issued	2020
dc.identifier.uri	http://hdl.handle.net/10453/142317
dc.description	University of Technology Sydney. Faculty of Science.	en_AU
dc.description.abstract	The focus of this thesis is on the analysis of large and complex data. Computer memory constraints can prohibit the analysis of large datasets, and this issue is further complicated when faced with complex data. We are motivated by an environmental dataset concerning air particulate measurements and the impact of passing coal transport trains. This dataset has over 600,000 observations and is complicated by its long memory dependence. Current methods for long memory time series are limited to small datasets. To overcome these issues, we consider two approaches for the analysis of large and complex data: 1. transforming data such that its volume and complexity are reduced, and 2. extending current statistical methods for big data to allow for complex data structures. The use of temporal aggregation transforms the dataset to a more manageable size. This permits the use of an AutoRegressive Fractionally Integrated Moving Average (ARFIMA) process on our motivating dataset. We also consider transforming the data to a bivariate series to reduce the loss of information due to this temporal aggregation. Divide and Recombine is a modern approach to analysing big data. This approach for big data analysis has not yet been extended to the time series setting. We explore this situation and extend the D&R process for long memory time series.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/142317/2/02whole.pdf
dc.rights	au.edu.uts.lib/ppc
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Aggregation in Regression Analysis of Very Large Time Series Datasets	en_AU
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

The focus of this thesis is the analysis of large and complex data. Computer memory constraints can prohibit the analysis of large datasets, and the problem is compounded when the data are also complex. We are motivated by an environmental dataset concerning air particulate measurements and the impact of passing coal transport trains. This dataset has over 600,000 observations and is complicated by its long memory dependence. Current methods for long memory time series are limited to small datasets. To overcome these issues, we consider two approaches for the analysis of large and complex data: (1) transforming the data so that its volume and complexity are reduced, and (2) extending current statistical methods for big data to allow for complex data structures. Temporal aggregation transforms the dataset to a more manageable size, which permits fitting an AutoRegressive Fractionally Integrated Moving Average (ARFIMA) process to our motivating dataset. We also consider transforming the data to a bivariate series to reduce the loss of information due to this temporal aggregation. Divide and Recombine (D&R) is a modern approach to analysing big data that has not yet been extended to the time series setting. We explore this situation and extend the D&R process to long memory time series.
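
The two ideas named in the abstract, temporal aggregation and Divide and Recombine, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `aggregate_mean` and `divide_and_recombine_slope` are hypothetical, the block length of 60 and the simulated white-noise series are arbitrary, and the recombination step shown (averaging ordinary least squares slopes over contiguous subsets) is a generic D&R illustration that ignores serial dependence. The long memory extension and the ARFIMA fitting described in the abstract are not reproduced here.

```python
import numpy as np


def aggregate_mean(series, block):
    """Temporal aggregation by non-overlapping block means.

    Any incomplete trailing block is dropped. `block` is the number of
    consecutive observations merged into one aggregated observation.
    """
    series = np.asarray(series, dtype=float)
    n_blocks = series.size // block
    if n_blocks == 0:
        raise ValueError("block is longer than the series")
    return series[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)


def divide_and_recombine_slope(x, y, n_subsets):
    """Generic D&R illustration (an assumption, not the thesis's method).

    Divide: split the data into contiguous subsets.
    Apply:  fit a straight line to each subset by least squares.
    Recombine: average the per-subset slope estimates.
    """
    slopes = []
    for xs, ys in zip(np.array_split(x, n_subsets), np.array_split(y, n_subsets)):
        slope, _intercept = np.polyfit(xs, ys, 1)
        slopes.append(slope)
    return float(np.mean(slopes))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated stand-in for a long series; not the air particulate data.
    raw = rng.normal(size=600_000)
    aggregated = aggregate_mean(raw, block=60)
    print(raw.size, "observations reduced to", aggregated.size)
```

Aggregating with block means trades resolution for size: a series of 600,000 observations with a block length of 60 becomes 10,000 aggregated observations, which is the kind of reduction that makes memory-hungry estimators feasible. The D&R function holds only one subset's fit at a time, which is the property that makes the approach attractive when the full dataset does not fit in memory.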

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/142317