Parallel Processing Systems for Big Data: A Survey

Zhang, Y; Cao, T; Li, S; Tian, X; Yuan, L; Jia, H; Vasilakos, AV

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type: Journal Article
Citation: Proceedings of the IEEE, 2016, 104, (11), pp. 2114-2136
Issue Date: 2016-11-01

Closed Access

Filename	Description	Size	Format
Parallel_Processing_Systems_for_Big_Data_A_Survey.pdf	Published version	1.38 MB	Adobe PDF

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, Y
dc.contributor.author	Cao, T
dc.contributor.author	Li, S
dc.contributor.author	Tian, X
dc.contributor.author	Yuan, L
dc.contributor.author	Jia, H
dc.contributor.author	Vasilakos, AV
dc.date.accessioned	2022-08-10T04:44:45Z
dc.date.available	2022-08-10T04:44:45Z
dc.date.issued	2016-11-01
dc.identifier.citation	Proceedings of the IEEE, 2016, 104, (11), pp. 2114-2136
dc.identifier.issn	0018-9219
dc.identifier.issn	1558-2256
dc.identifier.uri	http://hdl.handle.net/10453/159847
dc.description.abstract	The volume, variety, and velocity of big data, and the valuable information it contains, have motivated the investigation of many new parallel data processing systems in addition to approaches based on traditional database management systems (DBMSs). MapReduce pioneered this paradigm change and rapidly became the primary big data processing system thanks to its simplicity, scalability, and fine-grained fault tolerance. However, compared with DBMSs, MapReduce has also drawn criticism for its processing efficiency, low-level abstraction, and rigid dataflow. Inspired by MapReduce, big data systems have since proliferated. Some follow MapReduce's idea but offer more flexible models for general-purpose use. Some adopt the advantages of DBMSs through higher-level abstraction. There are also specialized systems for particular applications, such as machine learning and stream data processing. To explore new research opportunities and help users select suitable processing systems for specific applications, this survey gives a high-level overview of existing parallel data processing systems, categorized by data input as batch processing, stream processing, graph processing, and machine learning processing, and introduces representative projects in each category. The original MapReduce system, as the pioneer, is discussed first, along with its active variants and extensions addressing dataflow, data access, parameter tuning, communication, and energy optimization. System benchmarks and open issues for big data processing are also examined.
dc.language	English
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation.ispartof	Proceedings of the IEEE
dc.relation.isbasedon	10.1109/JPROC.2016.2591592
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0903 Biomedical Engineering, 0906 Electrical and Electronic Engineering
dc.title	Parallel Processing Systems for Big Data: A Survey
dc.type	Journal Article
utslib.citation.volume	104
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0903 Biomedical Engineering
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
dc.date.updated	2022-08-10T04:44:44Z
pubs.issue	11
pubs.publication-status	Published
pubs.volume	104
utslib.citation.issue	11

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/159847