Data mining of classification for sybil user detection

Chinchore, Anand Arun

Publication Type: Thesis
Issue Date: 2016

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Chinchore, Anand Arun
dc.date.accessioned	2017-04-04T23:31:54Z
dc.date.available	2017-04-04T23:31:54Z
dc.date.issued	2016
dc.identifier.uri	http://hdl.handle.net/10453/89998
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (MAnalytics)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/89998/2/02whole.pdf
dc.rights	au.edu.uts.lib/ppc
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Mobile networks.	en
dc.subject	Interconnectivity of social network users on mobiles.	en
dc.subject	Content and information sharing.	en
dc.subject	Behavioural patterns.	en
dc.subject	Data mining.	en
dc.subject	Sybil detection.	en
dc.subject	Decision trees.	en
dc.title	Data mining of classification for sybil user detection	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

Research in data analytics and Big Data applications, together with new structures in complex data, is revealing the complexity and patterns of that data and raising valuable and critical, but challenging, issues through newly designed tools, techniques and models in data science. A common example concerns the interconnectivity of social network users on mobile devices, who share content and information through mobile social networks. There have been a large number of studies on mobile networks. Many focus on a variety of secured applications that attempt to exploit social connections, impersonate users or attack social groups. Such applications are often created with the intention of collecting confidential information, laundering money, committing blackmail or performing other criminal activities. Existing methods for identifying such activity, such as distributed systems, social graph-based sybil detection, behaviour classification, and local ranking systems that estimate the trust level between users, rely on the dependencies between random nodes of connection on mobile social networks. These models aim to detect suspicious connections and have the advantage of learning the relationships between nodes and data. However, their detection patterns tend to impose the behavioural patterns typically associated with community-based and external networks. In data mining, the graph-based and classification models used for pattern collection can accurately predict patterns in data in targeted categories. Decision trees, commonly used for classification, are trees in which each branch represents a choice between a number of alternatives, and each leaf represents a classification, or decision. For example, a decision tree may help an institution decide whether a node in a dataset is suspicious, or considered to be sybil, provided that the tree can be induced from a set of data about its instances and the classifications of those instances.
A decision tree could also provide the flexibility to demonstrate data distribution. Thus, researchers have tried to combine different techniques and methods into network-based models to detect various patterns generated by sybil nodes within a network. The purpose of this thesis is to abridge existing classification and regression techniques to identify sybil nodes, and the correlation of those nodes with time, to address these research limitations. Classification and regression techniques predict behaviour based on continuous or categorical responses. For example, if the predicted response is continuous, the tree is called a regression tree; if the response is categorical, it is called a classification tree. At each node of the tree, the value of one of the connected input nodes is checked, and a binary answer (yes or no) determines whether one continues to the left or right sub-branch. When a leaf is reached, a prediction follows from a series of entropy calculations and graphing techniques. This thesis introduces a novel classification model for sybil detection in mobile social behaviour that identifies dependencies using connection duration and other attributes. Ross Quinlan’s C4.5 algorithm, its resulting decision tree and a random forest simplify the step-by-step identification process while maintaining its merits. Partial correlations between nodes are simplified using Rattle programming, and the dataset is divided into majority nodes to assist processing. This research also includes a behavioural survey of the nodes and an extended analysis using a classification system for sybil detection, with a particular focus on sybil attacks in mobile social network environments. Each sybil node is tracked and identified based on the frequency and duration of its connections with other nodes.
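The node-by-node traversal described in this paragraph can be illustrated with a minimal Python sketch. It is not taken from the thesis: the attribute names `duration` and `frequency`, the thresholds, the tree shape and the labels are all hypothetical, chosen only to show how a yes/no test at each node leads to a leaf.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # predicted class, e.g. "sybil" or "normal" (hypothetical labels)


@dataclass
class Split:
    attribute: str               # attribute tested at this node
    threshold: float             # hypothetical cut-off value
    left: Union["Split", Leaf]   # followed when the answer is "yes"
    right: Union["Split", Leaf]  # followed when the answer is "no"


def predict(node, record):
    """Walk from the root to a leaf, answering one yes/no test per node."""
    while isinstance(node, Split):
        answer_yes = record[node.attribute] <= node.threshold
        node = node.left if answer_yes else node.right
    return node.label


# Hypothetical tree: short connections that occur very often are flagged.
tree = Split(
    "duration", 5.0,
    left=Split("frequency", 20.0, left=Leaf("normal"), right=Leaf("sybil")),
    right=Leaf("normal"),
)

print(predict(tree, {"duration": 2.0, "frequency": 50.0}))
```

With a categorical leaf label, as here, the structure is a classification tree; storing a number at each leaf instead would give a regression tree.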
An outline of how the classification model identifies behaviour is also included, along with an explanation of the flow of the decision tree and the C4.5 algorithm process, which groups identified sybil nodes based on the results of entropy calculations and information gain. The entropy calculated for each node connection across all the datasets informs the information gain. The maxGain calculations for each individual node lead to the final stage, drawing the decision tree, and help to predict the sybil nodes and to compare and justify the identified sybil attackers. These processes and new models applied to sybil detection provide insight into the behaviour of connections through deep analytics and entropy gain. The evidence gleaned from this research contributes significant knowledge to data analytics and data science in the identification of threats on mobile social networks.
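The entropy and information gain calculations mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the thesis's code: the toy records, the attribute names `duration` and `frequency`, and the function names are hypothetical. It shows plain information gain and the selection of the attribute with the maximum gain; C4.5 itself additionally normalises the gain by the split information (the gain ratio), which is omitted here for brevity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split yields the maximum information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records describing connections of four nodes.
rows = [
    {"duration": "short", "frequency": "high"},
    {"duration": "short", "frequency": "high"},
    {"duration": "long", "frequency": "low"},
    {"duration": "long", "frequency": "high"},
]
labels = ["sybil", "sybil", "normal", "normal"]

print(best_attribute(rows, labels, ["duration", "frequency"]))
```

In this toy data, splitting on `duration` separates the two classes completely, so it has the larger gain and would be chosen as the root test; repeating the selection on each subset is how the tree is grown.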

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/89998