Scalable processing methods for host-based intrusion detection systems

Liu, Ming

Publication Type: Thesis
Issue Date: 2019

This item is open access.

Contents and abstract: Adobe PDF (345.06 kB)
Thesis: Adobe PDF (2.05 MB)

Full metadata record

Field	Value	Language
dc.contributor.author	Liu, Ming
dc.date.accessioned	2019-05-09T22:11:33Z
dc.date.available	2019-05-09T22:11:33Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/10453/133276
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/133276/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Intrusion detection system.
dc.subject	Linux.
dc.subject	Data mining.
dc.subject	Embedded system.
dc.subject	Apache Spark.
dc.subject	Antivirus software.
dc.title	Scalable processing methods for host-based intrusion detection systems	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

Host-based intrusion detection systems (HIDS) are known for fine-grained analysis and for their ability to discover internal malicious behavior. A HIDS monitors operating system logs, whereas a network-based intrusion detection system (NIDS) focuses on the flow of network traffic. In a contemporary data center, Linux applications often generate large volumes of real-time system call traces, which a traditional HIDS deployed on every single host is poorly suited to handle. Training data mining models on system calls on a single host with fixed computing and storage capacity is time-consuming, and intermediate datasets cannot be handled efficiently. Maintaining and updating a HIDS installed on every physical or virtual host is cumbersome, and comprehensive system call analysis to detect complex, distributed attacks across multiple hosts is hardly feasible.

First, in light of these limitations of current system call-based HIDS, this thesis reviews the development of system call-based HIDS. It investigates relevant algorithms and techniques, including feature extraction methods and various data mining algorithms. It discusses HIDS dataset issues, including the currently available system call datasets and approaches researchers can use to generate new ones. It also summarizes modern applications of system call-based HIDS on embedded systems and surveys related work.

Second, this thesis forecasts future research trends in HIDS in three areas: reducing the false positive rate, improving detection efficiency, and enhancing collaborative security. To enhance collaborative security, it then proposes a real-time, scalable HIDS framework for data centers, built with big data tools in the cloud. The framework comprises three layers: a data collection layer, a data analytics layer, and a data storage layer. It is deployed in an open-source private cloud computing environment and scales easily to accommodate new hosts set up in the data center.

Third, this thesis presents SCADS, a corresponding scalable HIDS solution that uses Apache Spark in the Google cloud environment. A set of Spark algorithms is developed to achieve computational scalability. The experimental results demonstrate that the efficiency of intrusion detection can be enhanced, which indicates that the proposed method is applicable to the design of next-generation system call-based HIDS.

Fourth, industry practice has brought two significant improvements to HIDS: integration with other security capabilities, and incorporation of the latest cyber threat intelligence (CTI). A comprehensive HIDS designed for the current sophisticated threat environment should therefore combine traditional HIDS with other security capabilities and the latest CTI. The key component of CTI is the sharing of threat information. This thesis briefly introduces CTI and threat information sharing, and proposes a scalable, real-time threat information sharing framework in the cloud, based on several leading platforms such as MITRE TAXII, IBM X-Force, WEBROOT BrightCloud, the EclecticIQ platform, and the AlienVault OTX platform.

Fifth, this thesis provides a private and scalable online virus detection system, which is expected to be integrated into the proposed scalable real-time threat information sharing framework and thereby to support the design of a comprehensive HIDS. The system incorporates multiple anti-virus engines and a web interface. It supports "isolated detection and update", which guarantees that uploaded confidential samples are not exposed to the Internet during either virus detection or system upgrade. Furthermore, its low-coupling design makes it scalable enough to support a distributed deployment mode. The system is tested with benign files, the EICAR (European Institute for Computer Antivirus Research) Standard Anti-Virus Test File, and other suspicious samples. The test results demonstrate that the proposed mechanisms are practical.
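
The abstract mentions feature extraction from system call traces without naming a specific method. As an illustration only, the following minimal sketch assumes sliding-window n-grams, a common choice for system call traces and not necessarily the method used in the thesis. The function names and the two traces are hypothetical.

```python
from collections import Counter
from typing import Sequence, Set, Tuple


def syscall_ngrams(trace: Sequence[str], n: int = 3) -> Counter:
    """Count every contiguous window of n system calls in one trace."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A trace shorter than n yields an empty range, hence an empty Counter.
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def unseen_ratio(trace: Sequence[str],
                 normal_profile: Set[Tuple[str, ...]],
                 n: int = 3) -> float:
    """Fraction of the trace's n-gram occurrences absent from the normal profile."""
    grams = syscall_ngrams(trace, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    unseen = sum(count for gram, count in grams.items()
                 if gram not in normal_profile)
    return unseen / total


# Hypothetical traces; the system call names are chosen for illustration only.
normal_trace = ["open", "read", "read", "write", "close"]
suspect_trace = ["open", "read", "execve", "socket", "connect"]

# The "normal profile" is simply the set of n-grams seen in normal behavior.
profile = set(syscall_ngrams(normal_trace))

print(unseen_ratio(normal_trace, profile))
print(unseen_ratio(suspect_trace, profile))
```

A trace whose unseen ratio is high relative to a chosen threshold would be flagged for further analysis. The threshold and the window size n are tuning choices that this sketch leaves open.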

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/133276