Scalable processing methods for host-based intrusion detection systems

Publication Type: Thesis
Issue Date: 2019
A host-based intrusion detection system (HIDS) is known for its fine-grained analysis and its ability to discover internal malicious behavior. A HIDS monitors operating system logs, whereas a network-based intrusion detection system (NIDS) focuses on network traffic flows. In a contemporary data center, Linux applications often generate large volumes of real-time system call traces, which a traditional HIDS deployed on every single host is poorly suited to handle. Training data mining models on system calls on a single host with fixed computing and storage capacity is time-consuming, and intermediate datasets cannot be handled efficiently. Maintaining and updating a HIDS installed on every physical or virtual host is cumbersome, and comprehensive system call analysis to detect complex, distributed attacks across multiple hosts can hardly be performed.
First, considering these limitations of current system call-based HIDS, this thesis reviews the development of system call-based HIDS. Relevant algorithms and techniques are investigated, including feature extraction methods and various data mining algorithms. Dataset issues are discussed, including the currently available datasets with system calls and approaches researchers can use to generate new datasets. Modern applications of system call-based HIDS on embedded systems are summarized, and related work is investigated.
Second, this thesis forecasts future research trends for HIDS in three aspects: reducing the false positive rate, improving detection efficiency, and enhancing collaborative security. A real-time scalable HIDS framework that uses big data tools in the cloud is then proposed for data centers to enhance collaborative security. The framework comprises three layers: a data collection layer, a data analytics layer, and a data storage layer. It is deployed in an open-source private cloud computing environment and scales easily to accommodate new hosts set up in the data center.
Third, this thesis presents SCADS, a corresponding scalable HIDS solution using Apache Spark in the Google cloud environment. A set of Spark algorithms is developed to achieve computational scalability. The experimental results demonstrate that the efficiency of intrusion detection can be enhanced, which indicates that the proposed method is applicable to the design of next-generation host-based intrusion detection systems based on system calls.
Fourth, industry practice currently shows two significant improvements to HIDS: integration with other security capabilities, and incorporation of the latest cyber threat intelligence (CTI). Therefore, to design a comprehensive HIDS for the current sophisticated threat environment, a traditional HIDS should be combined with other security capabilities and the latest CTI. The key component of CTI is the sharing of threat information. This thesis briefly introduces cyber threat intelligence and threat information sharing, and proposes a scalable real-time threat information sharing framework in the cloud based on several leading platforms, such as MITRE TAXII, IBM X-Force, WEBROOT BrightCloud, the EclecticIQ platform, and the AlienVault OTX platform.
Fifth, this thesis provides a private and scalable online virus detection system. The system is expected to be integrated into the proposed scalable real-time threat information sharing system, which supports the design of a comprehensive HIDS. It incorporates multiple anti-virus engines and a web interface. The system performs "isolated detection and update", which guarantees that uploaded confidential samples are not exposed to the Internet during either virus detection or system upgrade. Furthermore, its low-coupling design supports a distributed deployment mode. The system is tested with benign files, the EICAR (European Institute for Computer Antivirus Research) Standard Anti-Virus Test File, and other suspicious samples. The test results demonstrate that the proposed mechanisms are practical.
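The abstract refers to feature extraction from system call traces without naming a specific method. As an illustration only, the following minimal Python sketch shows one widely used approach, sliding-window n-grams over system call numbers, together with a simple score based on the fraction of windows never seen in normal traces. The function names (syscall_ngrams, unseen_ratio), the window length of 3, and the toy traces are all hypothetical and are not taken from the thesis; the thesis itself builds on Apache Spark rather than plain Python.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

NGram = Tuple[int, ...]


def syscall_ngrams(trace: List[int], n: int = 3) -> Counter:
    """Count every length-n sliding window in one system call trace."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A trace shorter than n yields an empty range and thus no windows.
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def build_profile(normal_traces: Iterable[List[int]], n: int = 3) -> Set[NGram]:
    """Collect the set of n-grams observed in traces assumed to be benign."""
    profile: Set[NGram] = set()
    for trace in normal_traces:
        profile.update(syscall_ngrams(trace, n))
    return profile


def unseen_ratio(trace: List[int], profile: Set[NGram], n: int = 3) -> float:
    """Return the fraction of windows in the trace that are absent from the profile."""
    grams = syscall_ngrams(trace, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    unseen = sum(count for gram, count in grams.items() if gram not in profile)
    return unseen / total


# Toy data (hypothetical): integers stand in for system call numbers.
normal_traces = [[3, 4, 5, 3, 4, 5, 6], [3, 4, 5, 6, 3, 4]]
profile = build_profile(normal_traces, n=3)

suspicious = [3, 4, 5, 11, 59, 4]
score = unseen_ratio(suspicious, profile, n=3)
print(score)
```

In a distributed setting, the per-trace counting step is independent for each trace, which is what makes this kind of feature extraction a natural candidate for parallel processing across hosts.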