Concept Drift Adaptation for Real-time Prediction

Publication Type: Thesis
Issue Date: 2020
Concept drift refers to changes in the data distribution of a data stream. Using concept drift adaptation techniques to predict the target variable(s) of real-time data streams has attracted increasing attention from researchers in recent years. This research aims to develop a set of concept drift adaptation methods for predicting the target variable of real-time data streams. The literature review reveals two issues in the area of concept drift: i) how the concept drift problem limits learning capability; and ii) how to adapt in more realistic scenarios in which data streams have uncertainties other than concept drift. To address issue i), this research identifies three root causes of limited learning capability when concept drift occurs. When concept drift occurs in a data stream, prediction accuracy decreases because 1) the training set contains more than one pattern, so the predictor cannot be learned well; 2) a newly arrived data instance may present an old pattern while an older instance presents the new pattern; and 3) few data instances are available at the early stage of a newly identified concept. Three concept drift adaptation methods are designed to address these three situations separately. Situation 1) is addressed by a fuzzy clustering-based adaptive regression (FUZZ-CARE) approach, which can learn how many patterns exist in the training set and the membership degree of each instance in each pattern. Situation 2) is addressed by a segment-based drift adaptation (SEGA) method, which sequentially picks out the best segments of the training data to update the predictors, so that the predictor is learned from the most relevant data rather than the most recently arrived data. Situation 3) is addressed by an adaptive fuzzy network (AFN), which generates samples of the new concept from previous data instances.
To address issue ii), this research examines the concept drift phenomenon under two more realistic scenarios. The first is concept drift in noisy data: a noise-tolerant drift adaptation (NoA) method is designed for handling concept drift when the data stream contains signal noise. The second is concept drift in data that also contains temporal dependency: a theoretical study is conducted on the regression of data streams with concept drift and temporal dependency, and based on this study, a drift-adapted regression (DAR) framework is established. To conclude, this thesis not only provides a set of effective drift adaptation methods for real-time prediction, but also contributes to the development of the concept drift area.
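The first root cause named in the abstract, a training set that mixes more than one pattern, can be illustrated with a small synthetic experiment. The sketch below is not FUZZ-CARE or any other method from the thesis; it only assumes a hypothetical one-feature stream whose slope flips from 2 to -2, and compares a linear predictor trained on the mixed window with one trained on new-concept data only. All names (`fit_line`, `make_concept`, the slopes, the noise level) are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mse(model, xs, ys):
    """Mean squared error of a fitted (slope, intercept) pair."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_concept(rng, slope, n):
    """Hypothetical concept: y = slope * x plus small Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


rng = random.Random(0)
old_x, old_y = make_concept(rng, 2.0, 200)    # data before the drift
new_x, new_y = make_concept(rng, -2.0, 200)   # data after the drift
test_x, test_y = make_concept(rng, -2.0, 200) # future data, new concept

# Predictor trained on a window that contains both patterns.
mixed_model = fit_line(old_x + new_x, old_y + new_y)
# Predictor trained only on instances of the current pattern.
new_only_model = fit_line(new_x, new_y)

mixed_error = mse(mixed_model, test_x, test_y)
new_only_error = mse(new_only_model, test_x, test_y)
print(f"mixed window MSE: {mixed_error:.4f}")
print(f"new concept only MSE: {new_only_error:.4f}")
```

In this toy setting the mixed-window predictor averages two conflicting patterns and fits neither, which is the behaviour the abstract describes as a predictor that cannot be learned well.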