Efficient identification of local keyword patterns in microblogging platforms
- Publication Type:
- Journal Article
- IEEE Transactions on Knowledge and Data Engineering, 2016, 28 (10), pp. 2621 - 2634
© 1989-2012 IEEE. Microblogging platforms, such as Twitter, serve as an important and efficient channel for sharing information. With the prevalence of geo-position-enabled devices, a rapidly growing number of microblogs are associated with geo-tags. Consequently, real-time analysis of geo-tagged microblog streams has attracted considerable attention. In this paper, we advocate the significance of keyword co-occurrence for geo-tagged microblog analysis, which has been overlooked by existing studies. Keyword co-occurrence is necessary to resolve ambiguity in event analysis, especially when different events have overlapping descriptions. Given a geo-tagged microblog stream, we formally define the problem of identifying local (top-K) maximal frequent keyword co-occurrence patterns over the stream, namely the LFP (LKFP) query. Given a query region, an LFP query retrieves the local maximal keyword patterns whose frequency exceeds a given threshold, while an LKFP query identifies the K maximal keyword patterns with the highest local frequency, for cases where users do not have a threshold in mind. To handle the high-volume microblog stream and meet the demand when a large number of queries are issued, we develop novel data structures to maintain the data stream and propose efficient algorithms, with theoretical underpinnings, to process LFP and LKFP queries. An extensive empirical study on a real dataset confirms the effectiveness and efficiency of our approaches.
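To make the LFP query definition concrete, the following is a minimal brute-force sketch, not the paper's data structures or algorithms. It assumes a rectangular query region, treats a pattern's local frequency as the number of in-region microblogs containing all of its keywords, reads "exceeds a given threshold" as "at least the threshold", and calls a frequent pattern maximal when no proper superset is also frequent. The names `Microblog`, `in_region`, and `lfp_bruteforce` are hypothetical, and LKFP is not sketched here.

```python
from itertools import combinations
from typing import NamedTuple


class Microblog(NamedTuple):
    # Hypothetical record: a geo-tag (x, y) plus the microblog's keyword set.
    x: float
    y: float
    keywords: frozenset


def in_region(m, region):
    # Assumed rectangular query region given as (x1, y1, x2, y2).
    x1, y1, x2, y2 = region
    return x1 <= m.x <= x2 and y1 <= m.y <= y2


def lfp_bruteforce(stream, region, threshold):
    """Return {pattern: local frequency} for maximal frequent patterns."""
    local = [m.keywords for m in stream if in_region(m, region)]
    # Every pattern with nonzero support is a subset of some local microblog,
    # so counting all non-empty subsets of each one gives exact frequencies.
    counts = {}
    for kws in local:
        for r in range(1, len(kws) + 1):
            for combo in combinations(sorted(kws), r):
                p = frozenset(combo)
                counts[p] = counts.get(p, 0) + 1
    frequent = {p for p, c in counts.items() if c >= threshold}
    # Keep only frequent patterns with no frequent proper superset.
    maximal = {p for p in frequent if not any(p < q for q in frequent)}
    return {p: counts[p] for p in maximal}


stream = [
    Microblog(1, 1, frozenset({"fire", "downtown"})),
    Microblog(2, 2, frozenset({"fire", "downtown", "smoke"})),
    Microblog(3, 1, frozenset({"fire", "smoke"})),
    Microblog(50, 50, frozenset({"concert", "downtown"})),
]

if __name__ == "__main__":
    print(lfp_bruteforce(stream, (0, 0, 10, 10), 2))
```

With threshold 2 over the region (0, 0, 10, 10), the out-of-region microblog is ignored and the result contains {downtown, fire} and {fire, smoke}, each with frequency 2; the single keywords are frequent but not maximal. Enumerating all subsets is exponential in the number of keywords per microblog, which is why a streaming setting with many concurrent queries calls for dedicated structures such as those the paper proposes.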