Mining maximal quasi-bicliques: Novel algorithm and applications in the stock market and protein networks
- Publication Type: Journal Article
- Statistical Analysis and Data Mining, 2009, 2 (4), pp. 255-273
Several real-world applications require mining bicliques, because bicliques represent correlated pairs of data clusters. However, mining quality is adversely affected by missing and noisy data. Moreover, some applications require only strong, rather than complete, interactions between the members of each pair, whereas bicliques are pairs that display complete interactions. We address these two limitations by proposing maximal quasi-bicliques. Maximal quasi-bicliques tolerate erroneous and missing data, and they also relax the interactions required between the members of their pairs. In addition, unlike previously proposed quasi-bicliques, maximal quasi-bicliques do not suffer from a skewed distribution of missing edges. We develop an algorithm, MQBminer, which mines the complete set of maximal quasi-bicliques from either bipartite or non-bipartite graphs. We demonstrate the versatility and effectiveness of maximal quasi-bicliques in discovering highly correlated pairs of data in two diverse real-world datasets. First, we propose to solve a novel financial stock analysis problem by using maximal quasi-bicliques to co-cluster stocks and financial ratios. Results show that the stocks in our co-clusters usually have significant correlations in their price performance. Second, we apply maximal quasi-bicliques to a protein network mining problem and show that the pairs of protein groups mined by maximal quasi-bicliques are more significant than those mined by maximal bicliques. © 2009 Wiley Periodicals, Inc.
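To make the idea of relaxed interactions concrete, here is a minimal Python sketch under an assumed ε-tolerance criterion: a pair (X, Y) qualifies when every vertex in X is non-adjacent to at most ε vertices of Y, and every vertex in Y is non-adjacent to at most ε vertices of X. The abstract does not give the formal definition, so this criterion, the function names `is_quasi_biclique` and `maximal_quasi_bicliques`, and the toy graph are hypothetical illustrations. The enumeration is a naive exhaustive search suitable only for tiny bipartite graphs; it is not MQBminer.

```python
from itertools import combinations


def is_quasi_biclique(X, Y, adj, eps):
    """Assumed eps-tolerance check (hypothetical definition).

    adj maps each left vertex to the set of right vertices it is adjacent to.
    Every x in X may miss at most eps vertices of Y, and every y in Y may be
    missed by at most eps vertices of X.
    """
    for x in X:
        if sum(1 for y in Y if y not in adj[x]) > eps:
            return False
    for y in Y:
        if sum(1 for x in X if y not in adj[x]) > eps:
            return False
    return True


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def maximal_quasi_bicliques(left, right, adj, eps):
    """Naive exhaustive search (hypothetical helper, exponential time).

    Collect every valid pair, then keep those not strictly contained
    in another valid pair.
    """
    valid = [
        (X, Y)
        for X in nonempty_subsets(left)
        for Y in nonempty_subsets(right)
        if is_quasi_biclique(X, Y, adj, eps)
    ]
    maximal = []
    for X, Y in valid:
        dominated = any(
            X <= X2 and Y <= Y2 and (X, Y) != (X2, Y2)
            for X2, Y2 in valid
        )
        if not dominated:
            maximal.append((X, Y))
    return maximal


if __name__ == "__main__":
    # Toy bipartite graph: edges b-3 and c-1 are missing.
    left = {"a", "b", "c"}
    right = {1, 2, 3}
    adj = {"a": {1, 2, 3}, "b": {1, 2}, "c": {2, 3}}
    print(len(maximal_quasi_bicliques(left, right, adj, 0)))  # 4 maximal bicliques
    print(len(maximal_quasi_bicliques(left, right, adj, 1)))  # 1 maximal quasi-biclique
```

With `eps=0` the check reduces to ordinary bicliques, and the two missing edges split the toy graph into four overlapping maximal bicliques. With `eps=1` the whole graph forms a single maximal quasi-biclique. Because the assumed bound applies to every vertex on both sides, missing edges cannot pile up on one vertex, which is one plausible way to avoid the skew mentioned above.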