Active exploration: simultaneous sampling and labeling for large graphs

Publication Type:
Conference Proceeding
The 22nd ACM International Conference on Information and Knowledge Management, 2013, pp. 829 - 834
Issue Date:
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2013003033OK.pdf480.93 kB
Adobe PDF
Modern information networks, such as social networks, are often characterized with large sizes and dynamic changing structures. To analyze these networks, existing solutions commonly rely on graph sampling techniques to reduce network sizes, and then carry out succeeding mining processes, such as labeling network nodes to build classification models. Such a sampling-then-labeling paradigm assumes that the whole network is available for sampling and the sampled network is useful for all subsequent tasks (such as network classification). Yet real-world networks are rarely immediately available unless the sampling process progressively crawls every single node and its connections. Meanwhile, without knowing the underlying analytic objective, the sampled network can hardly produce quality results. In this paper, we propose an Active Exploration framework for large graphs where the goal is to carry out network sampling and node labeling at the same time. To achieve this goal, we consider a network as a Markov chain and compute its stationary distribution by using supervised random walks. The stationary distribution of the sampled network help identify important nodes to be explored in the next step, and the labeling process labels the most informative node which in turn strengthens the sampling process. The mutually and simultaneously enhanced sampling and labeling processes ensure that the final network contains a maximum number of nodes directly related to the underlying mining tasks.
Please use this identifier to cite or link to this item: