Selecting representative objects considering coverage and diversity

Publication Type:
Conference Proceeding
Citation:
GeoRich 2015 - 2nd International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, in conjunction with SIGMOD 2015, 2015, pp. 31 - 36
Issue Date:
2015-05-31
Filename Description Size
GeoRich_2015.pdfPublished version199.9 kB
Adobe PDF
Full metadata record
© 2015 ACM. We say that an object o attracts a user u if o is one of the top-k objects according to the preference function defined by u. Given a set of objects (e.g., restaurants) and a set of users, in this paper, we study the problem of computing a set of representative objects considering two criteria: coverage and diversity. Coverage of a set S of objects is the distinct number of users that are attracted by the objects in S. Although a set of objects with high coverage attracts a large number of users, it is possible that all of these users have quite similar preferences. Consequently, the set of objects may be attractive only for a specific class of users with similar preference functions which may disappoint other users having widely different preferences. The diversity criterion addresses this issue by selecting a set S of objects such that the set of attracted users for each object in S is as different as possible from the sets of users attracted by the other objects in S. The existing work on representative objects considers only one of the coverage and diversity criteria. We are the first to consider both of the criteria where the importance of each criterion can be controlled using a parameter. Our algorithm has two phases. In the first phase, we prune the objects that cannot be among the representative objects and compute the set of attracted users (also called reverse top-k) for each of the remaining objects. In the second phase, the reverse top-k of these objects are used to compute the representative objects maximizing coverage and diversity. Since this problem is NP-hard, the second phase employs a greedy algorithm. For the sake of time and space efficiency, we adopt MinHash and KMV Synopses to assist the set operations. We prove that the proposed greedy algorithm is Q-approximate. Our extensive experimental study on real and synthetic data sets demonstrates the effectiveness of our proposed techniques.
Please use this identifier to cite or link to this item: