Selecting representative objects considering coverage and diversity

Wang, S; Cheema, MA; Zhang, Y; Lin, X

Publication Type: Conference Proceeding
Citation: GeoRich 2015 - 2nd International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, in conjunction with SIGMOD 2015, 2015, pp. 31 - 36
Issue Date: 2015-05-31

Closed Access

	Filename	Description	Size	Format
	GeoRich_2015.pdf	Published version	199.9 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Wang, S	en_US
dc.contributor.author	Cheema, MA	en_US
dc.contributor.author	Zhang, Y https://orcid.org/0000-0002-2674-1638	en_US
dc.contributor.author	Lin, X	en_US
dc.date.issued	2015-05-31	en_US
dc.identifier.citation	GeoRich 2015 - 2nd International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, in conjunction with SIGMOD 2015, 2015, pp. 31 - 36	en_US
dc.identifier.isbn	9781450336680	en_US
dc.identifier.uri	http://hdl.handle.net/10453/44062
dc.relation.ispartof	GeoRich 2015 - 2nd International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, in conjunction with SIGMOD 2015	en_US
dc.relation.isbasedon	10.1145/2786006.2786012	en_US
dc.title	Selecting representative objects considering coverage and diversity	en_US
dc.type	Conference Proceeding
utslib.for	0806 Information Systems	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

© 2015 ACM. We say that an object o attracts a user u if o is one of the top-k objects according to the preference function defined by u. Given a set of objects (e.g., restaurants) and a set of users, we study the problem of computing a set of representative objects under two criteria: coverage and diversity. The coverage of a set S of objects is the number of distinct users attracted by the objects in S. Although a set of objects with high coverage attracts a large number of users, these users may all have quite similar preferences. Consequently, the set may be attractive only to a specific class of users with similar preference functions, which may disappoint users with widely different preferences. The diversity criterion addresses this issue by selecting a set S of objects such that the set of users attracted by each object in S is as different as possible from the sets of users attracted by the other objects in S. Existing work on representative objects considers only one of the two criteria. We are the first to consider both, with the importance of each criterion controlled by a parameter. Our algorithm has two phases. In the first phase, we prune the objects that cannot be among the representative objects and compute the set of attracted users (also called the reverse top-k) for each remaining object. In the second phase, the reverse top-k sets of these objects are used to compute the representative objects that maximize coverage and diversity. Since this problem is NP-hard, the second phase employs a greedy algorithm. For time and space efficiency, we adopt MinHash and KMV synopses to assist the set operations. We prove that the proposed greedy algorithm is Q-approximate. Our extensive experimental study on real and synthetic data sets demonstrates the effectiveness of the proposed techniques.
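
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration because the abstract does not specify them: preference functions are taken to be linear (a weighted sum of object attributes), diversity is measured as the average pairwise Jaccard distance between reverse top-k sets, and the two criteria are combined as `lam * coverage + (1 - lam) * diversity`. The function names (`reverse_top_k`, `score`, `greedy_select`) are hypothetical. The sketch uses exact sets where the paper uses MinHash and KMV synopses, and its only pruning is skipping objects that attract no users.

```python
from itertools import combinations


def reverse_top_k(objects, users, k):
    """Phase 1 (simplified): map each object id to the set of users it attracts."""
    rtk = {oid: set() for oid in objects}
    for uid, weights in users.items():
        # Assumed preference function: weighted sum of attributes, higher is better.
        # Ties are broken by object id so the result is deterministic.
        def utility(oid):
            return sum(w * a for w, a in zip(weights, objects[oid]))

        ranked = sorted(objects, key=lambda oid: (-utility(oid), oid))
        for oid in ranked[:k]:
            rtk[oid].add(uid)
    return rtk


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def score(selected, rtk, n_users, lam):
    """Assumed objective: lam * normalized coverage + (1 - lam) * diversity."""
    covered = set().union(*(rtk[o] for o in selected))
    coverage = len(covered) / n_users
    pairs = list(combinations(selected, 2))
    if pairs:
        diversity = sum(jaccard_distance(rtk[a], rtk[b]) for a, b in pairs) / len(pairs)
    else:
        diversity = 0.0  # fewer than two objects: no pairwise diversity yet
    return lam * coverage + (1 - lam) * diversity


def greedy_select(rtk, n_users, size, lam):
    """Phase 2 (simplified): repeatedly add the object that gives the best score."""
    # Objects that attract no users are skipped, a stand-in for the paper's pruning.
    candidates = sorted(o for o in rtk if rtk[o])
    selected = []
    while len(selected) < size and candidates:
        best = max(candidates, key=lambda o: score(selected + [o], rtk, n_users, lam))
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical data: three objects with two attributes, three users with weights.
    objects = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (0.5, 0.5)}
    users = {"u1": (1.0, 0.0), "u2": (0.9, 0.1), "u3": (0.0, 1.0)}
    rtk = reverse_top_k(objects, users, k=1)
    print(greedy_select(rtk, n_users=len(users), size=2, lam=0.5))
```

Setting `lam` closer to 1 favors coverage, and setting it closer to 0 favors diversity. The approximation guarantee stated in the abstract applies to the authors' objective and algorithm, not necessarily to this simplified scoring.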

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/44062