Compact indexing and judicious searching for billion-scale microblog retrieval

Zhang, D; Nie, L; Luan, H; Tan, KL; Chua, TS; Shen, HT

Publication Type: Journal Article
Citation: ACM Transactions on Information Systems, 2017, 35 (3)
Issue Date: 2017-05-01

	Filename	Description	Size	Format
	a27-zhang.pdf	Published Version	652.32 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, D	en_US
dc.contributor.author	Nie, L	en_US
dc.contributor.author	Luan, H	en_US
dc.contributor.author	Tan, KL	en_US
dc.contributor.author	Chua, TS	en_US
dc.contributor.author	Shen, HT	en_US
dc.date.issued	2017-05-01	en_US
dc.identifier.citation	ACM Transactions on Information Systems, 2017, 35 (3)	en_US
dc.identifier.issn	1046-8188	en_US
dc.identifier.uri	http://hdl.handle.net/10453/123921
dc.description.abstract	© 2017 ACM. In this article, we study the problem of efficient top-k disjunctive query processing in a huge microblog dataset. In terms of compact indexing, we categorize the keywords into rare terms and common terms based on inverse document frequency (idf) and propose a tailored block-oriented index organization to reduce memory consumption. In terms of fast searching, we classify the queries into three types based on term category and judiciously design an efficient search algorithm for each type. We conducted extensive experiments on a billion-scale Twitter dataset and examined the performance with both simple and more advanced ranking functions. The results showed that, with a much smaller index, our search algorithm is 2-3 times faster than state-of-the-art solutions in both ranking scenarios.	en_US
dc.relation.ispartof	ACM Transactions on Information Systems	en_US
dc.relation.isbasedon	10.1145/3052771	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Compact indexing and judicious searching for billion-scale microblog retrieval	en_US
dc.type	Journal Article
utslib.citation.volume	3	en_US
utslib.citation.volume	35	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0807 Library and Information Studies	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Software
utslib.copyright.status	closed_access
pubs.issue	3	en_US
pubs.publication-status	Published	en_US
pubs.volume	35	en_US
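The abstract names the ingredients (idf-based rare/common term split, three query types by term category, top-k disjunctive evaluation) without giving details, and the full text is closed access. The following is a minimal, hypothetical Python sketch of those ideas only. The idf threshold, the all-rare / all-common / mixed split of query types, and the tf-idf scoring are assumptions, not the authors' method; the block-oriented index layout and the per-type search algorithms are not modeled, and `top_k_disjunctive` is a naive exhaustive baseline of the kind such algorithms are meant to beat.

```python
import heapq
import math
from collections import defaultdict
from enum import Enum


class QueryType(Enum):
    # Hypothetical labels: the abstract only says there are three query
    # types defined by term category; this split is an assumption.
    ALL_RARE = "all_rare"
    ALL_COMMON = "all_common"
    MIXED = "mixed"


def build_index(docs):
    """Build an uncompressed inverted index: term -> {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def idf(index, n_docs, term):
    """Plain log(N / df); returns 0.0 for terms absent from the index."""
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0


def categorize_terms(index, n_docs, idf_threshold):
    """Label each term 'rare' (idf >= threshold) or 'common' (hypothetical cut-off)."""
    return {
        term: "rare" if idf(index, n_docs, term) >= idf_threshold else "common"
        for term in index
    }


def classify_query(terms, categories):
    """Assign a query to one of the three assumed types from its terms' categories."""
    kinds = {categories[t] for t in terms if t in categories}
    if not kinds:
        raise ValueError("query contains no indexed terms")
    if kinds == {"rare"}:
        return QueryType.ALL_RARE
    if kinds == {"common"}:
        return QueryType.ALL_COMMON
    return QueryType.MIXED


def top_k_disjunctive(index, n_docs, terms, k):
    """Exhaustive OR-semantics top-k: any doc matching at least one term is scored.

    Score = sum of tf * idf over matched terms. This naive baseline touches
    every posting of every query term; it is not the article's algorithm.
    """
    scores = defaultdict(float)
    for term in set(terms):
        weight = idf(index, n_docs, term)
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf * weight
    # Highest score first; ties broken by smaller doc_id.
    return heapq.nlargest(k, scores.items(), key=lambda item: (item[1], -item[0]))
```

For example, with the four toy posts "earthquake hits city", "city traffic update", "city weather sunny" and "earthquake aftershock reported" and a threshold of 1.0, `city` (idf ln(4/3)) and `earthquake` (idf ln 2) are common while `aftershock` (idf ln 4) is rare, so the query `earthquake aftershock` is classified as mixed. The point of a rare/common split is that rare terms have short posting lists that are cheap to scan exhaustively, whereas common terms dominate the cost that smarter layouts and pruning try to avoid.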

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/123921