Modeling positive and negative feedback for improving document retrieval

Hao, S; Shi, C; Niu, Z; Cao, L

Publication Type: Journal Article
Citation: Expert Systems with Applications, 2019, 120, pp. 253-261
Issue Date: 2019-04-15

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Hao, S	en_US
dc.contributor.author	Shi, C	en_US
dc.contributor.author	Niu, Z	en_US
dc.contributor.author	Cao, L https://orcid.org/0000-0003-1562-9429	en_US
dc.date.available	2021-04-21T19:00:48Z
dc.date.issued	2019-04-15	en_US
dc.identifier.citation	Expert Systems with Applications, 2019, 120 pp. 253 - 261	en_US
dc.identifier.issn	0957-4174	en_US
dc.identifier.uri	http://hdl.handle.net/10453/131855
dc.relation.ispartof	Expert Systems with Applications	en_US
dc.relation.isbasedon	10.1016/j.eswa.2018.11.035	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Modeling positive and negative feedback for improving document retrieval	en_US
dc.type	Journal Article
utslib.citation.volume	120	en_US
utslib.for	01 Mathematical Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	120	en_US

Abstract:

© 2018 Elsevier Ltd

Pseudo-relevance feedback (PRF) has evident potential for enriching the representation of short queries. Traditional PRF methods treat the top-ranked documents as feedback, on the assumption that they are relevant to the query. However, some of these feedback documents may actually distract from the query topic for a range of reasons and thereby degrade PRF performance. Such documents constitute negative examples (negative feedback), yet they can also be valuable in retrieval. This paper proposes a novel framework for constructing query language models that improves retrieval performance by integrating both positive and negative feedback. First, an improvement-based method automatically identifies the type of each feedback document (i.e., positive or negative) according to whether the document improves retrieval effectiveness. Subsequently, based on the learned positive and negative examples, the positive and negative feedback models are estimated using an Expectation-Maximization algorithm under two assumptions: the positive term distribution is affected by the context term distribution, and the negative term distribution is affected by both the positive term distribution and the context term distribution. As a result, the positive feedback model upgrades the rankings of relevant documents, and the negative feedback model prunes irrelevant documents for a query. Finally, a content-based representativeness criterion is proposed to select representative negative feedback documents. Experiments on the TREC collections demonstrate that the proposed approach achieves better retrieval accuracy and robustness than baseline methods.
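The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea of estimating a feedback term distribution with Expectation-Maximization while treating other distributions as known influences. It uses a plain unigram mixture model, which is a simplified stand-in and not necessarily the model used in the paper. The function name `estimate_feedback_model`, the fixed mixing weights, and the toy documents and context distribution are all assumptions made for illustration.

```python
from collections import Counter


def estimate_feedback_model(docs, fixed_models, fixed_weights, iterations=50):
    """Estimate one unknown unigram distribution (theta) by EM.

    Assumption for this sketch: every word occurrence in `docs` is drawn
    either from theta, with weight 1 - sum(fixed_weights), or from one of
    the known `fixed_models`, each with its own fixed weight.
    """
    theta_weight = 1.0 - sum(fixed_weights)
    if not 0.0 < theta_weight <= 1.0:
        raise ValueError("fixed_weights must sum to a value in [0, 1)")

    counts = Counter(word for doc in docs for word in doc)
    vocab = list(counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}  # uniform starting point

    for _ in range(iterations):
        # E-step: expected number of occurrences of each word that were
        # generated by theta rather than by one of the fixed distributions.
        expected = {}
        for w in vocab:
            from_theta = theta_weight * theta[w]
            from_fixed = sum(
                lam * model.get(w, 0.0)
                for lam, model in zip(fixed_weights, fixed_models)
            )
            total = from_theta + from_fixed
            posterior = from_theta / total if total > 0 else 0.0
            expected[w] = counts[w] * posterior

        # M-step: renormalise the expected counts into a distribution.
        norm = sum(expected.values())
        if norm == 0:
            break
        theta = {w: expected[w] / norm for w in vocab}

    return theta


# Toy data (hypothetical): a context distribution and labelled feedback docs.
context = {"the": 0.5, "of": 0.3, "retrieval": 0.1, "feedback": 0.05, "sports": 0.05}
positive_docs = [["the", "retrieval", "feedback", "of", "retrieval"],
                 ["the", "feedback", "retrieval"]]
negative_docs = [["the", "sports", "retrieval", "sports"],
                 ["of", "sports", "the"]]

# Positive model: influenced by the context distribution only.
positive = estimate_feedback_model(positive_docs, [context], [0.5])
# Negative model: influenced by both the context and the positive distribution.
negative = estimate_feedback_model(negative_docs, [context, positive], [0.4, 0.3])
```

In this sketch the two assumptions from the abstract are mirrored only by which distributions are passed as `fixed_models`: the positive model is estimated against the context distribution, and the negative model against both the context and the already estimated positive distribution. Words that the fixed distributions explain well receive little mass in the estimated model, so the negative model concentrates on terms that are specific to the negative documents.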

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/131855