Towards Robust Ranker for Text Retrieval

Zhou, Y; Shen, T; Geng, X; Tao, C; Xu, C; Long, G; Jiao, B; Jiang, D

Publication Type: Conference Proceeding
Citation: Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2023, pp. 5387-5401
Issue Date: 2023-01-01

Closed Access

Filename	Description	Size	Format
2023.findings-acl.332_published.pdf	Published version	564.33 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhou, Y
dc.contributor.author	Shen, T
dc.contributor.author	Geng, X
dc.contributor.author	Tao, C
dc.contributor.author	Xu, C
dc.contributor.author	Long, G https://orcid.org/0000-0003-3740-9515
dc.contributor.author	Jiao, B
dc.contributor.author	Jiang, D
dc.date.accessioned	2024-03-12T01:38:20Z
dc.date.available	2024-03-12T01:38:20Z
dc.date.issued	2023-01-01
dc.identifier.citation	Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2023, pp. 5387-5401
dc.identifier.isbn	9781959429623
dc.identifier.issn	0736-587X
dc.identifier.uri	http://hdl.handle.net/10453/176534
dc.description.abstract	A neural ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind due to the weak negative mining during contrastive learning. Compared to retrievers boosted by self-adversarial (i.e., in-distribution) negative mining, the ranker's heavy structure suffers from query-document combinatorial explosions, so it can only resort to the negative sampled by the fast yet out-of-distribution retriever. Thereby, the moderate negatives compose ineffective contrastive learning samples, becoming the main barrier to learning a robust ranker. To alleviate this, we propose a multi-adversarial training strategy that leverages multiple retrievers as generators to challenge a ranker, where i) diverse hard negatives from a joint distribution are prone to fool the ranker for more effective adversarial learning and ii) involving extensive out-of-distribution label noises renders the ranker against each noise distribution, leading to more challenging and robust contrastive learning. To evaluate our robust ranker (dubbed R2ANKER), we conduct experiments in various settings on the passage retrieval benchmarks, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.
dc.language	en
dc.relation.ispartof	Proceedings of the Annual Meeting of the Association for Computational Linguistics
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Towards Robust Ranker for Text Retrieval
dc.type	Conference Proceeding
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	closed_access	*
dc.date.updated	2024-03-12T01:38:19Z
pubs.publication-status	Published

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/176534