Collective Reconstructive Embeddings for Cross-Modal Hashing

Hu, M; Yang, Y; Shen, F; Xie, N; Hong, R; Shen, HT

Publication Type: Journal Article
Citation: IEEE Transactions on Image Processing, 2019, 28 (6), pp. 2770-2784
Issue Date: 2019-06-01

Closed Access

Filename	Description	Size
Collective Reconstructive Embeddings for Cross-Modal Hashing.pdf	Published Version	3.66 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Hu, M	en_US
dc.contributor.author	Yang, Y	en_US
dc.contributor.author	Shen, F	en_US
dc.contributor.author	Xie, N	en_US
dc.contributor.author	Hong, R	en_US
dc.contributor.author	Shen, HT https://orcid.org/0000-0002-2999-2088	en_US
dc.date.issued	2019-06-01	en_US
dc.identifier.citation	IEEE Transactions on Image Processing, 2019, 28 (6), pp. 2770 - 2784	en_US
dc.identifier.issn	1057-7149	en_US
dc.identifier.uri	http://hdl.handle.net/10453/134983
dc.relation.ispartof	IEEE Transactions on Image Processing	en_US
dc.relation.isbasedon	10.1109/TIP.2018.2890144	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Collective Reconstructive Embeddings for Cross-Modal Hashing	en_US
dc.type	Journal Article
utslib.citation.volume	6	en_US
utslib.citation.volume	28	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
utslib.for	1702 Cognitive Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Software
utslib.copyright.status	closed_access
pubs.issue	6	en_US
pubs.publication-status	Published	en_US
pubs.volume	28	en_US

Abstract:

© 1992-2012 IEEE. In this paper, we study cross-modal retrieval using hashing-based approximate nearest neighbor search. Most existing cross-modal hashing methods address the complexity of multi-modal integration by applying the same mapping and similarity calculation to data from different media types. However, this can cause information loss during mapping because it overlooks the specifics of each modality. We propose a simple yet effective cross-modal hashing approach, termed collective reconstructive embeddings (CRE), which simultaneously addresses the heterogeneity and the integration complexity of multi-modal data. To address heterogeneity, we process each type of data with its own modality-specific model. Specifically, we model textual data with a cosine similarity-based reconstructive embedding to alleviate data sparsity, while for image data we use the Euclidean distance to characterize the relationships among the projected hash codes. Meanwhile, we unify the projections of text and image into the Hamming space as a common reconstructive embedding through rigorous mathematical reformulation, which not only significantly reduces the optimization complexity but also facilitates inter-modal similarity preservation. We further incorporate code balance and uncorrelation criteria into the problem and devise an efficient iterative algorithm for optimization. Comprehensive experiments on four widely used multimodal benchmarks show that CRE achieves superior performance compared with the state of the art on several challenging cross-modal tasks.
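
The building blocks named in the abstract can be illustrated with a minimal Python sketch. This is not the CRE algorithm: the abstract gives no formulas, so the sketch only shows the generic ingredients it mentions (cosine similarity for text features, Euclidean distance for image features, Hamming-distance ranking of binary codes, and the code balance and uncorrelation criteria). All function names are hypothetical, and the projection `W` is a hand-picked placeholder, whereas CRE learns its projections through optimization.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity; scale-invariant, which suits sparse text vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (nu * nv)

def euclidean_distance(u, v):
    """Euclidean distance, the measure the abstract uses for image data."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sign_hash(x, W):
    """Map a feature vector to a code in {-1, +1}^r.
    W is a list of r projection vectors (placeholder, not learned)."""
    return [1 if sum(a * w for a, w in zip(x, row)) >= 0 else -1 for row in W]

def hamming_distance(b1, b2):
    """Number of positions where two equal-length codes differ."""
    return sum(1 for a, b in zip(b1, b2) if a != b)

def rank_by_hamming(query_code, db_codes):
    """Database indices sorted by Hamming distance to the query code."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming_distance(query_code, db_codes[i]))

def bit_balance(codes):
    """Mean of each bit over the dataset; balanced codes give values near 0."""
    n = len(codes)
    return [sum(c[k] for c in codes) / n for k in range(len(codes[0]))]

def bit_correlation(codes, j, k):
    """Empirical correlation of bits j and k; uncorrelated bits give about 0."""
    return sum(c[j] * c[k] for c in codes) / len(codes)

# Toy cross-modal query: hash a "text" vector and rank "image" codes.
W_text = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]  # placeholder projection
image_codes = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
query_code = sign_hash([0.9, 0.1, 0.0], W_text)
print(query_code, rank_by_hamming(query_code, image_codes))
```

Because both modalities end up as codes of the same length, a text query can be compared against image codes with the same cheap Hamming distance, which is what makes hashing attractive for approximate nearest neighbor search.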

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/134983