Toward extracting and exploiting generalizable knowledge of deep 2D transformations in computer vision

Kang, J; Jia, W; He, X

Publisher: ELSEVIER
Publication Type: Journal Article
Citation: Neurocomputing, 2023, 562
Issue Date: 2023-12-28

Embargoed

Filename	Description	Size	Format
Toward extracting and exploiting generalizable knowledge of deep 2D transformations in computer vision.pdf	Submitted version	1.8 MB	Adobe PDF

This item is currently unavailable due to the publisher's embargo.

The embargo period expires on 1 Dec 2025

Full metadata record

Field	Value	Language
dc.contributor.author	Kang, J https://orcid.org/0000-0003-2522-2462
dc.contributor.author	Jia, W https://orcid.org/0000-0002-0940-3338
dc.contributor.author	He, X
dc.date.accessioned	2024-05-03T04:17:18Z
dc.date.available	2024-05-03T04:17:18Z
dc.date.issued	2023-12-28
dc.identifier.citation	Neurocomputing, 2023, 562
dc.identifier.issn	0925-2312
dc.identifier.issn	1872-8286
dc.identifier.uri	http://hdl.handle.net/10453/178607
dc.description.abstract	Existing deep learning models suffer from out-of-distribution (o.o.d.) performance drop in computer vision tasks. In comparison, humans have a remarkable ability to interpret images, even if the scenes in the images are rare, thanks to the generalizability of acquired knowledge. This work attempts to answer two research questions: (1) the acquisition and (2) the utilization of generalizable knowledge about 2D transformations. To answer the first question, we demonstrate that deep neural networks can learn generalizable knowledge with a new training methodology based on synthetic datasets. The generalizability is reflected in the results that, even when the knowledge is learned from random noise, the networks can still achieve stable performance in parameter estimation tasks. To answer the second question, a novel architecture called “InterpretNet” is devised to utilize the learned knowledge in image classification tasks. The architecture consists of an ESTIMATOR and an IDENTIFIER, in addition to a CLASSIFIER. By emulating the “hypothesis-verification” process in human visual perception, our InterpretNet improves classification accuracy by 21.1%.
dc.language	English
dc.publisher	ELSEVIER
dc.relation.ispartof	Neurocomputing
dc.relation.isbasedon	10.1016/j.neucom.2023.126882
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.subject.classification	40 Engineering
dc.subject.classification	46 Information and computing sciences
dc.subject.classification	52 Psychology
dc.title	Toward extracting and exploiting generalizable knowledge of deep 2D transformations in computer vision
dc.type	Journal Article
utslib.citation.volume	562
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	embargoed	*
utslib.copyright.embargo	2025-12-01T00:00:00+1000Z
dc.date.updated	2024-05-03T04:17:16Z
pubs.publication-status	Published
pubs.volume	562

Abstract:

Existing deep learning models suffer from out-of-distribution (o.o.d.) performance drop in computer vision tasks. In comparison, humans have a remarkable ability to interpret images, even if the scenes in the images are rare, thanks to the generalizability of acquired knowledge. This work attempts to answer two research questions: (1) the acquisition and (2) the utilization of generalizable knowledge about 2D transformations. To answer the first question, we demonstrate that deep neural networks can learn generalizable knowledge with a new training methodology based on synthetic datasets. The generalizability is reflected in the results that, even when the knowledge is learned from random noise, the networks can still achieve stable performance in parameter estimation tasks. To answer the second question, a novel architecture called “InterpretNet” is devised to utilize the learned knowledge in image classification tasks. The architecture consists of an ESTIMATOR and an IDENTIFIER, in addition to a CLASSIFIER. By emulating the “hypothesis-verification” process in human visual perception, our InterpretNet improves classification accuracy by 21.1%.
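
The abstract's two ideas, learning about 2D transformations from synthetic random-noise data and interpreting an image through a "hypothesis-verification" loop, can be illustrated with a toy sketch. The code below is not the authors' method and does not implement InterpretNet or any neural network: it assumes the transformation is a cyclic translation and stands in for the learned estimator with an exhaustive search. All names (`cyclic_shift`, `make_noise_pair`, `estimate_shift`) and parameters (`size`, `max_shift`) are hypothetical and chosen only for illustration.

```python
import random


def cyclic_shift(image, dy, dx):
    """Cyclically shift a 2D list of pixel values down by dy and right by dx."""
    h, w = len(image), len(image[0])
    return [[image[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def make_noise_pair(rng, size=16, max_shift=3):
    """Build one synthetic sample: a random-noise image, its shifted copy, and the shift."""
    image = [[rng.random() for _ in range(size)] for _ in range(size)]
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return image, cyclic_shift(image, dy, dx), (dy, dx)


def mean_squared_error(a, b):
    """Mean squared pixel difference between two equally sized 2D lists."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def estimate_shift(image, observed, max_shift=3):
    """Hypothesize each candidate shift, verify it against the observation, keep the best."""
    best, best_err = None, float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = mean_squared_error(cyclic_shift(image, dy, dx), observed)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err


rng = random.Random(0)  # fixed seed so the sketch is reproducible
image, observed, true_shift = make_noise_pair(rng)
estimate, error = estimate_shift(image, observed)
print("true shift:", true_shift, "estimated shift:", estimate, "residual:", error)
```

Because the sample is pure noise, the search relies on nothing about image content, only on the transformation itself. That is the sense in which such knowledge could carry over to unseen images in this toy setting.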

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/178607