GID-Net: Detecting human-object interaction with global and instance dependency

Yang, D; Zou, YX; Zhang, J; Li, G

Publisher: Elsevier BV
Publication Type: Journal Article
Citation: Neurocomputing, 2020
Issue Date: 2020-01-01

Closed Access

	Filename	Description	Size	Format
	Neurocomputing-D-M-Yang.pdf	Published version	3.8 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yang, D
dc.contributor.author	Zou, YX
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541
dc.contributor.author	Li, G
dc.date.accessioned	2021-01-13T06:21:59Z
dc.date.available	2021-01-13T06:21:59Z
dc.date.issued	2020-01-01
dc.identifier.citation	Neurocomputing, 2020
dc.identifier.issn	0925-2312
dc.identifier.issn	1872-8286
dc.identifier.uri	http://hdl.handle.net/10453/145398
dc.language	en
dc.publisher	Elsevier BV
dc.relation.ispartof	Neurocomputing
dc.relation.isbasedon	10.1016/j.neucom.2020.02.136
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	GID-Net: Detecting human-object interaction with global and instance dependency
dc.type	Journal Article
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2021-01-13T06:21:52Z
pubs.publication-status	Published

Abstract:

© 2020 Elsevier B.V. Detecting and recognizing individual humans or objects is not sufficient to understand the visual world, so learning how humans interact with surrounding objects has become a core technology. However, convolution operations are weak at depicting visual interactions between instances, since each one processes only a single local neighborhood at a time. To address this problem, we draw on how humans perceive human-object interactions (HOIs) to introduce a two-stage trainable reasoning mechanism, referred to as the GID block. The GID block breaks through local neighborhoods and captures long-range dependencies between pixels at both the global level and the instance level of the scene to help detect interactions between instances. Furthermore, we construct a multi-stream network called GID-Net, a human-object interaction detection framework consisting of a human branch, an object branch and an interaction branch. Semantic information at the global level and the local level is efficiently reasoned over and aggregated in each branch. We compare the proposed GID-Net with existing state-of-the-art methods on two public benchmarks, V-COCO and HICO-DET. The results show that GID-Net outperforms the existing best-performing methods on both benchmarks, validating its efficacy in detecting human-object interactions.
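
The abstract contrasts convolution, which looks at one local neighborhood at a time, with a mechanism that captures long-range dependency. This record does not describe the internals of the GID block, so the following is only a minimal sketch of the general idea, using plain dot-product attention with a residual connection as a stand-in. The function name `long_range_aggregate` and the toy feature vectors are assumptions made for illustration and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def long_range_aggregate(features):
    """Generic dot-product attention with a residual connection.

    `features` is a list of equal-length vectors, one per position
    (for example a pixel or a detected instance). Each output mixes in
    every position, not only spatial neighbours. This is a hypothetical
    illustration, not the GID block from the paper.
    """
    outputs = []
    for query in features:
        # Similarity of this position to every position in the scene.
        weights = softmax([dot(query, key) for key in features])
        # Weighted sum of all positions: the long-range context.
        context = [
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(len(query))
        ]
        # Residual connection keeps the original local feature.
        outputs.append([q + c for q, c in zip(query, context)])
    return outputs


# Three positions; the first and last are far apart but similar.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
result = long_range_aggregate(features)
```

In this toy input the first and last positions receive identical outputs despite not being adjacent, which is the kind of dependency a small convolution kernel cannot express in a single layer.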

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/145398