Understanding Atomic Hand-Object Interaction With Human Intention

Publisher:
IEEE (Institute of Electrical and Electronics Engineers), Inc.
Publication Type:
Journal Article
Citation:
IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 1, pp. 275-285, 2022.
Issue Date:
2022-01-01
Hand-object interaction plays a central role in how humans manipulate objects. While existing methods focus on improving hand-object recognition with fully automatic pipelines, human intention has been largely neglected in the recognition process, leading to interaction descriptions that do not reflect what the person is trying to do. To interpret hand-object interaction in a way that is aligned with human intention, we argue that a reference specifying that intention should be taken into account. We therefore propose a new approach that represents interactions while reflecting human purpose through three key factors: hand, object, and reference. Specifically, we design a ⟨hand-object, object-reference, hand, object, reference⟩ (HOR) pattern to recognize intention-based atomic hand-object interactions; this pattern models interactions through the states of the hand, object, and reference and the relationships among them. Furthermore, we design a simple yet effective Spatially Part-based (3+1)D convolutional neural network, SP(3+1)D, which uses 3D convolutions to model visual dynamics and 1D convolutions to model object position changes based on our HOR representation. With the help of the SP(3+1)D network, the recognition results accurately indicate human purposes. To evaluate the proposed method, we annotate a Something-1.3k dataset, which contains 10 atomic hand-object interactions with about 130 videos per interaction. Experimental results on Something-1.3k demonstrate the effectiveness of our SP(3+1)D network.
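
To make the (3+1)D decomposition concrete, the sketch below shows one plausible way to pair a 3D-convolutional branch over RGB clips with a 1D-convolutional branch over object-position tracks, written in PyTorch. It is a minimal illustration under our own assumptions: the layer sizes, the (x, y, w, h) box encoding, and all names (SP3Plus1D, etc.) are hypothetical and do not reproduce the authors' actual SP(3+1)D architecture.

import torch
import torch.nn as nn

class SP3Plus1D(nn.Module):
    """Hypothetical two-branch (3+1)D model: 3D convs for visual
    dynamics, 1D convs for object position changes, fused for
    atomic hand-object interaction classification."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # 3D branch: appearance and motion from an RGB clip
        # of shape (batch, 3, T, H, W).
        self.visual = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 7, 7),
                      stride=(1, 2, 2), padding=(1, 3, 3)),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3,
                      stride=(1, 2, 2), padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # -> (batch, 64, 1, 1, 1)
        )
        # 1D branch: per-frame object position changes, here an
        # assumed (x, y, w, h) box track of shape (batch, 4, T).
        self.position = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(1),  # -> (batch, 64, 1)
        )
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, clip: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        v = self.visual(clip).flatten(1)     # (batch, 64)
        p = self.position(boxes).flatten(1)  # (batch, 64)
        return self.classifier(torch.cat([v, p], dim=1))

# Usage: an 8-frame 112x112 clip with one tracked box per frame.
model = SP3Plus1D(num_classes=10)
clip = torch.randn(2, 3, 8, 112, 112)
boxes = torch.randn(2, 4, 8)
logits = model(clip, boxes)  # -> (2, 10)

Fusing the two branches by concatenating pooled features is only one design choice; the key point the sketch illustrates is the division of labor, with 3D convolutions handling spatio-temporal appearance and cheap 1D convolutions handling the position trajectories derived from the HOR factors.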