Improving Users Engagement Detection using End-to-End Spatio-Temporal Convolutional Neural Networks

Saleh, K; Yu, K; Chen, F

Publisher: ACM
Publication Type: Conference Proceeding
Citation: ACM/IEEE International Conference on Human-Robot Interaction, 2021, pp. 190-194
Issue Date: 2021-03-08

Closed Access

	Filename	Description	Size	Format
	hrilb1058-salehA.pdf	Accepted version	1.29 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Saleh, K
dc.contributor.author	Yu, K
dc.contributor.author	Chen, F https://orcid.org/0000-0003-4971-8729
dc.date	2021-03-09
dc.date.accessioned	2022-04-14T04:33:22Z
dc.date.available	2022-04-14T04:33:22Z
dc.date.issued	2021-03-08
dc.identifier.citation	ACM/IEEE International Conference on Human-Robot Interaction, 2021, pp. 190-194
dc.identifier.isbn	9781450382908
dc.identifier.issn	2167-2148
dc.identifier.uri	http://hdl.handle.net/10453/156255
dc.description.abstract	The ability to infer latent behaviours, such as the degree of engagement of humans interacting with social robots, is still considered a challenging task in the human-robot interaction (HRI) field. Data-driven techniques based on machine learning were recently shown to be a promising approach to the users' engagement detection problem; however, the solution often involves multiple consecutive stages. This in turn makes these techniques either incapable of capturing users' engagement, especially in a dynamic environment, or undeployable because of their inability to track engagement in real time. In this study, we propose a data-driven, end-to-end technique based on a unique 3D convolutional neural network architecture. Our proposed framework was trained and evaluated using a real-life dataset of users interacting spontaneously with a social robot in a dynamic environment. Compared against three baseline approaches from the literature, the framework has shown promising results over three different evaluation metrics, with an F1-score of 76.72. Additionally, our framework has achieved a resilient real-time performance of 25 Hz.
dc.language	en
dc.publisher	ACM
dc.relation.ispartof	ACM/IEEE International Conference on Human-Robot Interaction
dc.relation.ispartof	Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction
dc.relation.isbasedon	10.1145/3434074.3447157
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Improving Users Engagement Detection using End-to-End Spatio-Temporal Convolutional Neural Networks
dc.type	Conference Proceeding
utslib.location.activity	Virtual Conference
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/A/DRsch The Data Science Institute
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2022-04-14T04:33:20Z
pubs.finish-date	2021-03-11
pubs.publication-status	Published online
pubs.start-date	2021-03-09

Abstract:

The ability to infer latent behaviours, such as the degree of engagement of humans interacting with social robots, is still considered a challenging task in the human-robot interaction (HRI) field. Data-driven techniques based on machine learning were recently shown to be a promising approach to the users' engagement detection problem; however, the solution often involves multiple consecutive stages. This in turn makes these techniques either incapable of capturing users' engagement, especially in a dynamic environment, or undeployable because of their inability to track engagement in real time. In this study, we propose a data-driven, end-to-end technique based on a unique 3D convolutional neural network architecture. Our proposed framework was trained and evaluated using a real-life dataset of users interacting spontaneously with a social robot in a dynamic environment. Compared against three baseline approaches from the literature, the framework has shown promising results over three different evaluation metrics, with an F1-score of 76.72. Additionally, our framework has achieved a resilient real-time performance of 25 Hz.
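
The abstract reports an F1-score but this record does not define it. As a reminder of the standard definition (the harmonic mean of precision and recall), here is a minimal Python sketch. The function name `f1_score`, the binary engaged/not-engaged labelling, and the sample labels are all assumptions made for illustration; the record does not say how the authors labelled clips or averaged the metric, and none of the values below come from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-clip labels (1 = engaged, 0 = not engaged); not data from the paper.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

score = f1_score(y_true, y_pred)
print(f"F1 = {100 * score:.2f}")
```

In this toy example there are 4 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.8. The score is scaled by 100 only to match the percentage-style figure quoted in the abstract.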

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/156255