Improving Users Engagement Detection using End-to-End Spatio-Temporal Convolutional Neural Networks

Publication Type:
Conference Proceeding
ACM/IEEE International Conference on Human-Robot Interaction, 2021, pp. 190-194
Issue Date:
Filename: hrilb1058-salehA.pdf
Description: Accepted version
Size: 1.29 MB
Format: Adobe PDF
Inferring latent behaviours, such as the degree of engagement of humans interacting with social robots, is still considered a challenging task in the human-robot interaction (HRI) field. Data-driven techniques based on machine learning have recently been shown to be a promising approach to the user engagement detection problem; however, the solution often involves multiple consecutive stages. This in turn makes these techniques either incapable of capturing users' engagement, especially in a dynamic environment, or undeployable because they cannot track engagement in real time. In this study, we propose an end-to-end, data-driven technique based on a unique 3D convolutional neural network architecture. The proposed framework was trained and evaluated on a real-life dataset of users interacting spontaneously with a social robot in a dynamic environment. It showed promising results on three evaluation metrics when compared against three baseline approaches from the literature, achieving an F1-score of 76.72. Additionally, the framework achieved robust real-time performance of 25 Hz.