TY - JOUR
AB - Learning from Demonstration (LfD) enables robots to acquire manipulation skills by observing human actions. However, existing methods often face challenges such as high computational cost, limited generalizability, and a loss of key interaction details. This study presents a compact representation for interaction recognition in LfD that encodes human–object interactions using 2D wrist trajectories and 3D object poses. A lightweight extraction pipeline combines MediaPipe-based wrist tracking with FoundationPose-based 6-DoF object estimation to obtain these trajectories directly from RGB-D video without specialized sensors or heavy preprocessing. Experiments on the GRAB and FPHA datasets show that the representation effectively captures task-relevant interactions, achieving 94.6% accuracy on GRAB and 96.0% on FPHA with well-calibrated probability predictions. Both Bidirectional Long Short-Term Memory (Bi-LSTM) with attention and Transformer architectures deliver consistent performance, confirming robustness and generalizability. The method achieves sub-second inference, a memory footprint under 1 GB, and reliable operation on both GPU and CPU platforms, enabling deployment on edge devices such as NVIDIA Jetson. By bridging pose-based and object-centric paradigms, this approach offers a compact and efficient foundation for scalable robot learning while preserving essential spatiotemporal dynamics.
AU - Pyaraka, JC
AU - Isaksson, M
AU - McCormick, J
AU - Sutjipto, S
AU - Sukkar, F
DA - 2025/11/01
DO - 10.3390/electronics14214297
JO - Electronics (Switzerland)
PB - MDPI
PY - 2025/11/01
TI - Generalizable Interaction Recognition for Learning from Demonstration Using Wrist and Object Trajectories
VL - 14
Y1 - 2025/11/01
Y2 - 2026/04/30
ER -