Slow feature analysis for human action recognition

Publication Type:
Journal Article
Citation:
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34 (3), pp. 436 - 450
Issue Date:
2012-01-30
Metrics:
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2011005510OK.pdf5.49 MB
Adobe PDF
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal [1]. It has been successfully applied to modeling the visual receptive fields of the cortical neurons. Sufficient experimental results in neuroscience suggest that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating the discriminative information with SFA learning and considering the spatial relationship of body parts. In particular, we consider four kinds of SFA learning strategies, including the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large amount of training cuboids which are obtained by random sampling in motion boundaries. Afterward, to represent action sequences, the squared first order temporal derivatives are accumulated over all transformed cuboids into one feature vector, which is termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-interaction databases, to demonstrate the effectiveness of SFA for human action recognition. Experimental results suggest that the SFA-based approach 1) is able to extract useful motion patterns and improves the recognition performance, 2) requires less intermediate processing steps but achieves comparable or even better performance, and 3) has good potential to recognize complex multiperson activities. © 2012 IEEE.
Please use this identifier to cite or link to this item: