Human action recognition and localization in video using structured learning of local space-time features

Publication Type:
Conference Proceeding
Proceedings - IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2010, 2010, pp. 204 - 211
Issue Date:
This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by its own compact set of local patches. In our approach, we first use a discriminative hierarchical Bayesian classifier to select those space-time interest points that are constructive for each particular action. These concise local features are then passed to a Support Vector Machine with Principal Component Analysis projection for the classification task. Meanwhile, action localization is performed using Dynamic Conditional Random Fields developed to incorporate the spatial and temporal structure constraints of superpixels extracted around those features. Each superpixel in the video is defined by the shape and motion information of its corresponding feature region. Experiments on the KTH [22], Weizmann [1], HOHA [13] and TRECVid [23] datasets demonstrate the efficiency and robustness of our framework for human action recognition and localization in video. © 2010 IEEE.