Action recognition and video summarisation by submodular inference
In the field of computer vision, action recognition and video summarisation are two important tasks, useful for applications such as video indexing and retrieval, human-computer interaction, video surveillance and home intelligence. While many approaches to these two tasks exist in the literature, to date they have always been addressed separately. In this thesis, instead, we start from the assumption that action recognition can usefully drive the selection of frames for the summary, and that recognising actions from a summary can prove more accurate than recognising them from the whole video; the two tasks should therefore be tackled simultaneously as a joint objective. To this aim, we propose a novel framework based on structured max-margin algorithms, together with an efficient model for jointly inferring the action and the summary that exploits the property of submodularity. Submodularity has recently emerged as an area of interest in machine learning and theoretical computer science, particularly within optimisation and game theory, and it is one of the main frameworks of this thesis. To evaluate the proposed method thoroughly, we have conducted experiments in three different scenarios: unsupervised, semi-supervised and fully supervised summaries. We also propose a novel loss function, V-JAUNE, to evaluate the quality of a predicted video summary against the summaries annotated by multiple annotators. In a final experiment, we leverage the proposed loss function not only for evaluation but also for training. The effectiveness of the proposed algorithms is demonstrated through qualitative and quantitative tests on two challenging depth action datasets: ACE and MSR DailyActivity. The results show that the proposed approaches are capable of learning accurate action classifiers and of producing informative summaries.
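Joint inference of an action and a summary by submodular maximisation can be illustrated with a minimal sketch. The sketch assumes a particular objective that is not taken from the thesis. For each candidate action class, the score is a facility-location coverage term over frame similarities plus non-negative per-frame class scores. With non-negative similarities and scores this objective is monotone submodular, so the standard greedy algorithm under a frame budget enjoys the classic (1 - 1/e) approximation guarantee. Running it once per class and keeping the best pair yields both an action label and a summary. The function names (`coverage`, `greedy_summary`, `joint_inference`), the weight `lam` and the toy inputs are hypothetical, and the structured max-margin training of the scores is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def coverage(selected: Sequence[int], sim: Matrix) -> float:
    """Facility-location coverage: every frame i is represented by its most
    similar selected frame. Monotone submodular when sim is non-negative."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def objective(selected: Sequence[int], sim: Matrix,
              frame_scores: Sequence[float], lam: float) -> float:
    # Hypothetical per-class score (an assumption, not the thesis model):
    # coverage plus an additive (modular) term of non-negative frame scores.
    return lam * coverage(selected, sim) + sum(frame_scores[j] for j in selected)


def greedy_summary(sim: Matrix, frame_scores: Sequence[float], budget: int,
                   lam: float = 1.0) -> Tuple[List[int], float]:
    """Greedy maximisation under a cardinality budget: repeatedly add the
    frame with the largest marginal gain (ties go to the lowest index)."""
    selected: List[int] = []
    current = 0.0
    candidates = set(range(len(sim)))
    for _ in range(min(budget, len(sim))):
        best_gain, best_j = float("-inf"), -1
        for j in sorted(candidates):
            gain = objective(selected + [j], sim, frame_scores, lam) - current
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        candidates.remove(best_j)
        current += best_gain
    return sorted(selected), current


def joint_inference(sim: Matrix, class_frame_scores: Sequence[Sequence[float]],
                    budget: int, lam: float = 1.0) -> Tuple[int, List[int], float]:
    """Run the greedy summariser once per hypothetical action class and
    return (action label, summary frames, score) of the best class."""
    best: Optional[Tuple[int, List[int], float]] = None
    for label, scores in enumerate(class_frame_scores):
        summary, value = greedy_summary(sim, scores, budget, lam)
        if best is None or value > best[2]:
            best = (label, summary, value)
    assert best is not None, "need at least one action class"
    return best


if __name__ == "__main__":
    # Toy 4-frame video: frames 0/1 look alike, frames 2/3 look alike.
    sim = [[1.0, 0.75, 0.125, 0.0],
           [0.75, 1.0, 0.0, 0.25],
           [0.125, 0.0, 1.0, 0.5],
           [0.0, 0.25, 0.5, 1.0]]
    class_frame_scores = [[0.0, 0.0, 0.0, 0.0],   # class 0: no frame evidence
                          [0.5, 0.0, 0.0, 0.75]]  # class 1: frames 0 and 3 indicative
    print(joint_inference(sim, class_frame_scores, budget=2))  # (1, [0, 3], 4.5)
```

In this toy run the class scores pull the summary towards frames 0 and 3, which also cover both visually distinct halves of the video, so recognition and summarisation reinforce each other. Enumerating classes keeps the sketch simple; a lazy-greedy variant with a priority queue would reduce the number of objective evaluations on long videos.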