Action recognition and video summarisation by submodular inference

Hussein, Fairouz

Publication Type: Thesis
Issue Date: 2017

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Hussein, Fairouz
dc.date.accessioned	2017-06-05T01:56:40Z
dc.date.available	2017-06-05T01:56:40Z
dc.date.issued	2017
dc.identifier.uri	http://hdl.handle.net/10453/102741
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/102741/7/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Computer vision.	en
dc.subject	Pattern recognition systems.	en
dc.subject	Video surveillance.	en
dc.subject	Structured max-margin algorithms.	en
dc.subject	Human-computer interaction.	en
dc.subject	Submodularity.	en
dc.title	Action recognition and video summarisation by submodular inference	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

In computer vision, action recognition and video summarisation are two important tasks with applications such as video indexing and retrieval, human-computer interaction, video surveillance and home intelligence. While many approaches exist in the literature for these two tasks, to date they have always been addressed separately. In this thesis we instead start from the assumption that action recognition can usefully drive the selection of frames for the summary, and that recognising actions from a summary can prove more accurate than recognising them from the whole video; the two tasks should therefore be tackled simultaneously as a joint objective. To this end, we propose a novel framework based on structured max-margin algorithms, together with an efficient model for inferring the action and the summary that is based on the property of submodularity. Submodularity has recently emerged as an area of interest in machine learning and theoretical computer science, particularly within the domains of optimisation and game theory, and is therefore one of the main frameworks for this thesis. To evaluate the proposed method thoroughly, we have conducted experiments in three different scenarios: unsupervised summaries, semi-supervised summaries and fully supervised summaries. We also propose a novel loss function, V-JAUNE, to evaluate the quality of a predicted video summary against the summaries annotated by multiple annotators. In a final experiment, we leverage the proposed loss function not only for evaluation but also for training. The effectiveness of the proposed algorithms is demonstrated through qualitative and quantitative tests on two challenging depth action datasets: ACE and MSR DailyActivity. The results show that the proposed approaches are capable of learning accurate action classifiers and producing informative summaries.
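
The abstract credits submodularity with making inference efficient. As a rough illustration of why, the sketch below greedily selects summary frames under a facility-location objective, a standard monotone submodular function for which greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum. This is not the thesis's model: the objective, the similarity measure and the names `facility_location` and `greedy_summary` are assumptions chosen for the example, and the action label that the thesis infers jointly with the summary is left out.

```python
import numpy as np


def facility_location(similarity, selected):
    """Coverage score: every frame is credited with its best match in the summary.

    Assumes `similarity` is a square, non-negative frame-to-frame matrix,
    which makes this set function monotone and submodular.
    """
    if not selected:
        return 0.0
    return float(similarity[:, selected].max(axis=1).sum())


def greedy_summary(similarity, budget):
    """Pick up to `budget` frames, adding the largest marginal gain each round."""
    selected = []
    remaining = set(range(similarity.shape[0]))
    for _ in range(min(budget, len(remaining))):
        current = facility_location(similarity, selected)
        best_frame, best_gain = None, -np.inf
        for frame in sorted(remaining):
            gain = facility_location(similarity, selected + [frame]) - current
            if gain > best_gain:
                best_frame, best_gain = frame, gain
        selected.append(best_frame)
        remaining.remove(best_frame)
    return selected


# Hypothetical 1-D "frame features": two visually distinct segments.
features = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
# Assumed similarity: decays exponentially with feature distance.
similarity = np.exp(-np.abs(features[:, None] - features[None, :]))
print(greedy_summary(similarity, budget=2))
```

With a budget of two, the second pick comes from the segment not yet covered, because a frame close to one already in the summary adds almost nothing to the score. This diminishing-returns behaviour is the defining property of a submodular function.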

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/102741