Cinematographic shot classification frameworks for movie indexing and retrieval

Hasan, MA

Publication Type: Thesis
Issue Date: 2014

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Hasan, MA
dc.date.accessioned	2015-03-24T23:57:13Z
dc.date.available	2015-03-24T23:57:13Z
dc.date.issued	2014
dc.identifier.uri	http://hdl.handle.net/10453/34373
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US
dc.format	Thesis (PhD)	en_US
dc.language.iso	en	en_US
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/34373/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	au.edu.uts.lib/ppc
dc.title	Cinematographic shot classification frameworks for movie indexing and retrieval	en_US
dc.type	Thesis
utslib.copyright.status	open_access

Abstract:

Cinematographic shot classification is an important and challenging task because of the way shots are created. Movies use a variety of shot types to attract audience attention and enhance the viewing experience. Shot classification is a primary task for indexing cinematographic shots in video databases. In this thesis, we propose three frameworks for classifying cinematographic shots.

The first framework is based on context saliency. We introduce a context saliency based technique that extracts features from a keyframe of a cinematographic video shot. The features extracted from a training dataset are used to train a Support Vector Machine (SVM) that classifies the shots into pre-defined shot classes.

The second framework is another keyframe based shot classification technique. In addition to context saliency map features, we propose a set of cinematographic domain feature extraction mechanisms for shot classification. The approach is hierarchical and has two steps. First, shots are classified based on depth information extracted from keyframes. Second, shots are further classified using the orientations of objects in the keyframes. An SVM is used for classification.

In the third framework, we propose a non-parametric camera motion descriptor called CAMHID for video shot classification. A motion vector field (MVF) is constructed by extracting motion vectors with block matching over a sequence of consecutive video frames. Each frame is then divided into a number of local regions of equal size, and the inconsistent or noisy motion vectors in each local region are eliminated through a motion consistency analysis. The remaining motion vectors of each local region across the sequence of consecutive frames are collected into a matrix for a compact representation. The matrix is decomposed using singular value decomposition (SVD) to identify the dominant motion. The angle of the most dominant principal component is then computed and quantised, so that the motion of the local region is represented by a histogram. The local histograms are combined to represent the global camera motion.

The effectiveness of the proposed motion descriptor for video shot classification is tested using an SVM. The descriptor is evaluated on two video datasets consisting of regular camera motion patterns (e.g., pan, zoom, tilt, static). As an application of CAMHID, we extend the camera motion descriptor with a set of additional features for the classification of cinematographic shots. The experimental results show that the proposed shot-level camera motion descriptor has strong discriminative capability and effectively classifies the camera motion patterns of different videos. We also show that our approach outperforms state-of-the-art methods. As a further application, we apply CAMHID features to the video copy detection task.
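
The local-region steps of the third framework (consistency filtering, dominant direction, angle quantisation, combination of local histograms) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the 45-degree consistency threshold, the 8 direction bins, the extra "static" bin, and combination by concatenation are all assumptions made for illustration. Block matching is assumed to have already produced the `(dx, dy)` motion vectors. For an N x 2 matrix, the first right singular vector is the leading eigenvector of its 2 x 2 Gram matrix, so the sketch computes that direction in closed form instead of calling an SVD routine.

```python
import math

def consistent_vectors(vectors, max_angle_dev=math.pi / 4):
    """Keep vectors close to the region's mean direction.
    Hypothetical consistency rule; the abstract does not specify one."""
    moving = [(dx, dy) for dx, dy in vectors if dx or dy]
    mx = sum(dx for dx, _ in moving)
    my = sum(dy for _, dy in moving)
    if not moving or (mx == 0 and my == 0):
        return []
    mean_angle = math.atan2(my, mx)
    kept = []
    for dx, dy in moving:
        dev = math.atan2(dy, dx) - mean_angle
        dev = math.atan2(math.sin(dev), math.cos(dev))  # wrap to [-pi, pi]
        if abs(dev) <= max_angle_dev:
            kept.append((dx, dy))
    return kept

def dominant_angle(vectors):
    """Angle in [0, 2*pi) of the first right singular vector of the
    N x 2 matrix of motion vectors, oriented along the mean motion."""
    sxx = sum(dx * dx for dx, _ in vectors)
    syy = sum(dy * dy for _, dy in vectors)
    sxy = sum(dx * dy for dx, dy in vectors)
    # Principal axis of the 2 x 2 Gram matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # A singular vector has a sign ambiguity; resolve it with the mean
    # motion (an assumption) so that opposite pans get different angles.
    mx = sum(dx for dx, _ in vectors)
    my = sum(dy for _, dy in vectors)
    if math.cos(theta) * mx + math.sin(theta) * my < 0:
        theta += math.pi
    return theta % (2 * math.pi)

def region_histogram(vectors, n_bins=8):
    """One-hot histogram: n_bins direction bins plus one static bin."""
    hist = [0] * (n_bins + 1)
    kept = consistent_vectors(vectors)
    if not kept:
        hist[n_bins] = 1  # no reliable motion in this region
    else:
        width = 2 * math.pi / n_bins
        # Bins are centred on multiples of `width` (0, 45, 90 degrees, ...).
        hist[int(round(dominant_angle(kept) / width)) % n_bins] = 1
    return hist

def shot_descriptor(regions, n_bins=8):
    """Concatenate the local histograms into one global descriptor."""
    descriptor = []
    for vectors in regions:
        descriptor.extend(region_histogram(vectors, n_bins))
    return descriptor

if __name__ == "__main__":
    # Four regions, each with five rightward vectors and one outlier.
    pan_right = [[(3, 0)] * 5 + [(0, -3)] for _ in range(4)]
    print(shot_descriptor(pan_right))
```

In this sketch a pure horizontal or vertical motion falls at the centre of a bin rather than on a bin edge, which keeps the quantisation stable under small estimation noise. The resulting fixed-length vector is the kind of input that could be passed to an SVM classifier, as the abstract describes.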

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/34373