Cinematographic shot classification frameworks for movie indexing and retrieval

Publication Type: Thesis
Issue Date: 2014
Cinematographic shot classification is an important and challenging task because of the mechanisms by which shots are created. Movies use a variety of shot types to attract the audience's attention and enhance the viewing experience. Shot classification is considered a primary task for indexing cinematographic shots in video databases. In this thesis, we propose three frameworks for classifying cinematographic shots. First, we propose a context saliency based framework, in which we introduce a technique for extracting context saliency based features from a keyframe of a cinematographic video shot. The features extracted from a training dataset are used to train a Support Vector Machine (SVM) that classifies the shots into pre-defined shot classes. The second framework is another keyframe based shot classification technique. In addition to context saliency map features, it uses a set of proposed cinematographic domain feature extraction mechanisms. The approach works hierarchically in two steps: first, shots are classified based on depth information extracted from keyframes; second, they are further classified using the orientations of objects in the keyframes. An SVM is used for classification. In the third framework, we propose a non-parametric camera motion descriptor called CAMHID for video shot classification. A motion vector field (MVF) is constructed by extracting motion vectors with block matching over a sequence of consecutive video frames. Each frame is then divided into a number of local regions of equal size. Next, inconsistent or noisy motion vectors in each local region are eliminated through a motion consistency analysis. The remaining motion vectors of each local region across the sequence of consecutive frames are then collected for a compact representation.
A matrix is formed from these motion vectors and decomposed using singular value decomposition (SVD) to identify the dominant motion. The angle of the most dominant principal component is then computed and quantised, so that the motion of the local region is represented by a histogram. The local histograms are combined to represent the global camera motion. The effectiveness of the proposed motion descriptor for video shot classification is tested using an SVM. The descriptor is evaluated on two video datasets consisting of regular camera motion patterns (e.g., pan, zoom, tilt, static). As an application of CAMHID, we extend the camera motion descriptor with a set of additional features for classifying cinematographic shots. The experimental results show that the proposed shot-level camera motion descriptor has strong discriminative capability and classifies the camera motion patterns of different videos effectively. We also show that our approach outperforms state-of-the-art methods. As a further application, we apply CAMHID features to the video copy detection task.
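The descriptor pipeline described above (per-region motion vectors, consistency filtering, dominant direction, angle quantisation, concatenated local histograms) can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the thesis implementation. The function names, the median-based consistency rule, the tolerance, the extra "static" bin, the sign resolution by the summed vector, and the grouping of frames into temporal windows are all assumptions made for illustration. The sketch also swaps the SVD for an equivalent closed form: for 2-D vectors, the dominant singular direction of the motion matrix is the principal eigenvector of its 2x2 scatter matrix, whose angle is 0.5 * atan2(2*Sxy, Sxx - Syy).

```python
import math
from statistics import median


def filter_consistent(vectors, tol=2.0):
    """Drop vectors far from the region's median vector.

    Hypothetical stand-in for the motion consistency analysis; the
    abstract does not specify the actual rule or threshold.
    """
    if not vectors:
        return []
    mx = median(x for x, _ in vectors)
    my = median(y for _, y in vectors)
    return [(x, y) for x, y in vectors if math.hypot(x - mx, y - my) <= tol]


def dominant_angle(vectors):
    """Angle in [0, 2*pi) of the dominant motion direction, or None if static.

    Closed-form equivalent of taking the first right singular vector of
    the N x 2 matrix of motion vectors.
    """
    sxx = sum(x * x for x, _ in vectors)
    syy = sum(y * y for _, y in vectors)
    sxy = sum(x * y for x, y in vectors)
    if sxx + syy < 1e-9:
        return None  # no measurable motion in this region
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # principal axis
    # A principal axis has no sign; assume the summed vector picks the direction.
    sum_x = sum(x for x, _ in vectors)
    sum_y = sum(y for _, y in vectors)
    if sum_x * math.cos(theta) + sum_y * math.sin(theta) < 0:
        theta += math.pi
    return theta % (2.0 * math.pi)


def camera_motion_descriptor(shot, n_bins=8):
    """Concatenate per-region angle histograms into one global descriptor.

    shot[w][r] is the list of (dx, dy) motion vectors collected for
    local region r over temporal window w (an assumed input layout).
    Each region contributes n_bins direction bins plus one static bin.
    """
    width = 2.0 * math.pi / n_bins
    descriptor = []
    for r in range(len(shot[0])):
        hist = [0.0] * (n_bins + 1)
        for window in shot:
            theta = dominant_angle(filter_consistent(window[r]))
            if theta is None:
                hist[n_bins] += 1.0
            else:
                hist[int(round(theta / width)) % n_bins] += 1.0
        total = sum(hist)
        descriptor.extend(h / total for h in hist)
    return descriptor


# Two windows, two regions, rightward pan with one noisy vector in region 0.
pan_right = [[[(3, 0), (3, 0), (3, 0), (-10, 8)], [(3, 0), (3, 0), (3, 0)]]] * 2
descriptor = camera_motion_descriptor(pan_right)
```

The resulting fixed-length vector could then be passed to any standard classifier; the abstract reports using an SVM for that step.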