Video trajectory analysis

Trajectory data mining plays a critical role in modern intelligent systems for surveillance security, abnormal behavior detection, crowd behavior analysis, and traffic control. As cameras become widespread, more and more trajectories are recorded on video, so trajectory analysis in computer vision, including trajectory clustering, is useful for many tasks. However, video trajectory analysis is difficult, because only limited information is available for generating trajectories and few representation methods exist. Better performance can therefore be reached if more reliable motion information is employed. Trajectory data contain many characteristics that are useful for clustering, including distance, speed, direction, and relative displacement. Finally, when a large number of trajectories must be clustered into a small number of categories, which are hidden "groups", an unsupervised clustering model is required.

In addition, as more and more lecture videos become available on the Internet, online learning and e-learning are attracting increasing attention because of advantages such as a high degree of interactivity. Semantic content discovery for lecture videos is therefore very important. However, every lecture video contains a large amount of semantic information, including spoken language and lecture notes, so how to use all of these features is a key problem in improving the performance of e-learning.

To address these problems, a novel method is proposed in this paper. Reference points are detected, and the scale-invariant feature transform (SIFT) descriptor is used to represent the image patches around the points; SIFT descriptors are fast and robust to match. To unify the lengths of the trajectories, the Discrete Fourier Transform (DFT) maps each trajectory into the frequency domain with a fixed length, so that pattern information is retained.
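The length-unification step can be illustrated with a minimal sketch. It assumes that a trajectory is a sequence of 2-D image coordinates and that a fixed length is obtained by keeping only the first few low-frequency DFT coefficients of each coordinate; the abstract does not state how the fixed length is reached, so the function name `dft_descriptor` and the parameter `num_coeffs` are hypothetical.

```python
import numpy as np

def dft_descriptor(trajectory, num_coeffs=8):
    """Map a variable-length 2-D trajectory to a fixed-length vector.

    Assumption (not from the source): the fixed length comes from keeping
    the first `num_coeffs` low-frequency DFT coefficients per coordinate.
    """
    points = np.asarray(trajectory, dtype=float)
    if points.ndim != 2 or points.shape[1] != 2:
        raise ValueError("trajectory must have shape (T, 2)")
    num_points = points.shape[0]
    # Zero-pad short trajectories so that enough coefficients exist.
    n = max(num_points, 2 * num_coeffs)
    # Dividing by the number of points makes the coefficients comparable
    # across trajectories of different lengths.
    coeff_x = np.fft.rfft(points[:, 0], n=n)[:num_coeffs] / num_points
    coeff_y = np.fft.rfft(points[:, 1], n=n)[:num_coeffs] / num_points
    # Layout: [Re(x), Im(x), Re(y), Im(y)], each of length num_coeffs.
    return np.concatenate(
        [coeff_x.real, coeff_x.imag, coeff_y.real, coeff_y.imag]
    )

# Two trajectories of different lengths give descriptors of equal length.
short_track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
long_track = [(0.1 * t, np.sin(0.1 * t)) for t in range(50)]
short_vec = dft_descriptor(short_track, num_coeffs=4)
long_vec = dft_descriptor(long_track, num_coeffs=4)
```

Truncating to low frequencies keeps the coarse shape of the motion and discards fine jitter, which is one plausible reading of "pattern information is retained".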
Furthermore, one more feature type is introduced to describe the motion of an object relative to the camera, so that static objects can be distinguished from moving objects. Latent Dirichlet Allocation (LDA) performs well in natural language processing, but it models discrete words only. Because a different kind of semantic feature, a continuous feature, is also involved, we derive a novel clustering model, called the derived LDA model, in which the word-topic distribution follows a multivariate distribution. Building on the derived LDA model, we derive a dual-variable LDA model that processes two different features in parallel. A detailed derivation is given to support the model.

In the experiments, we apply our models to two data sets: a lecture video data set and the KITTI data set. For the lecture video data set, the spoken content and the notes on the presentation slides are extracted from the lecture videos, and the dual-variable LDA model is used to cluster the videos. For the KITTI data set, the derived LDA model is applied to the continuous feature only, and the dual-variable LDA model is employed to process both kinds of features. The experimental results show that the proposed method can effectively discover meaningful semantic characteristics of the lecture videos.
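As a rough illustration of what an LDA-style model over continuous features can look like, the generative process might be written as follows. This is only a sketch: the abstract does not name the multivariate distribution, so a multivariate Gaussian is assumed here, and all symbols ($\alpha$, $\theta_d$, $z_{dn}$, $x_{dn}$, $\mu_k$, $\Sigma_k$) are introduced for illustration rather than taken from the paper.

```latex
% Assumption: Gaussian emission; the source only says "Multivariate distribution".
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic proportions of document (video) } d \\
z_{dn} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d)
  && \text{topic assigned to the $n$-th observation} \\
x_{dn} \mid z_{dn} = k &\sim \mathcal{N}(\mu_k, \Sigma_k)
  && \text{continuous feature vector drawn from topic } k
\end{align*}
```

Compared with standard LDA, only the last line changes: the discrete word distribution of each topic is replaced by a density over continuous feature vectors.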