Recognising and describing human activities in a still image

Understanding human activities in a still image is an essential research branch of artificial intelligence. For a computer system, understanding the activities in an image requires not only the ability to recognise those activities but also the ability to describe them. In the age of big data, activity recognition and description generation for images have received increasing research attention, since they are of great importance in image-based information retrieval, automated image collection and collation, human-computer interaction and automated security surveillance. This thesis conducts research on recognising and describing activities in a still image and makes the following contributions.

(1) A framework for recognising human activities based on analysing the interactions among people is proposed. The interactions among people provide useful context for activity recognition, but existing approaches to both individual and group activity recognition have not taken full advantage of them. The framework is modelled on the way the human brain analyses such interactions and comprises four key sub-tasks: Human Detection and Segmentation, Feature Extraction, Interaction Analysis and Activity Recognition.

(2) An approach for recognising individual activities based on human-interaction analysis is developed. This approach uses an innovative single-level model, called the Non-hierarchical Interaction Analysis Model (NIAM), to analyse the interactions between individuals. The NIAM contains neither a level representing groups nor a group discovery process, which avoids both the errors introduced by group discovery and the computation it requires. Several innovative algorithms are proposed and form the body of the recognition approach: a Fusion Restricted Boltzmann Machine for fusing features of different dimensional scales, a Focal Subspace Measurement for calculating the interdependencies between people, and a Global-Local Cue Integration Method for selecting and integrating the cues extracted from different people.

(3) An approach for recognising group activities based on human-interaction analysis is developed. This approach uses a new multiple-level generative model, called the Mixed Group Activity Model (MGAM). Compared with the popular discriminative multiple-level models, the MGAM performs better in comprehensively analysing the information at multiple levels of activities and in modelling the interactions among multiple individuals or groups. To connect the MGAM with the raw features in an image, a Body-Part-Angle (BPA) descriptor is proposed. The BPA descriptor is well suited to a generative model, because the generative distribution between the model and the raw features can be easily defined and learned.

(4) A description generator for describing the human-object interaction activities in images with natural language is proposed. Compared with the sentences given by traditional retrieval-based approaches, the sentences given by this generator are closer to what is really happening in an image. The generator is built on a deep understanding framework with a 3D spatial layout analysis and a syntactic-tree-based language model.