Recognising and describing human activities in a still image

Zhou, Zheng

Publication Type: Thesis
Issue Date: 2017

This item is open access.

Contents and abstract: Adobe PDF (296.41 kB)
Thesis: Adobe PDF (10.77 MB)

Full metadata record

Field	Value	Language
dc.contributor.author	Zhou, Zheng
dc.date.accessioned	2017-11-17T00:58:12Z
dc.date.available	2017-11-17T00:58:12Z
dc.date.issued	2017
dc.identifier.uri	http://hdl.handle.net/10453/120262
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/120262/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Human-interaction analysis.	en_AU
dc.subject	Image interpretation.	en_AU
dc.subject.lcsh	Image analysis.	en_AU
dc.title	Recognising and describing human activities in a still image	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

Understanding human activities in a still image is an essential branch of artificial intelligence research. For a computer system, understanding the activities in an image requires not only recognising them but also describing the recognised activities. In the age of big data, activity recognition and description generation for images have received increasing research attention because of their importance in image-based information retrieval, automated image collection and collation, human-computer interaction, and automated security surveillance. This thesis studies the recognition and description of activities in a still image and makes the following contributions.

(1) A framework for recognising human activities by analysing the interactions among people is proposed. These interactions provide useful context for activity recognition, but existing approaches to both individual and group activity recognition have not fully exploited them. The framework is modelled on the mechanism by which the human brain analyses interactions and is composed of four key sub-tasks: Human Detection and Segmentation, Feature Extraction, Interaction Analysis, and Activity Recognition.

(2) An approach for recognising individual activities based on human-interaction analysis is developed. It uses a novel single-level model, the Non-hierarchical Interaction Analysis Model (NIAM), to analyse the interactions between individuals. The NIAM contains neither a level representing groups nor a group discovery process, which avoids both the errors introduced by group discovery and the computation it consumes. The core of the approach consists of several new algorithms: a Fusion Restricted Boltzmann Machine for fusing features of different dimensional scales, a Focal Subspace Measurement for calculating the interdependencies between people, and a Global-Local Cue Integration Method for selecting and integrating the cues extracted from different people.

(3) An approach for recognising group activities based on human-interaction analysis is developed. It uses a new multiple-level generative model, the Mixed Group Activity Model (MGAM). Compared with popular discriminative multiple-level models, the MGAM performs better at comprehensively analysing information across multiple levels of activity and at modelling the interactions among multiple individuals or groups. To connect the MGAM with the raw features in an image, a Body-Part-Angle (BPA) descriptor is proposed. The BPA descriptor suits a generative model because the distribution by which the model generates the raw features can be easily defined and learned.

(4) A generator that describes human-object interaction activities in images in natural language is proposed. Compared with the sentences produced by traditional retrieval-based approaches, the sentences it produces are closer to what is actually happening in the image. The generator is built on a deep understanding framework that combines 3D spatial layout analysis with a syntactic-tree-based language model.
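
The abstract names the Body-Part-Angle (BPA) descriptor but does not define it. Purely as an illustration, the sketch below assumes a BPA-style feature is the orientation of each limb segment computed from 2D joint positions. The joint names, the `LIMBS` list and both functions are hypothetical, and the thesis's actual descriptor may be defined quite differently.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical skeleton: each limb is a (parent joint, child joint) pair.
# These joint names are illustrative assumptions, not the thesis's definition.
LIMBS: List[Tuple[str, str]] = [
    ("neck", "head"),
    ("r_shoulder", "r_elbow"),
    ("r_elbow", "r_wrist"),
    ("l_shoulder", "l_elbow"),
    ("l_elbow", "l_wrist"),
    ("r_hip", "r_knee"),
    ("r_knee", "r_ankle"),
    ("l_hip", "l_knee"),
    ("l_knee", "l_ankle"),
]


def body_part_angles(keypoints: Dict[str, Point]) -> List[float]:
    """Return one orientation angle (radians, in [-pi, pi]) per limb.

    Angles use atan2 in image coordinates (x right, y down), so they do not
    change when the person is translated or uniformly scaled in the image.
    """
    angles = []
    for parent, child in LIMBS:
        (x1, y1), (x2, y2) = keypoints[parent], keypoints[child]
        angles.append(math.atan2(y2 - y1, x2 - x1))
    return angles


def to_unit_circle(angles: List[float]) -> List[float]:
    """Encode each angle as (cos, sin) so that -pi and pi map to the same point."""
    features: List[float] = []
    for a in angles:
        features.extend((math.cos(a), math.sin(a)))
    return features
```

Angles are attractive in this assumed form because they are independent of where the person stands and how large they appear, and angular data can be modelled directly with circular distributions such as the von Mises distribution. That may hint at why an angle-based descriptor pairs naturally with a generative model, although the abstract does not say which distribution the MGAM uses.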

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/120262