This thesis explores probabilistic techniques to model interactions between humans and robotic devices. The work is motivated by the rapid growth of the ageing population and the role that assistive robotic devices can play, as assistants and/or companions, in maintaining the independence and quality of life of these communities. While this pursuit has substantial social and ethical implications, it is argued that robotic systems must acquire more sophisticated assistive capabilities if they are to operate in unstructured, dynamic, human-centred environments and respond to the needs of their human operators. Such cognitive assistive systems require advances along the complete processing pipeline, from sensing, to anticipating user actions and environmental changes, to delivering natural supportive actuation. Within the context of human-robot interaction, acute awareness of human intentions can be expected to play a key role in delivering practical assistive actions. This work therefore focuses on the human behaviours likely to result from combining sensed human-robot interactions with the knowledge learned from past experiences, and proposes a framework that facilitates the integration of tightly knit human-robot interaction models.
Human behaviour is complex in nature, and interactions with the environment and other objects occur in different and unpredictable ways. Moreover, observed sensory data is often incomplete and noisy. Inferring human intention is thus a challenging problem. This work defends the thesis that, in many real-world scenarios, these complex behaviours can be naturally simplified by decomposing them into smaller activities, so that their temporal dependencies can be learned more efficiently with the aid of probabilistic hierarchical models. To that end, the first part of the thesis devises a strategy to efficiently represent human Activities of Daily Living (ADLs) by decomposing them into a flexible semantic structure of “Action Primitives” (APs): atomic actions which are shown to be able to encapsulate complex activities when combined within a temporal probabilistic framework at multiple levels of abstraction. A Hierarchical Hidden Markov Model (HHMM) is proposed as a powerful tool capable of modelling and learning these complex and uncertain human behaviours using knowledge gained from past interactions.
The ADLs performed by humans consist of a variety of complex locomotion-related tasks, as well as activities that involve grasping and manipulating objects used in everyday life. Two widely used devices that assist users with mobility impairments in carrying out their ADLs, a power walker and a robotic wheelchair, are instrumented and used to model patterns of navigational activities (i.e. visiting locations of interest), as well as some additional platform-specific support activities (e.g. standing up using the support of the assistive walker). Human indications while performing these activities are captured using low-level sensors fitted on the mobility devices (e.g. strain gauges, laser range finders). Grasping- and manipulation-related ADLs are modelled using data captured from a stream of video images, where the data comprise hand-object interactions and their motion in 3D space.
The inference accuracy of the proposed framework in predicting APs and recognising long-term user intentions is compared with that of traditional discriminative models (sequential Support Vector Machines (SVMs)), other generative models (layered Dynamic Bayesian Networks (DBNs)), and combinations thereof, to provide a complete picture that highlights the benefits of the proposed approach. Results from real data collected in a set of trials conducted by actor users demonstrate that all techniques are able to predict APs with good accuracy, yet successful inference of long-term tasks is substantially reduced in the case of the layered DBN and SVM models. These findings validate the thesis’ proposal that the combination of decomposing tasks at multiple levels and exploiting their inherent temporal nature plays a critical role in predicting complex interactive tasks.