Early Expression Detection via Online Multi-Instance Learning With Nonlinear Extension

Publication Type:
Journal Article
IEEE Transactions on Neural Networks and Learning Systems, 2019, 30 (5), pp. 1486 - 1496
File: 08480868.pdf (Published Version, Adobe PDF, 2.01 MB)
Abstract:
© 2012 IEEE. Video-based facial expression recognition has received substantial attention over the past decade, while early expression detection (EED) remains a relatively new and challenging problem. The goal of EED is to identify an expression as quickly as possible after the expression starts and before it ends. This timely ability has many potential applications, ranging from human-computer interaction to security. The max-margin early event detector (MMED) is a well-known ranking model for early event detection. It achieves competitive EED performance but suffers from several critical limitations: 1) MMED lacks flexibility in extracting useful information for segment comparison, which leads to poor performance in exploring the ranking relation between segment pairs; 2) its training process is slow due to the large number of constraints, and the memory requirement is often hard to satisfy; and 3) MMED is linear in nature, and hence may not be appropriate for data in a nonlinear feature space. To overcome these limitations, we propose an online multi-instance learning (MIL) framework for EED. In particular, the MIL technique is first introduced to generalize MMED, resulting in the proposed MIL-based EED (MIED), which is more general and flexible than MMED, since various instance construction and combination strategies can be adopted. To accelerate training, we reformulate MIED in the online setting and develop the online MIL framework for EED (OMIED). To further exploit the nonlinear structure of the data distribution, we incorporate kernel methods into OMIED, resulting in the proposed online kernel multi-instance learning for early expression detection. Experiments on two popular video-based expression data sets and one challenging data set demonstrate both the efficiency and effectiveness of the proposed methods.