Multi-instance learning (MIL) is a special learning task where labels are only available for a bag of instances. Although MIL has been used for many applications, existing MIL algorithms cannot handle complex data objects, and all require that instances inside each bag are represented as feature vectors (e.g. being represented in an instance-feature format). In reality, many real-world objects are inherently complicated, and an object can be represented as multiple instances with dependency structures (i.e. graphs). Such dependency allows relationships between objects to play important roles, which, unfortunately, remain unaddressed in traditional instance-feature representations. Motivated by the challenges, this thesis formulates a new multi-graph learning paradigm for representing and classifying complicated objects. With the proposed multi-graph representation, the thesis systematically addresses several key learning tasks, including
Multi-Graph Learning: A graph bag contains one or multiple graphs, and each bag is labeled as either positive or negative. The aim of multi-graph learning is to build a learning model from a number of labeled training bags to predict previously unseen bags with maximum accuracy. To solve the problem, we propose two types of approaches: 1) Multi-Graph Feature based Learning (gMGFL) algorithm that explores and selects an optimal set of subgraphs as features to transfer each bag into a single instance for further learning; and 2) Boosting based Multi-Graph Classification framework (bMGC), which employs dynamic weight adjustment, at both graph- and bag-levels, to select one subgraph in each iteration to form a set of weak graph classifiers.
Multi-Instance Multi-Graph learning: A bag contains a number of instances and graphs in pairs, and the learning objective is to derive classification models from labeled bags, containing both instances and graphs, to predict previously unseen bags with maximum accuracy. In the thesis, we propose a Dual Embedding Multi-Instance Multi-Graph Learning (DE-MIMG) algorithm, which employs a dual embedding learning approach to (1) embed instance distributions into the informative subgraphs discovery process, and (2) embed discovered subgraphs into the instance feature selection process.
Positive and Unlabeled Multi-Graph Learning: The training set only contains positive and unlabeled bags, where labels are only available for bags but not for individual graphs inside the bag. This problem setting raises significant challenges because bag-of-graph setting does not have features available to directly represent graph data, and no negative bags exits for deriving discriminative classification models. To solve the challenge, we propose a puMGL learning framework which relies on two iteratively combined processes: (1) deriving features to represent graphs for learning; and (2) deriving discriminative models with only positive and unlabeled graph bags.
Multi-Graph-View Learning: A multi-graph-view model utilizes graphs constructed from multiple graph-views to represent an object. In our research, we formulate a new multi-graph-view learning task for graph classification, where each object to be classified is represented graphs under multi-graph-view. To solve the problem, we propose a Cross Graph-View Subgraph Feature based Learning (gCGVFL) algorithm that explores an optimal set of subgraph features cross multiple graph-views. In addition, a bag based multi-graph model is further used to relax the labeling by only requiring one label for each graph bag, which corresponds to one object. For learning classification models, we propose a multi-graph-view bag learning algorithm (MGVBL), to explore subgraphs from multiple graph-views for learning.
Experiments on real-world data validate and demonstrate the performance of proposed methods for classifying complicated objects using multi-graph learning.