Multi-label classification is an important topic in the field of machine learning. In many real applications, there exist potential dependencies or correlations between labels, and exploiting the underlying knowledge could effectively improve the learning performance. Therefore, how to learn and utilize the dependencies between labels has become one of the key issues of multi-label classification.
This thesis firstly summarizes existing works and analyses their advantages and disadvantages. Several effective methods for multi-label classification are then proposed, focusing on ways of exploiting various types of label dependencies. The contributions of this thesis mainly include:
(1) A method that uses a tree-structured restricted Bayesian network to represent the dependency structure of labels is proposed. This work is inspired by the ClassifierChain method. Compared with ClassifierChain, our method's advantage is that the dependencies between labels are represented using a Bayesian network rather than a randomly selected chain, so more appropriate label dependencies could be determined. Furthermore, ensemble learning technique is used to construct and combine multiple tree-structured Bayesian networks, thus the mutual dependencies between labels could be fully exploited and the final model could be more robust. The experimental results verify the effectiveness of these methods. Compared with other baselines, the show better performance due to more appropriate label dependencies are captured.
(2) A common strategy of exploiting label dependencies is, for every label, to the labels it depends on and use these labels as auxiliary features in the training phase. The issues of this strategy are that the influence of label dependencies could be depressed by existing features and indirect label dependencies could not be taken into consideration. Therefore, a new learning paradigm that separates the influence of existing features and labels is introduced, and the impact of label dependencies could be well intensified in this way. Moreover, a method that models the propagation of label dependencies as a RWR process (Random Walk with Restart) is proposed. In this method, label dependencies are encoded as a graph, and the dynamic and indirect dependencies between labels are utilized through the RWR process over the label graph. The experimental results validate this method, showing that it outperforms other baselines in terms of learning a label ranking.
(3) Based on above method, a method that takes multiple factors into consideration when learning label dependencies is proposed. In this method, dependency between two labels is characterized from different perspectives, and is determined by learning a linear combination of multiple measures. A particular loss function is designed, and thus the optimal label dependencies, i.e., the dependency matrix in RWR process, can be obtained by minimizing the loss function. The advantage of this method include: a) label dependencies are measures and combined from different perspectives, and b) label dependences that are optimal to a particular loss function now are obtained. The experimental results indicate that this method could further learn a better label ranking compared with the previous one, given an explicit loss function.
(4) A novel method that learns label ranking by exploiting preferences between true labels and other labels is proposed. In this method, the original instance space and label space are mapped into a low-dimensional space using matrix factorization technique. Therefore, one advantage of the method is that the number of label is reduced greatly, and problem with massive labels now can be handle efficiently. Moreover, a loss function is formulated based on the assumption that an instance's true labels which have been given explicitly should be ranked before other labels which are not provided explicitly. It is then used to guide the process of matrix factorization and label ranking learning. The advantage of this novel assumption is that it alleviate issue in traditional assumption that if a label is not given explicitly, it should not be a true label. Therefore, this method is also applicable to data that are partially labelled. Its effectiveness is validated by the experimental result which shows that it could rank explicitly given label well before other labels for a given instance.
In summary, this thesis has proposed several effective methods that exploit label dependencies from different perspectives, and their effectiveness have been validated by experiments. These achievements lay a good foundation for further research and applications.