Multiclass Learning with Partially Corrupted Labels

Publication Type:
Journal Article
Citation:
IEEE Transactions on Neural Networks and Learning Systems, 2018, 29 (6), pp. 2568 - 2580
Issue Date:
2018-06-01
Filename Description Size
07929355.pdf  Published Version  2.2 MB (Adobe PDF)
© 2012 IEEE. Traditional classification systems rely heavily on sufficient training data with accurate labels. However, the quality of the collected data depends on the labelers, some of whom may be inexperienced and produce unexpected labels that degrade the performance of a learning system. In this paper, we investigate the multiclass classification problem in which a certain proportion of the training examples are randomly labeled. Specifically, we show that this issue can be formulated as a label noise problem. To perform multiclass classification, we employ the widely used importance reweighting strategy so that learning on noisy data more closely reflects the results on noise-free data. We illustrate the applicability of this strategy to any surrogate loss function and to different classification settings. The proportion of randomly labeled examples is proved to be upper bounded and can be estimated under a mild condition. The convergence analysis ensures that the learned classifier converges to the optimal classifier with respect to clean data. Two instantiations of the proposed strategy are also introduced. Experiments on synthetic and real data verify that our approach yields improvements over both traditional classifiers and robust classifiers. Moreover, we empirically demonstrate that the proposed strategy is effective even on asymmetrically noisy data.
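As a rough illustration of the reweighting idea summarized in the abstract, the sketch below assumes one particular noise model: a fraction rho of examples receive a label drawn uniformly at random from K classes, so that P(noisy label = j | x) = (1 - rho) * P(true label = j | x) + rho / K. Under that assumption the importance weight is the ratio of clean to noisy posterior at the observed label, and rho cannot exceed K times the smallest noisy posterior. The function names, the uniform noise model, and the use of known posteriors are all assumptions made for illustration; this is not the paper's implementation, and in practice the noisy posterior would have to be estimated.

```python
def noisy_posterior(clean_post, rho):
    """Posterior over observed labels for one example, assuming a fraction
    rho of labels is replaced by a uniform draw over the k classes."""
    k = len(clean_post)
    return [(1.0 - rho) * p + rho / k for p in clean_post]


def importance_weight(noisy_post, observed_label, rho):
    """Weight beta = P(y | x) / P(noisy y | x) at the observed label.
    The clean posterior is recovered by inverting the assumed noise model.
    Assumes the noisy posterior at the observed label is positive."""
    k = len(noisy_post)
    p_noisy = noisy_post[observed_label]
    p_clean = max(p_noisy - rho / k, 0.0) / (1.0 - rho)
    return p_clean / p_noisy


def rho_upper_bound(noisy_posts):
    """Since the clean posterior is non-negative, rho <= k * P(noisy y = j | x)
    for every example x and class j; the minimum gives the tightest bound."""
    k = len(noisy_posts[0])
    return k * min(min(post) for post in noisy_posts)


def reweighted_risk(losses, weights):
    """Empirical risk on noisy data with each per-example loss reweighted."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Toy data: two examples, three classes, 30% randomly labeled.
rho = 0.3
clean = [[0.7, 0.2, 0.1], [0.0, 0.5, 0.5]]
noisy = [noisy_posterior(p, rho) for p in clean]
observed = [0, 0]  # the second label is impossible under the clean posterior
weights = [importance_weight(p, y, rho) for p, y in zip(noisy, observed)]
bound = rho_upper_bound(noisy)
risk = reweighted_risk([1.0, 1.0], weights)
```

In this toy case the second example gets weight zero, because its observed label can only have come from random labeling, and the bound equals rho because one clean posterior entry is exactly zero. Any surrogate loss can supply the per-example losses passed to `reweighted_risk`.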