A framework of online learning with imbalanced streaming data

Yan, Y; Yang, T; Yang, Y; Chen, J


Publication Type: Conference Proceeding
Citation: 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, pp. 2817-2823
Issue Date: 2017-01-01


Full metadata record

Field	Value	Language
dc.contributor.author	Yan, Y	en_US
dc.contributor.author	Yang, T	en_US
dc.contributor.author	Yang, Y https://orcid.org/0000-0001-5528-0546	en_US
dc.contributor.author	Chen, J	en_US
dc.date.issued	2017-01-01	en_US
dc.identifier.citation	31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, pp. 2817 - 2823	en_US
dc.identifier.uri	http://hdl.handle.net/10453/125896
dc.relation.ispartof	31st AAAI Conference on Artificial Intelligence, AAAI 2017	en_US
dc.title	A framework of online learning with imbalanced streaming data	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. A challenge for mining large-scale streaming data that most existing studies on online learning overlook is the skewed distribution of examples over different classes. Many previous works have considered cost-sensitive approaches in an online setting for streaming data, where fixed costs are assigned to different classes, or ad-hoc costs are adapted based on the distribution of data received so far. However, these approaches do not necessarily achieve optimal performance in terms of the measures suited for imbalanced data, such as F-measure, area under the ROC curve (AUROC), and area under the precision-recall curve (AUPRC). This work proposes a general framework for online learning with imbalanced streaming data, where examples arrive sequentially and models are updated accordingly on the fly. By simultaneously learning multiple classifiers with different cost vectors, the proposed method can be applied to different target measures for imbalanced data, including F-measure, AUROC and AUPRC. Moreover, we present a rigorous theoretical justification of the proposed framework for F-measure maximization. Our empirical studies demonstrate that the proposed method performs competitively with, if not better than, previous cost-sensitive and resampling-based online learning algorithms and those designed to optimize specific measures.
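
The idea of learning several classifiers with different cost vectors at once and picking among them by a target measure can be illustrated with a minimal sketch. This is not the authors' algorithm: the full paper is closed access here, so everything below is an assumption made for illustration, including the class names CostSensitiveLearner and MultiCostEnsemble, the helper make_stream, the logistic loss, the grid of positive-class costs, the learning rate, and the greedy "use the learner with the best running F-measure" rule. The F-measure is computed as 2TP / (2TP + FP + FN) from predictions made before each update.

```python
import math
import random


class CostSensitiveLearner:
    """Online logistic regression whose gradient is scaled by a class cost.

    Hypothetical helper: pos_cost weights positive (minority) examples and
    1 - pos_cost weights negative examples.
    """

    def __init__(self, dim, pos_cost, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.pos_cost = pos_cost
        self.neg_cost = 1.0 - pos_cost
        self.lr = lr
        self.tp = self.fp = self.fn = 0  # running confusion counts

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.score(x) > 0 else 0

    def update(self, x, y):
        # Evaluate first, then learn, so the counts reflect unseen examples.
        pred = self.predict(x)
        if pred == 1 and y == 1:
            self.tp += 1
        elif pred == 1 and y == 0:
            self.fp += 1
        elif pred == 0 and y == 1:
            self.fn += 1
        z = max(-30.0, min(30.0, self.score(x)))  # clamp to avoid overflow
        p = 1.0 / (1.0 + math.exp(-z))
        cost = self.pos_cost if y == 1 else self.neg_cost
        g = cost * (p - y)  # cost-weighted logistic-loss gradient w.r.t. z
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g

    def f_measure(self):
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


class MultiCostEnsemble:
    """Trains one learner per cost setting; predicts with the current best."""

    def __init__(self, dim, pos_costs=(0.5, 0.6, 0.7, 0.8, 0.9, 0.95)):
        self.learners = [CostSensitiveLearner(dim, c) for c in pos_costs]

    def best(self):
        # Greedy selection by running F-measure (an assumed, simplified rule).
        return max(self.learners, key=lambda m: m.f_measure())

    def predict(self, x):
        return self.best().predict(x)

    def update(self, x, y):
        for m in self.learners:
            m.update(x, y)


def make_stream(n, pos_rate, seed=0):
    """Synthetic imbalanced 2-D stream: positives are shifted by +1.5."""
    rng = random.Random(seed)
    for _ in range(n):
        y = 1 if rng.random() < pos_rate else 0
        shift = 1.5 if y == 1 else 0.0
        yield [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)], y


if __name__ == "__main__":
    ensemble = MultiCostEnsemble(dim=2)
    for x, y in make_stream(5000, pos_rate=0.05):
        ensemble.update(x, y)
    chosen = ensemble.best()
    print("selected pos_cost:", chosen.pos_cost, "F-measure:", chosen.f_measure())
```

Swapping f_measure for a running estimate of another measure would change only the selection rule, which is the sense in which a multi-cost design can serve different target measures; how the paper actually selects or combines classifiers is not stated in the abstract.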

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/125896