Data poison detection schemes for distributed machine learning

Chen, Y; Mao, Y; Liang, H; Yu, S; Wei, Y; Leng, S

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Access, 2020, 8, pp. 7442-7454
Issue Date: 2020-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, Y
dc.contributor.author	Mao, Y
dc.contributor.author	Liang, H
dc.contributor.author	Yu, S https://orcid.org/0000-0003-4485-6743
dc.contributor.author	Wei, Y
dc.contributor.author	Leng, S
dc.date.accessioned	2020-11-18T06:57:55Z
dc.date.available	2020-11-18T06:57:55Z
dc.date.issued	2020-01-01
dc.identifier.citation	IEEE Access, 2020, 8, pp. 7442-7454
dc.identifier.issn	2169-3536
dc.identifier.uri	http://hdl.handle.net/10453/144129
dc.description.abstract	Distributed machine learning (DML) makes it possible to train on massive datasets when no single node can compute accurate results within an acceptable time. However, compared with a non-distributed environment, DML inevitably exposes more potential targets to attackers. In this paper, we classify DML into basic-DML and semi-DML. In basic-DML, the central server dispatches learning tasks to distributed machines and aggregates their learning results. In semi-DML, the central server also devotes its own resources to dataset learning, in addition to its duties in basic-DML. We first propose a novel data poison detection scheme for basic-DML, which uses a cross-learning mechanism to identify poisoned data. We prove that the proposed cross-learning mechanism generates training loops, and on this basis we establish a mathematical model to find the optimal number of training loops. Then, for semi-DML, we present an improved data poison detection scheme that provides better learning protection with the aid of the central resource. To use system resources efficiently, we develop an optimal resource allocation approach. Simulation results show that, in the basic-DML scenario, the proposed scheme can improve the accuracy of the final model by up to 20% for support vector machines and up to 60% for logistic regression. Moreover, in the semi-DML scenario, the improved data poison detection scheme with optimal resource allocation can reduce wasted resources by 20-100%.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Access
dc.relation.isbasedon	10.1109/ACCESS.2019.2962525
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 10 Technology
dc.title	Data poison detection schemes for distributed machine learning
dc.type	Journal Article
utslib.citation.volume	8
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	10 Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	open_access	*
dc.date.updated	2020-11-18T06:57:51Z
pubs.publication-status	Published
pubs.volume	8
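
The abstract states that cross-learning "generates training loops," but this record does not give the task-assignment rule. The sketch below is a minimal illustration under an explicit assumption that does not come from the paper: each worker trains on exactly two distinct data shards, and each shard is sent to exactly two workers. Under that assumption the worker/shard graph is 2-regular. Every finite 2-regular graph splits into disjoint cycles, which shows one way such loops can arise. The function name `training_loops` and the `W<id>`/`S<id>` node labels are hypothetical.

```python
from collections import defaultdict


def training_loops(assignment):
    """Split a worker-to-shard assignment into disjoint loops (cycles).

    Hypothetical helper for illustration only. `assignment` maps each worker
    id to a pair of distinct shard ids (assumption), and every shard must be
    assigned to exactly two workers (assumption). The bipartite worker/shard
    graph is then 2-regular, so it decomposes into disjoint cycles.
    """
    # Build the bipartite graph; "W<id>" is a worker node, "S<id>" a shard node.
    adj = defaultdict(list)
    for worker, shards in assignment.items():
        if len(shards) != 2 or shards[0] == shards[1]:
            raise ValueError(f"worker {worker} must train exactly two distinct shards")
        for shard in shards:
            adj[f"W{worker}"].append(f"S{shard}")
            adj[f"S{shard}"].append(f"W{worker}")

    # Every shard must also be held by exactly two workers.
    for node, neighbours in adj.items():
        if len(neighbours) != 2:
            raise ValueError(f"{node} has {len(neighbours)} neighbours, expected 2")

    visited = set()
    loops = []
    for worker in assignment:
        start = f"W{worker}"
        if start in visited:
            continue
        # Walk around the cycle: at each node, leave by the edge we did not arrive on.
        loop = [start]
        visited.add(start)
        prev, cur = start, adj[start][0]
        while cur != start:
            loop.append(cur)
            visited.add(cur)
            a, b = adj[cur]
            prev, cur = cur, (b if a == prev else a)
        loops.append(loop)
    return loops
```

For example, a ring assignment `{0: (0, 1), 1: (1, 2), 2: (2, 3), 3: (3, 0)}` yields a single loop through all four workers and shards. Pairing the workers as `{0: (0, 1), 1: (0, 1), 2: (2, 3), 3: (2, 3)}` yields two separate loops. Choosing the optimal number of loops is the subject of the paper's own mathematical model, and this sketch does not attempt it.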

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/144129