Context-aware Adaptive Route Mutation Scheme: A Reinforcement Learning Approach

Xu, C; Zhang, T; Kuang, X; Zhou, Z; Yu, S

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Internet of Things Journal, 2021, PP, (99), pp. 1-1
Issue Date: 2021-01-01

This item is open access.

The embargo period expires on 12 Mar 2023

Full metadata record

Field	Value	Language
dc.contributor.author	Xu, C
dc.contributor.author	Zhang, T
dc.contributor.author	Kuang, X
dc.contributor.author	Zhou, Z
dc.contributor.author	Yu, S https://orcid.org/0000-0003-4485-6743
dc.date.accessioned	2021-04-28T07:21:38Z
dc.date.available	2021-04-28T07:21:38Z
dc.date.issued	2021-01-01
dc.identifier.citation	IEEE Internet of Things Journal, 2021, PP, (99), pp. 1-1
dc.identifier.issn	2327-4662
dc.identifier.uri	http://hdl.handle.net/10453/148485
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Internet of Things Journal
dc.relation.isbasedon	10.1109/JIOT.2021.3065680
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	0805 Distributed Computing, 1005 Communications Technologies
dc.title	Context-aware Adaptive Route Mutation Scheme: A Reinforcement Learning Approach
dc.type	Journal Article
utslib.citation.volume	PP
utslib.for	0805 Distributed Computing
utslib.for	1005 Communications Technologies
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2023-03-12T00:00:00+1000Z
dc.date.updated	2021-04-28T07:21:37Z
pubs.issue	99
pubs.publication-status	Published
pubs.volume	PP
utslib.citation.issue	99

Abstract:

Moving Target Defense (MTD) is an emerging proactive defense technology that can reduce the risk of vulnerabilities being exploited by attackers. As a crucial component of MTD, route mutation (RM) faces two fundamental problems when defending against sophisticated Distributed Denial of Service (DDoS) attacks: 1) it cannot make optimal mutation selections because it learns too little about attack behaviors; 2) because the network situation is time-varying, it also lacks self-adaptation in its mutation parameters. In this paper, we propose a context-aware Q-learning algorithm for RM (CQ-RM) that can learn attack strategies to optimize the selection of mutated routes. We first integrate four representative attack strategies into a unified mathematical model and formalize multiple network constraints. Then, taking these network constraints into consideration, we model the RM process as a Markov decision process (MDP). To find the optimal policy of the MDP, we develop a context estimation mechanism and further propose the CQ-RM scheme, which can adjust the learning rate and mutation period adaptively. The optimal convergence of CQ-RM is proved theoretically. Finally, extensive experimental results highlight the effectiveness of our method compared to representative solutions.
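
As a rough illustration of the general idea in the abstract (Q-learning over an MDP whose actions are route choices), the following minimal Python sketch may help. It is not the CQ-RM algorithm from the paper: the three-route topology, the fixed random attacker (`attack_probs`), the +1/-1 reward, and the function names `train_route_selector` and `greedy_route` are all assumptions made for this example, and the paper's context estimation mechanism, network constraints, and adaptive mutation period are not modeled. The decaying step size `1 / visits` is only a standard textbook stand-in for an adaptive learning rate.

```python
import random


def train_route_selector(num_routes=3, attack_probs=(0.8, 0.15, 0.05),
                         steps=5000, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over routes (illustrative only; not the paper's CQ-RM).

    State  = index of the route currently in use.
    Action = index of the route to mutate to at the next step.
    """
    rng = random.Random(seed)
    q = [[0.0] * num_routes for _ in range(num_routes)]
    visits = [[0] * num_routes for _ in range(num_routes)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy choice of the next route.
        if rng.random() < epsilon:
            action = rng.randrange(num_routes)
        else:
            action = greedy_route(q, state)

        # Hypothetical attacker: floods one route per step, drawn from attack_probs.
        attacked = rng.choices(range(num_routes), weights=attack_probs)[0]
        reward = -1.0 if action == attacked else 1.0

        # Step size decays with the visit count of this (state, action) pair.
        # This is an assumed stand-in, not the paper's adaptive learning rate.
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action]

        # Standard Q-learning update toward the one-step bootstrapped target.
        next_state = action
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


def greedy_route(q, state):
    """Return the route with the highest learned value in the given state."""
    return max(range(len(q[state])), key=lambda a: q[state][a])


if __name__ == "__main__":
    q_table = train_route_selector()
    for s in range(len(q_table)):
        print(f"on route {s}: mutate to route {greedy_route(q_table, s)}")
```

In this toy setting the mutation period is fixed at one step and the attacker never changes its behavior; a scheme along the lines described in the abstract would additionally adapt both the learning rate and the mutation period to the estimated network context.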

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/148485