Automatic Selection of Security Service Function Chaining Using Reinforcement Learning

Li, G; Zhou, H; Feng, B; Yu, S

Publication Type: Conference Proceeding
Citation: 2018 IEEE Globecom Workshops, GC Wkshps 2018 - Proceedings, 2019
Issue Date: 2019-02-19

Closed Access

Filename	Description	Size	Format
Automatic Selection of Security Service Function Chaining Using Reinforcement Learning.pdf	Published version	314.61 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, G	en_US
dc.contributor.author	Zhou, H	en_US
dc.contributor.author	Feng, B	en_US
dc.contributor.author	Yu, S https://orcid.org/0000-0003-4485-6743	en_US
dc.date.issued	2019-02-19	en_US
dc.identifier.citation	2018 IEEE Globecom Workshops, GC Wkshps 2018 - Proceedings, 2019	en_US
dc.identifier.isbn	9781538649206	en_US
dc.identifier.uri	http://hdl.handle.net/10453/134523
dc.description.abstract	© 2018 IEEE. When selecting security Service Function Chaining (SFC) for network defense, operators usually take security performance, service quality, deployment cost, and network function diversity into consideration, which makes the selection a multi-objective optimization problem. However, as applications, users, and data volumes grow massively in networks, traditional mathematical approaches cannot be applied to online security SFC selection because of their high execution time and the uncertainty of network conditions. Thus, in this paper, we utilize reinforcement learning, specifically the Q-learning algorithm, to automatically choose a proper security SFC for various requirements. In particular, we design a reward function to make a tradeoff among different objectives and modify the standard ε-greedy exploration to pick out multiple ranked actions for diversified network defense. We compare Q-learning with mathematical optimization-based approaches, which are assumed to know network state changes in advance. The training and testing results show that the Q-learning-based approach can capture changes in network conditions and make a tradeoff among different objectives.	en_US
dc.relation.ispartof	2018 IEEE Globecom Workshops, GC Wkshps 2018 - Proceedings	en_US
dc.relation.isbasedon	10.1109/GLOCOMW.2018.8644122	en_US
dc.title	Automatic Selection of Security Service Function Chaining Using Reinforcement Learning	en_US
dc.type	Conference Proceeding
utslib.for	0803 Computer Software	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

© 2018 IEEE. When selecting security Service Function Chaining (SFC) for network defense, operators usually take security performance, service quality, deployment cost, and network function diversity into consideration, which makes the selection a multi-objective optimization problem. However, as applications, users, and data volumes grow massively in networks, traditional mathematical approaches cannot be applied to online security SFC selection because of their high execution time and the uncertainty of network conditions. Thus, in this paper, we utilize reinforcement learning, specifically the Q-learning algorithm, to automatically choose a proper security SFC for various requirements. In particular, we design a reward function to make a tradeoff among different objectives and modify the standard ε-greedy exploration to pick out multiple ranked actions for diversified network defense. We compare Q-learning with mathematical optimization-based approaches, which are assumed to know network state changes in advance. The training and testing results show that the Q-learning-based approach can capture changes in network conditions and make a tradeoff among different objectives.
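
Illustrative sketch (not from the paper): the abstract names three ingredients, a reward that trades off several objectives, tabular Q-learning, and an exploration rule that returns multiple ranked actions. The Python below shows one plausible way these pieces could fit together. The weights, the weighted-sum form of the reward, the top-k selection rule, and every function and chain name are assumptions made for illustration; the record does not give the authors' actual formulation.

```python
import random
from collections import defaultdict

# Hypothetical weights for the four objectives named in the abstract.
# The paper's actual reward function is not given in this record.
WEIGHTS = {"security": 0.4, "quality": 0.3, "cost": 0.2, "diversity": 0.1}


def reward(metrics, weights=WEIGHTS):
    """Weighted sum of objective scores, each assumed normalized to [0, 1].

    Deployment cost is treated as a penalty, so it enters with a minus sign.
    """
    return (weights["security"] * metrics["security"]
            + weights["quality"] * metrics["quality"]
            - weights["cost"] * metrics["cost"]
            + weights["diversity"] * metrics["diversity"])


def ranked_epsilon_greedy(q_row, actions, k, epsilon, rng=random):
    """Return up to k distinct candidate chains.

    With probability epsilon, explore: k actions sampled at random.
    Otherwise exploit: the k actions with the highest Q-values, best first.
    """
    k = min(k, len(actions))
    if rng.random() < epsilon:
        return rng.sample(actions, k)
    return sorted(actions, key=lambda a: q_row[a], reverse=True)[:k]


def q_update(q, state, action, r, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max(q[next_state][a] for a in actions)
    q[state][action] += alpha * (r + gamma * best_next - q[state][action])


def make_q_table():
    """Q-table that defaults every unseen (state, action) value to 0.0."""
    return defaultdict(lambda: defaultdict(float))
```

In use, each action would stand for one candidate security SFC (for example a hypothetical "fw-ids" chain), the state would summarize current network conditions, and the ranked list returned by `ranked_epsilon_greedy` would give the operator several chains to deploy for diversity rather than a single best one.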

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/134523