A Reinforcement Learning-Based Vehicle Platoon Control Strategy for Reducing Energy Consumption in Traffic Oscillations.

Li, M; Cao, Z; Li, Z

A Reinforcement Learning-Based Vehicle Platoon Control Strategy for Reducing Energy Consumption in Traffic Oscillations.

Li, M Cao, Z

Li, Z

Permalink

Publisher:: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type:: Journal Article
Citation:: IEEE Trans Neural Netw Learn Syst, 2021, 32, (12), pp. 5309-5322
Issue Date:: 2021-12

Closed Access

	Filename	Description	Size
	A_Reinforcement_Learning-Based_Vehicle_Platoon_Control_Strategy_for_Reducing_Energy_Consumption_in_Traffic_Oscillations.pdf	Published version	4.24 MB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, M
dc.contributor.author	Cao, Z https://orcid.org/0000-0003-3656-0328
dc.contributor.author	Li, Z
dc.date.accessioned	2022-02-26T07:28:28Z
dc.date.available	2022-02-26T07:28:28Z
dc.date.issued	2021-12
dc.identifier.citation	IEEE Trans Neural Netw Learn Syst, 2021, 32, (12), pp. 5309-5322
dc.identifier.issn	2162-237X
dc.identifier.issn	2162-2388
dc.identifier.uri	http://hdl.handle.net/10453/154882
dc.description.abstract	The vehicle platoon will be the most dominant driving mode on future roads. To the best of our knowledge, few reinforcement learning (RL) algorithms have been applied in vehicle platoon control, which has large-scale action and state spaces. Some RL-based methods were applied to solve single-agent problems. If we need to tackle multiagent problems, we will use multiagent RL algorithms since the parameters space grows exponentially with the increasing number of agents involved. Previous multiagent RL algorithms generally may provide redundant information to agents, indicating a large amount of useless or unrelated information, which may cause to be difficult for convergence training and pattern extractions from shared information. Also, random actions usually contribute to crashes, especially at the beginning of training. In this study, a communication proximal policy optimization (CommPPO) algorithm was proposed to tackle the above issues. In specific, the CommPPO model adopts a parameter-sharing structure to allow the dynamic variation of agent numbers, which can well handle various platoon dynamics, including splitting and merging. The communication protocol of the CommPPO consists of two parts. In the state part, the widely used predecessor-leader follower typology in the platoon is adopted to transmit global and local state information to agents. In the reward part, a new reward communication channel is proposed to solve the spurious reward and "lazy agent" problems in some existing multiagent RLs. Moreover, a curriculum learning approach is adopted to reduce crashes and speed up training. To validate the proposed strategy for platoon control, two existing multiagent RLs and a traditional platoon control strategy were applied in the same scenarios for comparison. Results showed that the CommPPO algorithm gained more rewards and achieved the largest fuel consumption reduction (11.6%).
dc.format	Print-Electronic
dc.language	eng
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation.ispartof	IEEE Trans Neural Netw Learn Syst
dc.relation.isbasedon	10.1109/TNNLS.2021.3071959
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	A Reinforcement Learning-Based Vehicle Platoon Control Strategy for Reducing Energy Consumption in Traffic Oscillations.
dc.type	Journal Article
utslib.citation.volume	32
utslib.location.activity	United States
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2022-02-26T07:28:24Z
pubs.issue	12
pubs.publication-status	Published
pubs.volume	32
utslib.citation.issue	12

Abstract:

The vehicle platoon will be the most dominant driving mode on future roads. To the best of our knowledge, few reinforcement learning (RL) algorithms have been applied in vehicle platoon control, which has large-scale action and state spaces. Some RL-based methods were applied to solve single-agent problems. If we need to tackle multiagent problems, we will use multiagent RL algorithms since the parameters space grows exponentially with the increasing number of agents involved. Previous multiagent RL algorithms generally may provide redundant information to agents, indicating a large amount of useless or unrelated information, which may cause to be difficult for convergence training and pattern extractions from shared information. Also, random actions usually contribute to crashes, especially at the beginning of training. In this study, a communication proximal policy optimization (CommPPO) algorithm was proposed to tackle the above issues. In specific, the CommPPO model adopts a parameter-sharing structure to allow the dynamic variation of agent numbers, which can well handle various platoon dynamics, including splitting and merging. The communication protocol of the CommPPO consists of two parts. In the state part, the widely used predecessor-leader follower typology in the platoon is adopted to transmit global and local state information to agents. In the reward part, a new reward communication channel is proposed to solve the spurious reward and "lazy agent" problems in some existing multiagent RLs. Moreover, a curriculum learning approach is adopted to reduce crashes and speed up training. To validate the proposed strategy for platoon control, two existing multiagent RLs and a traditional platoon control strategy were applied in the same scenarios for comparison. Results showed that the CommPPO algorithm gained more rewards and achieved the largest fuel consumption reduction (11.6%).

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/154882