DOB-Net: Actively Rejecting Unknown Excessive Time-Varying Disturbances

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, 00, pp. 1881-1887
Issue Date:
2020-09-15
This paper presents an observer-integrated Reinforcement Learning (RL) approach, called Disturbance OBserver Network (DOB-Net), for robots operating in environments where disturbances are unknown, time-varying, and may frequently exceed the robot's control capabilities. The DOB-Net integrates a disturbance dynamics observer network and a controller network. Originating from conventional DOB mechanisms, the observer is built and enhanced via Recurrent Neural Networks (RNNs), encoding estimates of past values and predictions of future values of unknown disturbances in the RNN hidden state. Such encoding allows the controller to generate optimal control signals that actively reject disturbances, under the constraints of the robot's control capabilities. The observer and the controller are jointly learned within policy optimization using advantage actor-critic. Numerical simulations on position regulation tasks demonstrate that the proposed DOB-Net significantly outperforms conventional feedback controllers and a classical RL policy.
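To make the architecture described in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a DOB-Net-style forward pass: an RNN observer folds the history of states and actions into a hidden state that summarizes the unknown disturbance, and a controller maps the current state plus that encoding to a saturated control signal. All dimensions, parameter names, and the toy plant are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACT_DIM, HID_DIM = 4, 2, 8
U_MAX = 1.0  # actuator saturation: the control-capability constraint

# Observer (vanilla RNN cell) parameters -- randomly initialized here;
# in DOB-Net these would be learned jointly with the controller via A2C.
W_in = rng.standard_normal((HID_DIM, STATE_DIM + ACT_DIM)) * 0.1
W_h = rng.standard_normal((HID_DIM, HID_DIM)) * 0.1

# Controller (linear policy head) parameters
W_pi = rng.standard_normal((ACT_DIM, STATE_DIM + HID_DIM)) * 0.1

def observer_step(h, state, action):
    """Update the RNN hidden state that encodes the disturbance estimate."""
    x = np.concatenate([state, action])
    return np.tanh(W_in @ x + W_h @ h)

def controller(state, h):
    """Map state + disturbance encoding to a bounded control signal."""
    u = W_pi @ np.concatenate([state, h])
    return U_MAX * np.tanh(u)  # respect control-capability limits

# Roll out a few steps against a toy linear plant with a sinusoidal
# disturbance whose magnitude can exceed U_MAX (an "excessive" disturbance).
h = np.zeros(HID_DIM)
state = np.zeros(STATE_DIM)
action = np.zeros(ACT_DIM)
for t in range(10):
    h = observer_step(h, state, action)
    action = controller(state, h)
    disturbance = 2.0 * np.sin(0.3 * t)  # unknown, time-varying, > U_MAX
    state = 0.9 * state + 0.1 * np.concatenate([action, action]) + 0.05 * disturbance

print(action.shape, bool(np.all(np.abs(action) <= U_MAX)))
```

The key design point this sketch illustrates is that the controller conditions on the observer's hidden state rather than on an explicit disturbance value, so estimation of past disturbances and prediction of future ones are learned implicitly inside the policy-optimization loop.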