A²: Extracting cyclic switchings from DOB-nets for rejecting excessive disturbances

Publisher:
Elsevier
Publication Type:
Journal Article
Citation:
Neurocomputing, 2020, 400, pp. 161-172
Issue Date:
2020-08-04
Abstract:
© 2020. Reinforcement Learning (RL) remains limited in practice by its poor explainability, which leads to insufficient trust from users, unsatisfactory interpretation for human intervention, and inadequate analysis for future improvement. This paper seeks to partially characterize the interplay between dynamical environments and a previously proposed Disturbance OBserver net (DOB-net). The DOB-net is trained via RL and offers optimal control for a set of Partially Observable Markov Decision Processes (POMDPs), whose transition functions are largely determined by the environments (excessive external disturbances). This paper proposes an Attention-based Abstraction (A²) approach to extract a finite-state automaton, referred to as a Key Moore Machine Network (KMMN), that captures the switching mechanisms the DOB-net exhibits in dealing with multiple such POMDPs. A² first quantizes the controlled platform by learning continuous-discrete interfaces; it then extracts the KMMN by finding the key hidden states and transitions that attract sufficient attention from the DOB-net. Within the resulting KMMN, three patterns of cyclic switchings between key hidden states are found, and saturated controls are shown to be synchronized with the unknown disturbances. Interestingly, the discovered switchings have previously appeared in control designs for often-saturated systems, and they are interpreted via an analogy to the discrete-event subsystem of hybrid control.
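The abstract's extraction pipeline (quantize continuous hidden states, then read off a Moore-machine-like automaton from the observed transitions) can be illustrated with a minimal sketch. This is not the paper's algorithm: the learned continuous-discrete interface and the attention-based pruning are replaced here by a plain k-means quantizer and a transition tally, and the function name `extract_automaton` and all parameters are hypothetical.

```python
import numpy as np

def extract_automaton(hidden_states, actions, n_clusters=4, seed=0):
    """Quantize continuous RNN hidden states into discrete modes and tally
    transitions between consecutive modes, yielding a small Moore-machine-like
    graph: each discrete state emits one (majority) action.

    hidden_states: (T, d) array of hidden vectors along one rollout.
    actions:       (T,) array of non-negative integer action labels.
    """
    rng = np.random.default_rng(seed)
    # Plain k-means quantization (a stand-in for the learned
    # continuous-discrete interface described in the abstract).
    centers = hidden_states[rng.choice(len(hidden_states), n_clusters, replace=False)]
    for _ in range(20):
        dists = ((hidden_states[:, None] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for k in range(n_clusters):
            pts = hidden_states[labels == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)
    # Moore property: output depends only on the discrete state,
    # taken here as the majority action within each cluster.
    outputs = {k: int(np.bincount(actions[labels == k]).argmax())
               for k in range(n_clusters) if (labels == k).any()}
    # Transition counts between consecutive discrete states; the paper's
    # attention-based step would instead keep only "key" states/transitions.
    transitions = {}
    for a, b in zip(labels[:-1], labels[1:]):
        transitions[(int(a), int(b))] = transitions.get((int(a), int(b)), 0) + 1
    return labels, outputs, transitions
```

On a rollout whose hidden states form well-separated regimes, the self-loop counts dominate and the off-diagonal entries of `transitions` expose the switching pattern between modes, which is the kind of structure the KMMN is meant to capture.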