Taylor saves for later: Disentanglement for video prediction using Taylor representation

Publisher: Elsevier
Publication Type: Journal Article
Citation: Neurocomputing, 2022, 472, pp. 166-174
Issue Date: 2022-02-01
Abstract:
Video prediction is a challenging task with broad application prospects in meteorology and robotics. Existing works fail to balance short-term and long-term prediction performance and to extract robust latent dynamical laws from video frames. We propose a two-branch sequence-to-sequence deep model that disentangles a Taylor feature and a residual feature in video frames through a novel recurrent prediction module (TaylorCell) and a residual module, based on a novel principle for feature separation. TaylorCell expands the video frames' high-dimensional features into a finite Taylor series that describes the latent laws. Within TaylorCell, we propose a Taylor prediction unit (TPU) and a memory correction unit (MCU). TPU uses the derivative information of the first input frame to predict future frames, avoiding error accumulation. MCU distills information from all past frames to correct the Taylor feature predicted by TPU. Correspondingly, the residual module extracts the residual feature complementary to the Taylor feature. Owing to the nature of the Taylor series, our model performs best on datasets with short-range spatial dependencies and stable dynamics. On three general-purpose datasets (Moving MNIST, TaxiBJ, Human3.6M), our model matches the state of the art in short-term forecasting and outperforms it in long-term forecasting. Ablation experiments demonstrate the contribution of each module.
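To make the Taylor-rollout idea concrete, below is a minimal, hypothetical PyTorch sketch of a TPU-style predictor. The class name TaylorPredictor, the expansion order, and the convolutional derivative heads are illustrative assumptions, not the authors' released code; only the finite-Taylor-expansion rollout itself is taken from the abstract.

import torch
import torch.nn as nn

class TaylorPredictor(nn.Module):
    """Hypothetical sketch of a TPU-style Taylor rollout (not the paper's code).

    Estimates the first K temporal derivatives of the first frame's feature
    with learned convolutions, then predicts the feature at step t via a
    finite Taylor expansion: x(t) = x(0) + sum_k d_k * t^k / k!.
    """

    def __init__(self, channels, order=3):
        super().__init__()
        # Assumption: one conv head per derivative order.
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(order)
        )

    def forward(self, x0, num_steps):
        # x0: (B, C, H, W) feature of the first input frame.
        derivs = [head(x0) for head in self.heads]
        preds = []
        for t in range(1, num_steps + 1):
            x_t = x0
            factorial = 1
            for k, d_k in enumerate(derivs, start=1):
                factorial *= k
                x_t = x_t + d_k * (float(t) ** k) / factorial
            preds.append(x_t)
        return torch.stack(preds, dim=1)  # (B, T, C, H, W)

# Usage: every step is computed from the first frame alone, so errors do
# not accumulate autoregressively, matching the TPU motivation above.
model = TaylorPredictor(channels=16, order=3)
future = model(torch.randn(2, 16, 8, 8), num_steps=5)
print(future.shape)  # torch.Size([2, 5, 16, 8, 8])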