Generating Realistic Videos From Keyframes With Concatenated GANs

Publisher:
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type:
Journal Article
Citation:
IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29, (8), pp. 2337-2348
Issue Date:
2019-08-01
Abstract:
Given two video frames X0 and Xn+1, we aim to generate a series of intermediate frames Y1, Y2, ..., Yn, such that the resulting video consisting of frames X0, Y1, ..., Yn, and Xn+1 appears realistic to a human viewer. Such video generation has numerous important applications, including video compression, movie production, slow-motion filming, video surveillance, and forensic analysis. Yet, video generation is highly challenging due to the vast search space of possible frames. Previous methods, mostly based on video prediction and/or video interpolation, tend to generate poor-quality videos with severe motion blur. This paper proposes a novel, end-to-end approach to video generation using generative adversarial networks (GANs). In particular, our design involves two concatenated GANs, one capturing motions and the other generating frame details. The loss function is also carefully engineered to include an adversarial loss, a gradient difference loss (for motion learning), and a normalized product correlation loss (for frame details). Experiments using three video datasets, namely, Google Robotic Push, KTH human actions, and UCF101, demonstrate that the proposed solution generates high-quality, realistic, and sharp videos, whereas all previous solutions output noisy and blurry results.
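The abstract names three loss terms but does not give their formulas. As a rough illustration only, the sketch below implements common textbook forms of a gradient difference loss and a normalized (cross-)correlation loss and combines them with an adversarial term via hypothetical weights (w_adv, w_gdl, w_ncc); the paper's exact definitions and weighting may differ.

```python
import numpy as np

def gradient_difference_loss(gen, target, alpha=1.0):
    # Penalize mismatches between spatial gradients of the generated
    # and target frames -- encourages sharp motion edges.
    gx_g = np.abs(np.diff(gen, axis=1))
    gy_g = np.abs(np.diff(gen, axis=0))
    gx_t = np.abs(np.diff(target, axis=1))
    gy_t = np.abs(np.diff(target, axis=0))
    return (np.mean(np.abs(gx_g - gx_t) ** alpha)
            + np.mean(np.abs(gy_g - gy_t) ** alpha))

def normalized_correlation_loss(gen, target, eps=1e-8):
    # 1 minus the normalized correlation of the flattened frames;
    # approaches 0 as generated frame details match the target.
    g = gen.ravel() - gen.mean()
    t = target.ravel() - target.mean()
    ncc = np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t) + eps)
    return 1.0 - ncc

def total_loss(gen, target, adv_loss, w_adv=0.05, w_gdl=1.0, w_ncc=1.0):
    # Weighted sum of the three terms; weights here are illustrative,
    # not taken from the paper.
    return (w_adv * adv_loss
            + w_gdl * gradient_difference_loss(gen, target)
            + w_ncc * normalized_correlation_loss(gen, target))
```

For a perfectly reconstructed frame, both the gradient difference and correlation terms vanish, so the total loss reduces to the weighted adversarial term.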