TY - CPAPER
T1 - TwoStreamVAN: Improving Motion Modeling in Video Generation
T2 - 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020
AU - Sun, Ximeng
AU - Xu, Huijuan
AU - Saenko, Kate
N1 - Funding Information:
This work is supported by the DARPA LWLL project. It reflects the opinions and conclusions of its authors, not those of the funding agencies.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/3
AB - Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content. Existing methods entangle the two intrinsically different tasks of motion and content creation in a single generator network, but this approach struggles to simultaneously generate plausible motion and content. To improve motion modeling in video generation, we propose a two-stream model that disentangles motion generation from content generation, called a Two-Stream Variational Adversarial Network (TwoStreamVAN). Given an action label and a noise vector, our model is able to create clear and consistent motion, and thus yields photorealistic videos. The key idea is to progressively generate and fuse multi-scale motion with its corresponding spatial content. Our model significantly outperforms existing methods on the standard Weizmann Human Action, MUG Facial Expression, and VoxCeleb datasets, as well as our new dataset of diverse human actions with challenging and complex motion. Our code is available at https://github.com/sunxm2357/TwoStreamVAN/.
UR - http://www.scopus.com/inward/record.url?scp=85085515425&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085515425&partnerID=8YFLogxK
DO - 10.1109/WACV45572.2020.9093557
M3 - Conference contribution
AN - SCOPUS:85085515425
T3 - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
SP - 2733
EP - 2742
BT - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 March 2020 through 5 March 2020
ER -