Stretch-VST: Getting flexible with visual stories

Chi Yang Hsu, Yun Wei Chu, Tsai Lun Yang, Ting Hao Huang, Lun Wei Ku

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

In visual storytelling, a short story is generated based on a given image sequence. Despite years of work, most visual storytelling models remain limited by the fixed length of the generated stories: most models produce stories with exactly five sentences because five-sentence stories dominate the training data. The fixed-length stories carry limited details and provide ambiguous textual information to the readers. Therefore, we propose to “stretch” the stories, which creates the potential to present in-depth visual details. This paper presents Stretch-VST, a visual storytelling framework that enables the generation of prolonged stories by adding appropriate knowledge, which is selected by the proposed scoring function. We propose a length-controlled Transformer to generate long stories. This model introduces novel positional encoding methods to maintain story quality with lengthy inputs. Experiments confirm that long stories are generated without deteriorating the quality. The human evaluation further shows that Stretch-VST can provide better focus and detail when stories are prolonged compared to the state of the art. The demo video is available on YouTube, and the live demo can be found online.

Original language: English (US)
Title of host publication: ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the System Demonstrations
Publisher: Association for Computational Linguistics (ACL)
Pages: 356-362
Number of pages: 7
ISBN (Electronic): 9781954085565
State: Published - 2021
Event: Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021 - Virtual, Online
Duration: Aug 1 2021 to Aug 6 2021

Publication series

Name: ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the System Demonstrations

Conference

Conference: Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021
City: Virtual, Online
Period: 8/1/21 to 8/6/21

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language
  • Language and Linguistics
