A learning-based text synthesis engine for scene text detection

Xiao Yang, Dafang He, Daniel Kifer, C. Lee Giles

Research output: Contribution to conference › Paper › peer-review


Scene text detection (STD) and recognition (STR) methods have recently improved greatly, with synthetic training data playing an important role. Nevertheless, for the text detection task, a model trained solely on large-scale synthetic data performs significantly worse than one trained on even a few real-world samples, whereas state-of-the-art text recognition can be achieved by training on synthetic data alone [10]. This shows the limitations of relying only on large-scale synthetic data for scene text detection. In this work, we propose the first learning-based, data-driven text synthesis engine for the scene text detection task. Our text synthesis engine is decomposed into two modules: 1) a location module that learns the distribution of text locations on the image plane, and 2) an appearance module that translates text-inserted images into realistic-looking ones that are essentially indistinguishable from real-world scene text images. Synthetic data created by our engine outperforms previous text synthesis methods when evaluated on the ICDAR 2015 Incidental Scene Text dataset [15].
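The two-module decomposition above can be illustrated with a toy sketch. Everything here is an assumption for illustration: the function names, the hand-picked sampling distribution, and the naive blending are hypothetical stand-ins, since the paper's actual modules are learned neural networks (a learned location distribution and an image-to-image translation network).

```python
import numpy as np

# Hypothetical sketch of the two-module pipeline from the abstract.
# The real location module LEARNS where text appears; this stand-in
# simply samples boxes from a fixed, center-biased distribution.
rng = np.random.default_rng(0)

def location_module(image_shape, n_boxes=3):
    """Toy location module: sample text box centers and sizes."""
    h, w = image_shape[:2]
    centers = rng.normal(loc=(h / 2, w / 2), scale=(h / 6, w / 6),
                         size=(n_boxes, 2))
    sizes = rng.uniform((10, 40), (30, 120), size=(n_boxes, 2))
    boxes = np.concatenate([centers - sizes / 2, centers + sizes / 2], axis=1)
    # Clip to image bounds; rows are (y0, x0, y1, x1).
    return np.clip(boxes, 0, [h, w, h, w]).astype(int)

def appearance_module(image, boxes):
    """Toy appearance module: blend flat gray "text" patches into the image.
    The paper's module is a translation network that makes the composite
    indistinguishable from a real scene-text photo."""
    out = image.astype(float).copy()
    for y0, x0, y1, x1 in boxes:
        out[y0:y1, x0:x1] = 0.5 * out[y0:y1, x0:x1] + 0.5 * 200.0
    return out.astype(np.uint8)

# Run the pipeline on a random "scene" image.
image = rng.integers(0, 255, size=(240, 320, 3), dtype=np.uint8)
boxes = location_module(image.shape)
synthetic = appearance_module(image, boxes)
```

The point of the sketch is the interface, not the content: the location module decides *where* text goes, and the appearance module decides *how* the composite looks, so each can be trained (or replaced) independently.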

Original language: English (US)
State: Published - 2020
Event: 30th British Machine Vision Conference, BMVC 2019 - Cardiff, United Kingdom
Duration: Sep 9, 2019 to Sep 12, 2019


Conference: 30th British Machine Vision Conference, BMVC 2019
Country/Territory: United Kingdom

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition


