Large scale scene text verification with guided attention

Dafang He, Yeqing Li, Alexander Gorban, Derrall Heath, Julian Ibarz, Qian Yu, Daniel Kifer, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many tasks involve determining whether a particular text string appears in an image. In this work, we propose a new framework that learns this task end to end. The framework takes an image and a text string as input and outputs the probability that the text string is present in the image. This is the first end-to-end framework that learns such relationships between text and images in the scene text area. The framework does not require explicit scene text detection or recognition, and thus no bounding box annotations are needed. It is also the first work in the scene text area that tackles such a weakly labeled problem. Based on this framework, we developed a model called Guided Attention. Our model achieves better results than several state-of-the-art solutions based on scene text reading on a challenging Street View Business Matching task. The task is to find the correct business names for storefront images, and the dataset we collected for it is substantially larger and more challenging than existing scene text datasets. This new real-world task provides a new perspective for studying scene-text-related problems.
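The input/output contract described in the abstract (image plus text string in, presence probability out) can be illustrated with a toy sketch. This is not the authors' Guided Attention model, whose architecture the abstract does not specify. The function names `embed_text` and `verify`, the character-hashing text encoder, and the single attention step are all hypothetical choices made for illustration, and the image is assumed to arrive as precomputed per-region feature vectors.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embed_text(text, dim, seed=0):
    """Hypothetical text encoder: mean of fixed pseudo-random character vectors."""
    vec = [0.0] * dim
    for ch in text.lower():
        # Seeding by character makes each character's vector deterministic.
        rng = random.Random(seed * 1000003 + ord(ch))
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    n = max(len(text), 1)
    return [v / n for v in vec]


def verify(image_features, text, weight=1.0, bias=0.0):
    """Return a score in (0, 1) for `text` being present in the image.

    image_features: list of per-region feature vectors (assumed to come
    from some image encoder; no bounding boxes are involved).
    weight, bias: stand-ins for parameters that would be learned from
    image-level yes/no labels in a real system.
    """
    dim = len(image_features[0])
    query = embed_text(text, dim)
    # The text query weights the image regions (attention guided by the text).
    attn = softmax([dot(query, f) for f in image_features])
    # Attention-weighted summary of the image.
    context = [sum(a * f[i] for a, f in zip(attn, image_features))
               for i in range(dim)]
    logit = weight * dot(query, context) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    regions = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [0.0, 0.0, 1.0, -1.0]]
    print(verify(regions, "Joe's Diner"))
```

With untrained parameters the returned value is meaningless as a prediction. The sketch only shows that a single scalar per (image, string) pair is enough to train against image-level labels, which is what makes the problem weakly labeled.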

Original language: English (US)
Title of host publication: Computer Vision – ACCV 2018 - 14th Asian Conference on Computer Vision, Revised Selected Papers
Editors: C.V. Jawahar, Konrad Schindler, Greg Mori, Hongdong Li
Publisher: Springer Verlag
Pages: 260-275
Number of pages: 16
ISBN (Print): 9783030208721
DOIs: https://doi.org/10.1007/978-3-030-20873-8_17
State: Published - Jan 1 2019
Event: 14th Asian Conference on Computer Vision, ACCV 2018 - Perth, Australia
Duration: Dec 2 2018 → Dec 6 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11365 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th Asian Conference on Computer Vision, ACCV 2018
Country: Australia
City: Perth
Period: 12/2/18 → 12/6/18

Fingerprint

Industry
Strings
Text
Annotation
Framework
Output
Model
Business

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

He, D., Li, Y., Gorban, A., Heath, D., Ibarz, J., Yu, Q., ... Giles, C. L. (2019). Large scale scene text verification with guided attention. In C. V. Jawahar, K. Schindler, G. Mori, & H. Li (Eds.), Computer Vision – ACCV 2018 - 14th Asian Conference on Computer Vision, Revised Selected Papers (pp. 260-275). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11365 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-20873-8_17