6 Citations (Scopus)

Abstract

Scene text reading continues to be of interest for many reasons, including applications for the visually impaired and automatic image indexing systems. Here we propose a novel end-to-end scene text detection algorithm. First, to identify text regions, we design a novel Convolutional Neural Network (CNN) architecture that aggregates local surrounding information for cascaded, fast, and accurate detection. The local information serves as context and provides rich cues for distinguishing text from background noise. In addition, we design a novel grouping algorithm on top of the detected character graph, as well as a text line refinement step. Text line refinement consists of a text line extension module together with a text line filtering and regression module. Jointly, they produce accurate oriented text line bounding boxes. Experiments show that our method achieves state-of-the-art performance on several benchmark data sets: ICDAR 2003 (IC03), ICDAR 2013 (IC13), and Street View Text (SVT).

Original language: English (US)
Title of host publication: Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers
Editors: Ko Nishino, Shang-Hong Lai, Vincent Lepetit, Yoichi Sato
Publisher: Springer Verlag
Pages: 280-296
Number of pages: 17
ISBN (Print): 9783319541921
DOIs: https://doi.org/10.1007/978-3-319-54193-8_18
State: Published - Jan 1 2017
Event: 13th Asian Conference on Computer Vision, ACCV 2016 - Taipei, Taiwan, Province of China
Duration: Nov 20 2016 to Nov 24 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10115 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th Asian Conference on Computer Vision, ACCV 2016
Country: Taiwan, Province of China
City: Taipei
Period: 11/20/16 to 11/24/16

Fingerprint

Network architecture
Neural networks
Line
Experiments
Refinement
Context
Text
Image Indexing
Visually Impaired
Module
Grouping
Continue
Filtering
Regression
Benchmark
Graph in graph theory

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

He, D., Yang, X., Huang, W., Zhou, Z., Kifer, D., & Giles, C. L. (2017). Aggregating local context for accurate scene text detection. In K. Nishino, S-H. Lai, V. Lepetit, & Y. Sato (Eds.), Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers (pp. 280-296). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10115 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-54193-8_18
He, Dafang ; Yang, Xiao ; Huang, Wenyi ; Zhou, Zihan ; Kifer, Daniel ; Giles, C. Lee. / Aggregating local context for accurate scene text detection. Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers. editor / Ko Nishino ; Shang-Hong Lai ; Vincent Lepetit ; Yoichi Sato. Springer Verlag, 2017. pp. 280-296 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{726bfee88fdc4ab1b7bd2c60a3ef412d,
title = "Aggregating local context for accurate scene text detection",
abstract = "Scene text reading continues to be of interest for many reasons including applications for the visually impaired and automatic image indexing systems. Here we propose a novel end-to-end scene text detection algorithm. First, for identifying text regions we design a novel Convolutional Neural Network (CNN) architecture that aggregates local surrounding information for cascaded, fast and accurate detection. The local information serves as context and provides rich cues to distinguish text from background noises. In addition, we designed a novel grouping algorithm on top of detected character graph as well as a text line refinement step. Text line refinement consists of a text line extension module, together with a text line filtering and regression module. Jointly they produce accurate oriented text line bounding box. Experiments show that our method achieved state-of-the-art performance in several benchmark data sets: ICDAR 2003 (IC03), ICDAR 2013 (IC13) and Street View Text (SVT).",
author = "Dafang He and Xiao Yang and Wenyi Huang and Zihan Zhou and Daniel Kifer and Giles, {C. Lee}",
year = "2017",
month = "1",
day = "1",
doi = "10.1007/978-3-319-54193-8_18",
language = "English (US)",
isbn = "9783319541921",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "280--296",
editor = "Ko Nishino and Shang-Hong Lai and Vincent Lepetit and Yoichi Sato",
booktitle = "Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers",
volume = "10115",
address = "Germany",
}

He, D, Yang, X, Huang, W, Zhou, Z, Kifer, D & Giles, CL 2017, Aggregating local context for accurate scene text detection. in K Nishino, S-H Lai, V Lepetit & Y Sato (eds), Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10115 LNCS, Springer Verlag, pp. 280-296, 13th Asian Conference on Computer Vision, ACCV 2016, Taipei, Taiwan, Province of China, 11/20/16. https://doi.org/10.1007/978-3-319-54193-8_18

Aggregating local context for accurate scene text detection. / He, Dafang; Yang, Xiao; Huang, Wenyi; Zhou, Zihan; Kifer, Daniel; Giles, C. Lee.

Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers. ed. / Ko Nishino; Shang-Hong Lai; Vincent Lepetit; Yoichi Sato. Springer Verlag, 2017. p. 280-296 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10115 LNCS).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

TY - GEN

T1 - Aggregating local context for accurate scene text detection

AU - He, Dafang

AU - Yang, Xiao

AU - Huang, Wenyi

AU - Zhou, Zihan

AU - Kifer, Daniel

AU - Giles, C. Lee

PY - 2017/1/1

Y1 - 2017/1/1

AB - Scene text reading continues to be of interest for many reasons including applications for the visually impaired and automatic image indexing systems. Here we propose a novel end-to-end scene text detection algorithm. First, for identifying text regions we design a novel Convolutional Neural Network (CNN) architecture that aggregates local surrounding information for cascaded, fast and accurate detection. The local information serves as context and provides rich cues to distinguish text from background noises. In addition, we designed a novel grouping algorithm on top of detected character graph as well as a text line refinement step. Text line refinement consists of a text line extension module, together with a text line filtering and regression module. Jointly they produce accurate oriented text line bounding box. Experiments show that our method achieved state-of-the-art performance in several benchmark data sets: ICDAR 2003 (IC03), ICDAR 2013 (IC13) and Street View Text (SVT).

UR - http://www.scopus.com/inward/record.url?scp=85016258874&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85016258874&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-54193-8_18

DO - 10.1007/978-3-319-54193-8_18

M3 - Conference contribution

AN - SCOPUS:85016258874

SN - 9783319541921

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 280

EP - 296

BT - Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers

A2 - Nishino, Ko

A2 - Lai, Shang-Hong

A2 - Lepetit, Vincent

A2 - Sato, Yoichi

PB - Springer Verlag

ER -

He D, Yang X, Huang W, Zhou Z, Kifer D, Giles CL. Aggregating local context for accurate scene text detection. In Nishino K, Lai S-H, Lepetit V, Sato Y, editors, Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers. Springer Verlag. 2017. p. 280-296. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-54193-8_18