Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection

Dafang He, Scott Cohen, Brian Price, Daniel Kifer, Clyde Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network predicts, at each pixel, the probability of each of the three element classes. The contour detection network predicts instance-level 'edges' around each element occurrence. We propose a conditional random field (CRF) that uses features output by the semantic segmentation and contour networks to improve upon the output of the semantic segmentation network. Given the semantic segmentation output, we also extract individual table instances from the page using heuristic rules and a verification network that removes false positives. We show that, although we take only a page image as input, our results are comparable to those of other methods that rely on PDF file information, heuristics, and hand-crafted features tailored to specific types of documents. Our approach learns representative features for page segmentation from real and synthetic training data. Because it is learning-based, it generalizes better than existing methods across document types and element appearances. For example, our method reliably detects sparsely lined tables, which are hard for rule-based or heuristic methods.
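The abstract does not spell out the heuristic rules used to turn the per-pixel output into table instances. As a rough illustration only, the sketch below assumes one plausible rule: threshold a table-probability map, group the surviving pixels into 4-connected regions, and keep the bounding box of each region that is large enough. The function name `extract_table_boxes`, the `threshold` and `min_area` parameters, and their default values are hypothetical and not taken from the paper; the verification network is not reproduced here.

```python
from collections import deque


def extract_table_boxes(prob_map, threshold=0.5, min_area=4):
    """Return bounding boxes (top, left, bottom, right) of connected regions
    whose table probability exceeds `threshold`.

    prob_map: list of rows, each a list of per-pixel table probabilities.
    threshold, min_area: illustrative values, not taken from the paper.
    """
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if seen[r][c] or prob_map[r][c] <= threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and prob_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumed heuristic: discard regions too small to be a table.
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    page = [
        [0.9, 0.9, 0.0, 0.0, 0.0],
        [0.9, 0.9, 0.0, 0.0, 0.8],
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.7, 0.7, 0.7, 0.0],
        [0.0, 0.7, 0.7, 0.7, 0.0],
    ]
    print(extract_table_boxes(page))
```

In a full pipeline of the kind the abstract describes, each candidate box would then be passed to a separate classifier that rejects false positives; the area filter above is only a stand-in for simple rule-based pruning.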

Original language: English (US)
Title of host publication: Proceedings - 14th IAPR International Conference on Document Analysis and Recognition, ICDAR 2017
Publisher: IEEE Computer Society
Pages: 254-261
Number of pages: 8
Volume: 1
ISBN (Electronic): 9781538635865
DOIs: https://doi.org/10.1109/ICDAR.2017.50
State: Published - Jan 25 2018
Event: 14th IAPR International Conference on Document Analysis and Recognition, ICDAR 2017 - Kyoto, Japan
Duration: Nov 9 2017 to Nov 15 2017

Other

Other: 14th IAPR International Conference on Document Analysis and Recognition, ICDAR 2017
Country: Japan
City: Kyoto
Period: 11/9/17 to 11/15/17

Fingerprint

Semantics
Neural networks
Heuristic methods
Pixels

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition

Cite this

He, D., Cohen, S., Price, B., Kifer, D., & Giles, C. L. (2018). Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection. In Proceedings - 14th IAPR International Conference on Document Analysis and Recognition, ICDAR 2017 (Vol. 1, pp. 254-261). IEEE Computer Society. https://doi.org/10.1109/ICDAR.2017.50