Automatic knowledge base construction from scholarly documents

Rabah A. Al-Zaidy, Clyde Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

The continuing growth of published scholarly content on the web ensures the availability of the most recent scientific findings to researchers. Scholarly documents, such as research articles, are easily accessed by using academic search engines that are built on large repositories of scholarly documents. Scientific information extraction from documents into a structured knowledge graph representation facilitates automated machine understanding of a document's content. Traditional information extraction approaches, that either require training samples or a preexisting knowledge base to assist in the extraction, can be challenging when applied to large repositories of digital documents. Labeled training examples for such large scale are difficult to obtain for such datasets. Also, most available knowledge bases are built from web data and do not have sufficient coverage to include concepts found in scientific articles. In this paper we aim to construct a knowledge graph from scholarly documents while addressing both these issues. We propose a fully automatic, unsupervised system for scientific information extraction that does not build on an existing knowledge base and avoids manually-tagged training data. We describe and evaluate a constructed taxonomy that contains over 15k entities resulting from applying our approach to 10k documents.

Original language: English (US)
Title of host publication: DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering
Publisher: Association for Computing Machinery, Inc
Pages: 149-152
Number of pages: 4
ISBN (Electronic): 9781450346894
DOIs: https://doi.org/10.1145/3103010.3121043
State: Published - Aug 31 2017
Event: 17th ACM Symposium on Document Engineering, DocEng 2017 - Valletta, Malta
Duration: Sep 4 2017 to Sep 7 2017

Other

Other: 17th ACM Symposium on Document Engineering, DocEng 2017
Country: Malta
City: Valletta
Period: 9/4/17 to 9/7/17

Fingerprint

Taxonomies
Search engines
Availability

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Science Applications

Cite this

Al-Zaidy, R. A., & Giles, C. L. (2017). Automatic knowledge base construction from scholarly documents. In DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering (pp. 149-152). Association for Computing Machinery, Inc. https://doi.org/10.1145/3103010.3121043
Al-Zaidy, Rabah A. ; Giles, Clyde Lee. / Automatic knowledge base construction from scholarly documents. DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering. Association for Computing Machinery, Inc, 2017. pp. 149-152
@inproceedings{ce54073a9ab1422fac4b2396b4d618ce,
title = "Automatic knowledge base construction from scholarly documents",
abstract = "The continuing growth of published scholarly content on the web ensures the availability of the most recent scientific findings to researchers. Scholarly documents, such as research articles, are easily accessed by using academic search engines that are built on large repositories of scholarly documents. Scientific information extraction from documents into a structured knowledge graph representation facilitates automated machine understanding of a document's content. Traditional information extraction approaches, that either require training samples or a preexisting knowledge base to assist in the extraction, can be challenging when applied to large repositories of digital documents. Labeled training examples for such large scale are difficult to obtain for such datasets. Also, most available knowledge bases are built from web data and do not have sufficient coverage to include concepts found in scientific articles. In this paper we aim to construct a knowledge graph from scholarly documents while addressing both these issues. We propose a fully automatic, unsupervised system for scientific information extraction that does not build on an existing knowledge base and avoids manually-tagged training data. We describe and evaluate a constructed taxonomy that contains over 15k entities resulting from applying our approach to 10k documents.",
author = "Al-Zaidy, {Rabah A.} and Giles, {Clyde Lee}",
year = "2017",
month = "8",
day = "31",
doi = "10.1145/3103010.3121043",
language = "English (US)",
pages = "149--152",
booktitle = "DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering",
publisher = "Association for Computing Machinery, Inc",

}

Al-Zaidy, RA & Giles, CL 2017, Automatic knowledge base construction from scholarly documents. in DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering. Association for Computing Machinery, Inc, pp. 149-152, 17th ACM Symposium on Document Engineering, DocEng 2017, Valletta, Malta, 9/4/17. https://doi.org/10.1145/3103010.3121043

Automatic knowledge base construction from scholarly documents. / Al-Zaidy, Rabah A.; Giles, Clyde Lee.

DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering. Association for Computing Machinery, Inc, 2017. p. 149-152.

TY - GEN

T1 - Automatic knowledge base construction from scholarly documents

AU - Al-Zaidy, Rabah A.

AU - Giles, Clyde Lee

PY - 2017/8/31

Y1 - 2017/8/31

N2 - The continuing growth of published scholarly content on the web ensures the availability of the most recent scientific findings to researchers. Scholarly documents, such as research articles, are easily accessed by using academic search engines that are built on large repositories of scholarly documents. Scientific information extraction from documents into a structured knowledge graph representation facilitates automated machine understanding of a document's content. Traditional information extraction approaches, that either require training samples or a preexisting knowledge base to assist in the extraction, can be challenging when applied to large repositories of digital documents. Labeled training examples for such large scale are difficult to obtain for such datasets. Also, most available knowledge bases are built from web data and do not have sufficient coverage to include concepts found in scientific articles. In this paper we aim to construct a knowledge graph from scholarly documents while addressing both these issues. We propose a fully automatic, unsupervised system for scientific information extraction that does not build on an existing knowledge base and avoids manually-tagged training data. We describe and evaluate a constructed taxonomy that contains over 15k entities resulting from applying our approach to 10k documents.

AB - The continuing growth of published scholarly content on the web ensures the availability of the most recent scientific findings to researchers. Scholarly documents, such as research articles, are easily accessed by using academic search engines that are built on large repositories of scholarly documents. Scientific information extraction from documents into a structured knowledge graph representation facilitates automated machine understanding of a document's content. Traditional information extraction approaches, that either require training samples or a preexisting knowledge base to assist in the extraction, can be challenging when applied to large repositories of digital documents. Labeled training examples for such large scale are difficult to obtain for such datasets. Also, most available knowledge bases are built from web data and do not have sufficient coverage to include concepts found in scientific articles. In this paper we aim to construct a knowledge graph from scholarly documents while addressing both these issues. We propose a fully automatic, unsupervised system for scientific information extraction that does not build on an existing knowledge base and avoids manually-tagged training data. We describe and evaluate a constructed taxonomy that contains over 15k entities resulting from applying our approach to 10k documents.

UR - http://www.scopus.com/inward/record.url?scp=85030555381&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030555381&partnerID=8YFLogxK

U2 - 10.1145/3103010.3121043

DO - 10.1145/3103010.3121043

M3 - Conference contribution

AN - SCOPUS:85030555381

SP - 149

EP - 152

BT - DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering

PB - Association for Computing Machinery, Inc

ER -

Al-Zaidy RA, Giles CL. Automatic knowledge base construction from scholarly documents. In DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering. Association for Computing Machinery, Inc. 2017. p. 149-152 https://doi.org/10.1145/3103010.3121043