Extracting Semantic Relations for Scholarly Knowledge Base Construction

Rabah A. Al-Zaidy, Clyde Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Scopus citations


The problem of information extraction from scientific articles, found as PDF documents in large digital repositories, is gaining more attention as the volume of research findings continues to grow. We propose a system to extract semantic relations among entities in scholarly articles by making use of external syntactic patterns and an iterative learner. While information extraction from scholarly documents has been studied before, it has focused mainly on the abstract and keywords. Our method extracts semantic entities as concepts and instances, along with their attributes, from the full body text of documents. We extract two types of relationships between concepts in the text using an iterative learning algorithm. External data sources from the web, such as the Microsoft Concept Graph and query logs, are used to evaluate the quality of the extracted concepts and relations. The concepts are used to construct a scientific taxonomy covering the research content of the documents. To evaluate the system, we apply our approach to a set of 10k scholarly documents and conduct several evaluations to show the effectiveness of the proposed methods. We show that our system obtains a 23% improvement in precision over existing web IE tools when they are applied to scholarly documents.
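The combination of syntactic patterns and an iterative learner described in the abstract can be illustrated with a small bootstrapping loop. The sketch below is not the authors' implementation: the abstract does not list the patterns or the learning algorithm, so the seed connector "such as" (a Hearst-style pattern), the function names, and the example sentences are all assumptions made for illustration. It also matches only single-word concepts and instances and omits the quality filtering against external sources that the abstract mentions.

```python
import re

# Hypothetical example sentences; not taken from the paper's corpus.
SENTENCES = [
    "We compare classifiers such as SVM on this task.",
    "Popular classifiers including SVM are evaluated.",
    "We study classifiers including perceptron models.",
    "Optimizers such as Adam converge quickly.",
]


def extract_pairs(sentences, connectors):
    """Apply each 'X <connector> Y' pattern; return (concept, instance) pairs."""
    pairs = set()
    for connector in connectors:
        regex = re.compile(r"(\w+) " + re.escape(connector) + r" (\w+)")
        for sentence in sentences:
            for match in regex.finditer(sentence):
                pairs.add((match.group(1).lower(), match.group(2).lower()))
    return pairs


def induce_connectors(sentences, pairs):
    """Collect the words that link an already known pair in some sentence."""
    found = set()
    for concept, instance in pairs:
        regex = re.compile(
            re.escape(concept) + r" ([\w ]+?) " + re.escape(instance) + r"\b",
            re.IGNORECASE,
        )
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                found.add(match.group(1).lower())
    return found


def bootstrap(sentences, seed_connectors, rounds=2):
    """Alternate between extracting pairs and inducing new connectors."""
    connectors = set(seed_connectors)
    pairs = set()
    for _ in range(rounds):
        pairs |= extract_pairs(sentences, connectors)
        connectors |= induce_connectors(sentences, pairs)
    return pairs, connectors


if __name__ == "__main__":
    pairs, connectors = bootstrap(SENTENCES, {"such as"})
    print(sorted(pairs))
    print(sorted(connectors))
```

Starting from the single seed connector, the first round finds pairs that also co-occur around "including", and the second round uses that new connector to pick up a pair the seed alone would miss. An unfiltered loop like this one drifts quickly on real text, which is presumably why a practical system would score candidate patterns and pairs before accepting them.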

Original language: English (US)
Title of host publication: Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Electronic): 9781538644072
State: Published - Apr 9 2018
Event: 12th IEEE International Conference on Semantic Computing, ICSC 2018 - Laguna Hills, United States
Duration: Jan 31 2018 to Feb 2 2018


Other: 12th IEEE International Conference on Semantic Computing, ICSC 2018
Country/Territory: United States
City: Laguna Hills

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Human-Computer Interaction
  • Information Systems and Management

