HESDK: A Hybrid Approach to Extracting Scientific Domain Knowledge Entities

Jian Wu, Sagnik Ray Choudhury, Agnese Chiatti, Chen Liang, Clyde Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

We investigate a variant of the problem of automatic keyphrase extraction from scientific documents, which we define as Scientific Domain Knowledge Entity (SDKE) extraction. Keyphrases are noun phrases that are important to the document in which they appear. In contrast, an SDKE is text that refers to a concept and can be classified as a process, material, task, dataset, etc. An SDKE represents domain knowledge, but is not necessarily important to the document it is in. Supervised keyphrase extraction algorithms using non-sequential classifiers and global measures of informativeness (PMI, tf-idf) have been used for this task. Another approach is to use sequential labeling algorithms with local context from a sentence, as is done in named entity recognition. We show that these two methods can complement each other and that a simple merging can improve the extraction accuracy by 5 to 7 percentage points. We further propose several heuristics to improve the extraction accuracy. Our preliminary experiments suggest that it is possible to improve the accuracy of the sequential learner itself by utilizing the predictions of the non-sequential model.
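
The abstract does not spell out how the two sets of predictions are merged. As a minimal sketch of what a "simple merging" could look like, the Python below takes the union of the spans proposed by a sequential labeler and a non-sequential classifier and, where spans overlap, keeps the longer one. The span representation, the function names, and the longest-span tie-break are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: a span is (start, end) in token offsets, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two half-open spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_predictions(sequential: List[Span], non_sequential: List[Span]) -> List[Span]:
    """Union both prediction lists; on overlap, keep the longer span (assumed rule)."""
    # Deduplicate, then visit the longest spans first so they win overlap conflicts.
    candidates = sorted(
        set(sequential) | set(non_sequential),
        key=lambda s: (-(s[1] - s[0]), s[0]),
    )
    merged: List[Span] = []
    for span in candidates:
        if not any(overlaps(span, kept) for kept in merged):
            merged.append(span)
    # Return spans in document order.
    return sorted(merged)
```

Under these assumptions, `merge_predictions([(0, 2), (5, 6)], [(1, 4), (8, 10)])` keeps `(1, 4)` over the overlapping `(0, 2)` and passes the non-overlapping spans through unchanged. A union of this kind favors recall, which is one plausible reading of how two complementary extractors could improve on either alone.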

Original language: English (US)
Title of host publication: 2017 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538638613
DOI: https://doi.org/10.1109/JCDL.2017.7991580
State: Published - Jul 25 2017
Event: 17th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017 - Toronto, Canada
Duration: Jun 19 2017 to Jun 23 2017

Other

Other: 17th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
Country: Canada
City: Toronto
Period: 6/19/17 to 6/23/17

All Science Journal Classification (ASJC) codes

  • Engineering (all)

  • Cite this

    Wu, J., Choudhury, S. R., Chiatti, A., Liang, C., & Giles, C. L. (2017). HESDK: A Hybrid Approach to Extracting Scientific Domain Knowledge Entities. In 2017 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017 [7991580] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/JCDL.2017.7991580