Collapsing corporate confusion: Leveraging network structures for effective entity resolution in relational corporate data

Tim Marple, Bruce Desmarais, Kevin L. Young

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

In this paper, we introduce a novel battery of classifiers to resolve artificial inconsistencies among entity names within large datasets. Using data on the corporate sector, we describe the logic underlying a relational approach to entity resolution and its importance for data acquisition, feature extraction, and integration. We then leverage the relational structure of BoardEx employment data to assess the efficacy of these methods against a ground-truth sample of coded name inconsistencies. We show that these methods hold significant promise for removing artificial distinctions among entity names through enrichment from integration with external data, and we further demonstrate the effect of such resolution on the accuracy of extracted network topology features. We conclude with implications for existing findings and steps for future work.
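The abstract does not specify the classifiers themselves. As a hedged illustration of the general relational idea only, and not the authors' method, the sketch below treats two firm names as likely duplicates when their normalized strings are similar and their director sets overlap, the kind of signal a board-employment network such as BoardEx can supply. It then collapses the matches and shows how a simple network feature (interlock degree) changes. The suffix list, thresholds (`name_min`, `overlap_min`), function names, and toy data are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical legal-form tokens to ignore when comparing names (illustrative only).
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "ltd", "llc", "plc", "the"}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove common legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def board_overlap(a: set, b: set) -> float:
    """Jaccard overlap of two firms' director sets (the relational signal)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def candidate_matches(boards: dict, name_min: float = 0.85, overlap_min: float = 0.5) -> list:
    """Return name pairs whose strings AND board memberships are both similar."""
    matches = []
    for x, y in combinations(sorted(boards), 2):
        if name_similarity(x, y) >= name_min and board_overlap(boards[x], boards[y]) >= overlap_min:
            matches.append((x, y))
    return matches

def merge(boards: dict, matches: list) -> dict:
    """Collapse matched names into one canonical entity using union-find."""
    parent = {n: n for n in boards}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for x, y in matches:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[max(rx, ry)] = min(rx, ry)  # deterministic canonical name
    merged = {}
    for n, directors in boards.items():
        merged.setdefault(find(n), set()).update(directors)
    return merged

def interlock_degree(boards: dict) -> dict:
    """Degree of each firm in the firm-firm projection (linked by a shared director)."""
    deg = {n: 0 for n in boards}
    for x, y in combinations(boards, 2):
        if boards[x] & boards[y]:
            deg[x] += 1
            deg[y] += 1
    return deg

# Toy data: one firm split across two name variants.
boards = {
    "Acme Corp.": {"d1", "d2", "d3"},
    "ACME Corporation": {"d1", "d2", "d3", "d4"},
    "Acme Holdings": {"d9"},
    "Beta Industries Inc": {"d3", "d5"},
}
matches = candidate_matches(boards)
merged = merge(boards, matches)
print(matches)
print(interlock_degree(boards)["Beta Industries Inc"], interlock_degree(merged)["Beta Industries Inc"])
```

Requiring both signals is a deliberate choice in this sketch: "Acme Holdings" shares a name stem with the other Acme variants but no directors, so it is not merged. Before resolution, "Beta Industries Inc" appears to interlock with two firms only because one company is split across two spellings; after collapsing them its degree drops from 2 to 1, a small instance of how unresolved names can distort extracted network features.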

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2637-2643
Number of pages: 7
ISBN (Electronic): 9781538627143
State: Published - Jul 1 2017
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: Dec 11 2017 - Dec 14 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Other

Other: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 12/11/17 - 12/14/17

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Control and Optimization
