TY - GEN
T1 - Collapsing corporate confusion
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
AU - Marple, Tim
AU - Desmarais, Bruce
AU - Young, Kevin L.
PY - 2017/7/1
Y1 - 2017/7/1
AB - In this paper, we introduce a novel battery of classifiers to resolve artificial inconsistencies among entity names within large datasets. Using data on the corporate sector, we describe the logic underlying a relational approach to entity resolution, and its importance for data acquisition, feature extraction, and integration. We subsequently leverage the relational structure of BoardEx employment data to assess the efficacy of these methods as compared to a ground-truth sample of coded name inconsistencies. We show that these methods hold significant promise for cleaning artificial distinctions in entity names via enrichment from integration with external data, and further demonstrate the effect of such resolution on the accuracy of extracted network topology features. We conclude with implications for existing findings and steps for future work.
UR - http://www.scopus.com/inward/record.url?scp=85047731441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047731441&partnerID=8YFLogxK
U2 - 10.1109/BigData.2017.8258224
DO - 10.1109/BigData.2017.8258224
M3 - Conference contribution
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 2637
EP - 2643
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 December 2017 through 14 December 2017
ER -