Detecting outliers in data with correlated measures

Yu Hsuan Kuo, Zhenhui Li, Daniel Kifer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations

Abstract

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers caused by sensor malfunctions or human operating errors. To use such data in real-world applications, it is critical to detect outliers so that they do not skew models built from the data. In this paper, we propose a new outlier detection method that exploits correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our method achieves better performance, demonstrating its robustness and generality. Finally, we report interesting case studies on outliers that result from atypical events.
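The abstract does not spell out the model, so the following is only an illustrative sketch of the general idea of fitting a regression while explicitly modeling outliers. It is not the authors' method. It assumes a common mean-shift formulation, y = Xb + g + noise, in which each observation gets its own shift term g_i, an L1 penalty keeps most shifts at exactly zero, and observations with a nonzero shift are flagged. The function name `fit_with_outlier_terms`, the threshold `lam`, and the synthetic distance/duration data are all hypothetical.

```python
import numpy as np

def fit_with_outlier_terms(x, y, lam, n_iter=100):
    """Alternate between least squares on (y - gamma) and
    soft-thresholding of the residuals to update gamma.

    gamma[i] is a per-observation shift; a nonzero value marks
    observation i as an outlier. lam is an assumed threshold,
    in the same units as y.
    """
    X = np.column_stack([np.ones_like(x), x])  # intercept and slope
    gamma = np.zeros_like(y)
    for _ in range(n_iter):
        # Fit the line to the data with the current outlier shifts removed.
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        residual = y - X @ beta
        # Soft-threshold: small residuals give gamma = 0 (inlier),
        # large residuals are absorbed into gamma (outlier).
        gamma = np.sign(residual) * np.maximum(np.abs(residual) - lam, 0.0)
    return beta, gamma

# Synthetic stand-in for "trip distance vs. trip time" (made-up numbers).
rng = np.random.default_rng(0)
distance = rng.uniform(0.5, 20.0, 500)
duration = 3.0 + 2.5 * distance + rng.normal(0.0, 1.0, 500)
duration[:15] += 60.0  # corrupt the first 15 records

beta, gamma = fit_with_outlier_terms(distance, duration, lam=6.0)
flagged = np.flatnonzero(gamma)  # indices of detected outliers
print("estimated intercept and slope:", beta)
print("number of flagged records:", flagged.size)
```

In this sketch the detection and the fit happen in the same loop: the shifts absorb the corrupted records, so the line is estimated mostly from the clean ones. The choice of `lam` controls how large a deviation must be before a record is flagged.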

Original language: English (US)
Title of host publication: CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
Editors: Norman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
Publisher: Association for Computing Machinery
Pages: 287-296
Number of pages: 10
ISBN (Electronic): 9781450360142
DOIs: https://doi.org/10.1145/3269206.3271798
State: Published - Oct 17 2018
Event: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: Oct 22 2018 - Oct 26 2018

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country: Italy
City: Torino
Period: 10/22/18 - 10/26/18

All Science Journal Classification (ASJC) codes

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

    Kuo, Y. H., Li, Z., & Kifer, D. (2018). Detecting outliers in data with correlated measures. In N. Paton, S. Candan, H. Wang, J. Allan, R. Agrawal, A. Labrinidis, A. Cuzzocrea, M. Zaki, D. Srivastava, A. Broder, & A. Schuster (Eds.), CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 287-296). (International Conference on Information and Knowledge Management, Proceedings). Association for Computing Machinery. https://doi.org/10.1145/3269206.3271798