Discovering truths from distributed data

Yaqing Wang, Fenglong Ma, Lu Su, Jing Gao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

In the big data era, information about the same object collected from multiple sources is inevitably conflicting. The task of identifying true information (i.e., the truths) among conflicting data is referred to as truth discovery, which incorporates the estimation of source reliability degrees into the aggregation of multi-source data. However, in many real-world applications, large-scale data are distributed across multiple servers. Traditional truth discovery approaches cannot handle this scenario because of communication overhead and privacy concerns. Another limitation of most existing work is that it ignores the differences among objects, i.e., it treats all objects equally. This limitation is exacerbated in distributed environments, where significant differences exist among the objects. To tackle these issues, we propose a novel distributed truth discovery framework (DTD), which can effectively and efficiently aggregate conflicting data stored across distributed servers while considering both the differences among the objects and the importance level of each server. The proposed framework consists of two steps: the local truth computation step conducted by each local server and the central truth estimation step taking place in the central server. Specifically, we introduce uncertainty values to model the differences among objects, and propose a new uncertainty-based truth discovery method (UbTD) for calculating the true information of objects in each local server. The outputs of the local truth computation step are the estimated local truths and the variances of objects, which serve as the input to the central truth estimation step. To infer the final true information in the central server, we propose a new algorithm that aggregates the outputs of all the local servers while taking the quality of different local servers into account. The proposed distributed truth discovery framework can infer object truths without delivering any raw data to the central server, and thus can reduce communication overhead as well as preserve data privacy. Experimental results on three real-world datasets show that the proposed DTD framework can efficiently estimate object truths with an accuracy guarantee, and that the proposed UbTD algorithm significantly outperforms state-of-the-art batch truth discovery approaches.
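
The two-step flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' UbTD or DTD algorithm, whose details are not given here. It assumes continuous-valued claims, a generic iterative weighted-average truth discovery loop on each local server (source weights set to the negative log of each source's share of the total squared error), and inverse-variance weighting on the central server. The function names `local_truth_computation` and `central_truth_estimation`, and the dense `claims[source][object]` layout, are hypothetical.

```python
import math

EPS = 1e-9  # guards against division by zero and log(0)


def _weighted_truths(claims, weights):
    """Weighted average of the source claims for each object."""
    total_w = sum(weights)
    n_objects = len(claims[0])
    return [
        sum(w * row[o] for w, row in zip(weights, claims)) / total_w
        for o in range(n_objects)
    ]


def local_truth_computation(claims, iterations=20):
    """Runs on one local server. claims[s][o] is the value source s reports
    for object o. Returns (local_truths, variances), one entry per object.
    ASSUMPTION: a generic truth discovery loop, not the paper's UbTD."""
    n_objects = len(claims[0])
    weights = [1.0] * len(claims)
    for _ in range(iterations):
        truths = _weighted_truths(claims, weights)
        # Squared error of each source against the current truth estimates.
        errors = [
            sum((row[o] - truths[o]) ** 2 for o in range(n_objects)) + EPS
            for row in claims
        ]
        total_err = sum(errors)
        # Sources with a smaller share of the total error get larger weights.
        weights = [-math.log(e / total_err) + EPS for e in errors]
    truths = _weighted_truths(claims, weights)
    total_w = sum(weights)
    # Weighted spread of the claims around each local truth.
    variances = [
        sum(w * (row[o] - truths[o]) ** 2 for w, row in zip(weights, claims))
        / total_w + EPS
        for o in range(n_objects)
    ]
    return truths, variances


def central_truth_estimation(local_outputs):
    """Runs on the central server. Combines (truths, variances) pairs from the
    local servers; raw claims never leave the local servers.
    ASSUMPTION: inverse-variance weighting stands in for the paper's method."""
    n_objects = len(local_outputs[0][0])
    final = []
    for o in range(n_objects):
        inv_var = [1.0 / var[o] for _, var in local_outputs]
        weighted = sum(iv * t[o] for iv, (t, _) in zip(inv_var, local_outputs))
        final.append(weighted / sum(inv_var))
    return final


# Two local servers, three sources each, two objects; the values are made up.
server_a = [[10.0, 20.0], [10.2, 19.8], [14.0, 26.0]]
server_b = [[9.9, 20.1], [10.1, 20.0], [6.0, 15.0]]
outputs = [local_truth_computation(server_a), local_truth_computation(server_b)]
final_truths = central_truth_estimation(outputs)
```

In this sketch only the per-object truths and variances are sent to the central server, which mirrors the communication and privacy argument made in the abstract; how the actual framework weights servers and models object uncertainty may differ.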

Original language: English (US)
Title of host publication: Proceedings - 17th IEEE International Conference on Data Mining, ICDM 2017
Editors: George Karypis, Srinivas Alu, Vijay Raghavan, Xindong Wu, Lucio Miele
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 505-514
Number of pages: 10
ISBN (Electronic): 9781538638347
DOIs: https://doi.org/10.1109/ICDM.2017.60
State: Published - Dec 15 2017
Event: 17th IEEE International Conference on Data Mining, ICDM 2017 - New Orleans, United States
Duration: Nov 18 2017 → Nov 21 2017

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2017-November
ISSN (Print): 1550-4786

Other

Other: 17th IEEE International Conference on Data Mining, ICDM 2017
Country: United States
City: New Orleans
Period: 11/18/17 → 11/21/17

All Science Journal Classification (ASJC) codes

  • Engineering (all)

Cite this

Wang, Y., Ma, F., Su, L., & Gao, J. (2017). Discovering truths from distributed data. In G. Karypis, S. Alu, V. Raghavan, X. Wu, & L. Miele (Eds.), Proceedings - 17th IEEE International Conference on Data Mining, ICDM 2017 (pp. 505-514). (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2017-November). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2017.60