FaitCrowd: Fine grained truth discovery for crowdsourced data aggregation

Fenglong Ma, Yaliang Li, Qi Li, Minghui Qiu, Jing Gao, Shi Zhi, Lu Su, Bo Zhao, Heng Ji, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

73 Citations (Scopus)

Abstract

In crowdsourced data aggregation task, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge for this task is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all the questions, but ignore the fact that sources' reliability may vary significantly among different topics. To capture various expertise levels on different topics, we propose FaitCrowd, a fine grained truth discovery model for the task of aggregating conflicting data collected from multiple users/sources. FaitCrowd jointly models the process of generating question content and sources' provided answers in a probabilistic model to estimate both topical expertise and true answers simultaneously. This leads to a more precise estimation of source reliability. Therefore, FaitCrowd demonstrates better ability to obtain true answers for the questions compared with existing approaches. Experimental results on two real-world datasets show that FaitCrowd can significantly reduce the error rate of aggregation compared with the state-of-the-art multi-source aggregation approaches due to its ability of learning topical expertise from question content and collected answers.

Original languageEnglish (US)
Title of host publicationKDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages745-754
Number of pages10
ISBN (Electronic)9781450336642
DOIs
StatePublished - Aug 10 2015
Event21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: Aug 10 2015Aug 13 2015

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume2015-August

Other

Other21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
CountryAustralia
CitySydney
Period8/10/158/13/15

Fingerprint

Agglomeration

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Ma, F., Li, Y., Li, Q., Qiu, M., Gao, J., Zhi, S., ... Han, J. (2015). FaitCrowd: Fine grained truth discovery for crowdsourced data aggregation. In KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 745-754). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 2015-August). Association for Computing Machinery. https://doi.org/10.1145/2783258.2783314
Ma, Fenglong ; Li, Yaliang ; Li, Qi ; Qiu, Minghui ; Gao, Jing ; Zhi, Shi ; Su, Lu ; Zhao, Bo ; Ji, Heng ; Han, Jiawei. / FaitCrowd : Fine grained truth discovery for crowdsourced data aggregation. KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2015. pp. 745-754 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
@inproceedings{78a3140da5014da5a6ed1fba59e8a0d0,
title = "FaitCrowd: Fine grained truth discovery for crowdsourced data aggregation",
abstract = "In crowdsourced data aggregation task, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge for this task is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all the questions, but ignore the fact that sources' reliability may vary significantly among different topics. To capture various expertise levels on different topics, we propose FaitCrowd, a fine grained truth discovery model for the task of aggregating conflicting data collected from multiple users/sources. FaitCrowd jointly models the process of generating question content and sources' provided answers in a probabilistic model to estimate both topical expertise and true answers simultaneously. This leads to a more precise estimation of source reliability. Therefore, FaitCrowd demonstrates better ability to obtain true answers for the questions compared with existing approaches. Experimental results on two real-world datasets show that FaitCrowd can significantly reduce the error rate of aggregation compared with the state-of-the-art multi-source aggregation approaches due to its ability of learning topical expertise from question content and collected answers.",
author = "Fenglong Ma and Yaliang Li and Qi Li and Minghui Qiu and Jing Gao and Shi Zhi and Lu Su and Bo Zhao and Heng Ji and Jiawei Han",
year = "2015",
month = "8",
day = "10",
doi = "10.1145/2783258.2783314",
language = "English (US)",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "745--754",
booktitle = "KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining",

}

Ma, F, Li, Y, Li, Q, Qiu, M, Gao, J, Zhi, S, Su, L, Zhao, B, Ji, H & Han, J 2015, FaitCrowd: Fine grained truth discovery for crowdsourced data aggregation. in KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2015-August, Association for Computing Machinery, pp. 745-754, 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015, Sydney, Australia, 8/10/15. https://doi.org/10.1145/2783258.2783314

FaitCrowd : Fine grained truth discovery for crowdsourced data aggregation. / Ma, Fenglong; Li, Yaliang; Li, Qi; Qiu, Minghui; Gao, Jing; Zhi, Shi; Su, Lu; Zhao, Bo; Ji, Heng; Han, Jiawei.

KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2015. p. 745-754 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 2015-August).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - FaitCrowd

T2 - Fine grained truth discovery for crowdsourced data aggregation

AU - Ma, Fenglong

AU - Li, Yaliang

AU - Li, Qi

AU - Qiu, Minghui

AU - Gao, Jing

AU - Zhi, Shi

AU - Su, Lu

AU - Zhao, Bo

AU - Ji, Heng

AU - Han, Jiawei

PY - 2015/8/10

Y1 - 2015/8/10

N2 - In crowdsourced data aggregation task, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge for this task is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all the questions, but ignore the fact that sources' reliability may vary significantly among different topics. To capture various expertise levels on different topics, we propose FaitCrowd, a fine grained truth discovery model for the task of aggregating conflicting data collected from multiple users/sources. FaitCrowd jointly models the process of generating question content and sources' provided answers in a probabilistic model to estimate both topical expertise and true answers simultaneously. This leads to a more precise estimation of source reliability. Therefore, FaitCrowd demonstrates better ability to obtain true answers for the questions compared with existing approaches. Experimental results on two real-world datasets show that FaitCrowd can significantly reduce the error rate of aggregation compared with the state-of-the-art multi-source aggregation approaches due to its ability of learning topical expertise from question content and collected answers.

AB - In crowdsourced data aggregation task, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge for this task is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all the questions, but ignore the fact that sources' reliability may vary significantly among different topics. To capture various expertise levels on different topics, we propose FaitCrowd, a fine grained truth discovery model for the task of aggregating conflicting data collected from multiple users/sources. FaitCrowd jointly models the process of generating question content and sources' provided answers in a probabilistic model to estimate both topical expertise and true answers simultaneously. This leads to a more precise estimation of source reliability. Therefore, FaitCrowd demonstrates better ability to obtain true answers for the questions compared with existing approaches. Experimental results on two real-world datasets show that FaitCrowd can significantly reduce the error rate of aggregation compared with the state-of-the-art multi-source aggregation approaches due to its ability of learning topical expertise from question content and collected answers.

UR - http://www.scopus.com/inward/record.url?scp=84954098959&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84954098959&partnerID=8YFLogxK

U2 - 10.1145/2783258.2783314

DO - 10.1145/2783258.2783314

M3 - Conference contribution

AN - SCOPUS:84954098959

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 745

EP - 754

BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -

Ma F, Li Y, Li Q, Qiu M, Gao J, Zhi S et al. FaitCrowd: Fine grained truth discovery for crowdsourced data aggregation. In KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2015. p. 745-754. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/2783258.2783314