CrowdK: Answering top-k queries with crowdsourcing

Jongwuk Lee, Dongwon Lee, Seung-won Hwang

Research output: Contribution to journal, Article

9 Scopus citations

Abstract

In recent years, crowdsourcing has emerged as a new computing paradigm for bridging the gap between human- and machine-based computation. We study top-k queries, one of the core operations in data retrieval, in the crowdsourcing setting, namely crowd-enabled top-k queries. The problem is formulated in terms of three key factors: latency, monetary cost, and quality of answers. We first design a novel framework that minimizes monetary cost when latency is constrained. Toward this goal, we employ a two-phase parameterized framework with two parameters, called buckets and ranges. On top of this framework, we develop three methods (greedy, equi-sized, and dynamic programming) to determine the buckets and ranges. By combining the three methods at each phase, we propose four algorithms: GdyBucket, EquiBucket, EquiRange, and CrowdK. We also address how to improve the accuracy of the top-k answers when the crowd answers are imprecise. Lastly, using both simulated crowds and real crowds at Amazon Mechanical Turk, we evaluate the trade-offs among our proposals with respect to monetary cost, accuracy of answers, and running time. Compared to other competitive algorithms, CrowdK reduces monetary cost by up to a factor of 20 without sacrificing the accuracy of the top-k answers.
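The trade-off between latency and monetary cost described in the abstract can be illustrated with a minimal sketch. This is not the paper's CrowdK algorithm or its bucket/range framework; it is a generic round-based tournament, written under these assumptions: latency is counted as the number of rounds, cost as the number of pairwise questions, and the crowd is simulated by a noise-free comparison function. The names `crowd_topk`, `bucket_size`, and `prefer` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def crowd_topk(
    items: Sequence[int],
    k: int,
    bucket_size: int,
    prefer: Callable[[int, int], bool],
) -> Tuple[List[int], int, int]:
    """Return (top-k items, rounds used, pairwise questions asked).

    Hypothetical illustration: items must be distinct, and prefer(a, b)
    stands in for a crowd worker answering "is a better than b?".
    """
    if bucket_size <= k:
        raise ValueError("bucket_size must exceed k so every round shrinks the pool")
    candidates = list(items)
    rounds = 0
    questions = 0
    while len(candidates) > k:
        rounds += 1  # all buckets in a round are assumed to be asked in parallel
        survivors: List[int] = []
        for start in range(0, len(candidates), bucket_size):
            bucket = candidates[start:start + bucket_size]
            # Ask every pair in the bucket once and count wins per item.
            wins = {item: 0 for item in bucket}
            for i, a in enumerate(bucket):
                for b in bucket[i + 1:]:
                    questions += 1
                    wins[a if prefer(a, b) else b] += 1
            # Keep the k items with the most wins; a global top-k item
            # is always in the top k of its own bucket.
            ranked = sorted(bucket, key=lambda item: wins[item], reverse=True)
            survivors.extend(ranked[:k])
        candidates = survivors
    return candidates, rounds, questions


# Simulated, noise-free crowd: larger numbers are "better".
def exact(a: int, b: int) -> bool:
    return a > b


small = crowd_topk(range(100), k=3, bucket_size=10, prefer=exact)
large = crowd_topk(range(100), k=3, bucket_size=100, prefer=exact)
```

In this toy setting, buckets of 10 take 3 rounds and 621 questions, while a single bucket of 100 takes 1 round and 4950 questions; both return the same top-3. Smaller buckets trade extra rounds for fewer questions, which is the kind of tension a latency-constrained cost minimization has to resolve.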

Original language: English (US)
Pages (from-to): 98-120
Number of pages: 23
Journal: Information Sciences
Volume: 399
State: Published - Aug 1 2017

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
