CrowdK: Answering top-k queries with crowdsourcing

Jongwuk Lee, Dongwon Lee, Seung won Hwang

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

In recent years, crowdsourcing has emerged as a new computing paradigm that bridges the gap between human- and machine-based computation. As one of the core operations in data retrieval, we study top-k queries with crowdsourcing, namely crowd-enabled top-k queries. This problem is governed by three key factors: latency, monetary cost, and quality of answers. We first design a novel framework that minimizes monetary cost under a latency constraint. Toward this goal, we employ a two-phase parameterized framework with two parameters, called buckets and ranges. On top of this framework, we develop three methods, greedy, equi-sized, and dynamic programming, to determine the buckets and ranges. By combining these methods at each phase, we propose four algorithms: GdyBucket, EquiBucket, EquiRange, and CrowdK. When the crowd answers are imprecise, we also address improving the accuracy of the top-k answers. Lastly, using both simulated crowds and real crowds on Amazon Mechanical Turk, we evaluate the trade-offs among our proposals with respect to monetary cost, accuracy of answers, and running time. Compared with competing algorithms, CrowdK reduces monetary cost by up to 20 times without sacrificing the accuracy of the top-k answers.
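To make the bucket idea in the abstract concrete, the following is a minimal, illustrative sketch of crowd-enabled top-k selection in two phases: partition items into buckets, keep each bucket's local top-k via pairwise crowd comparisons, then run a final selection over the survivors. This is only a toy rendering of the general bucket-and-merge pattern, not the paper's CrowdK algorithm; `crowd_compare`, `crowd_top_k`, and `bucket_size` are hypothetical names, and a real deployment would replace `crowd_compare` with aggregated worker votes (e.g., from Mechanical Turk tasks).

```python
def crowd_compare(a, b):
    # Stand-in for a crowd task asking "is a better than b?".
    # A real system would post this comparison to workers and
    # aggregate their (possibly noisy) votes.
    return a > b


def crowd_top_k(items, k, bucket_size=4):
    """Two-phase top-k by pairwise crowd comparisons (illustrative only).

    Phase 1: partition items into buckets and keep each bucket's local
    top-k. Buckets are independent, so their comparison tasks could be
    issued in parallel, bounding latency.
    Phase 2: run the same selection over the much smaller survivor set.
    """
    def local_top_k(bucket):
        # Repeated max-selection: each pass finds the current best item
        # using len(bucket) - 1 crowd comparisons.
        bucket = list(bucket)
        winners = []
        for _ in range(min(k, len(bucket))):
            best = bucket[0]
            for x in bucket[1:]:
                if crowd_compare(x, best):
                    best = x
            bucket.remove(best)
            winners.append(best)
        return winners

    # Phase 1: local selection per bucket.
    buckets = [items[i:i + bucket_size]
               for i in range(0, len(items), bucket_size)]
    survivors = [x for b in buckets for x in local_top_k(b)]
    # Phase 2: final selection over the survivors.
    return local_top_k(survivors)
```

For example, `crowd_top_k(list(range(20)), 3)` returns `[19, 18, 17]`. The bucket size trades latency against monetary cost: larger buckets mean fewer parallel rounds but more comparisons per round, which is the kind of trade-off the paper's parameterized framework optimizes.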

Original language: English (US)
Pages (from-to): 98-120
Number of pages: 23
Journal: Information Sciences
Volume: 399
DOI: 10.1016/j.ins.2017.03.010
State: Published - Aug 1 2017

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Lee, Jongwuk; Lee, Dongwon; Hwang, Seung won. CrowdK: Answering top-k queries with crowdsourcing. In: Information Sciences. 2017; Vol. 399. pp. 98-120.
@article{065d8b89171f4f80910f0993786360a4,
title = "CrowdK: Answering top-k queries with crowdsourcing",
author = "Jongwuk Lee and Dongwon Lee and Hwang, {Seung won}",
year = "2017",
month = "8",
day = "1",
doi = "10.1016/j.ins.2017.03.010",
language = "English (US)",
volume = "399",
pages = "98--120",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - CrowdK

T2 - Answering top-k queries with crowdsourcing

AU - Lee, Jongwuk

AU - Lee, Dongwon

AU - Hwang, Seung won

PY - 2017/8/1

Y1 - 2017/8/1

UR - http://www.scopus.com/inward/record.url?scp=85015087517&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015087517&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2017.03.010

DO - 10.1016/j.ins.2017.03.010

M3 - Article

AN - SCOPUS:85015087517

VL - 399

SP - 98

EP - 120

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -