Crowdsourcing high quality labels with a tight budget

Qi Li, Fenglong Ma, Jing Gao, Lu Su, Christopher J. Quinn

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

22 Citations (Scopus)

Abstract

In the past decade, commercial crowdsourcing platforms have revolutionized how data are classified and annotated, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets, it is important to allocate budgets wisely. With limited budgets, requesters must trade off the quantity of labeled instances against the quality of the final results. Existing budget allocation methods can achieve good quantity but cannot guarantee high quality of individual instances under a tight budget. However, in some scenarios, requesters may be willing to label fewer instances but of higher quality. Moreover, they may have different requirements on quality for different tasks. To address these challenges, we propose a flexible budget allocation framework called Requallo. Requallo allows requesters to set their specific requirements on the labeling quality and maximizes the number of labeled instances that achieve the quality requirement under a tight budget. The budget allocation problem is modeled as a Markov decision process and a sequential labeling policy is produced. The proposed policy greedily selects, as the next instance to query, the one that provides the maximum reward toward the goal. The Requallo framework is further extended to consider worker reliability so that the budget can be better allocated. Experiments on two real-world crowdsourcing tasks as well as a simulated task demonstrate that when the budget is tight, the proposed Requallo framework outperforms existing state-of-the-art budget allocation methods in both quantity and quality.
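The sequential, greedy allocation described in the abstract can be illustrated with a small sketch. This is not the Requallo algorithm: the abstract does not give the reward function or the quality measure, so everything below is an assumption made for illustration. Labels are binary, the quality of an instance is taken to be the posterior-mean confidence in its majority label under a Beta(1, 1) prior, and the greedy "reward" is replaced by a simple stand-in that prefers the open instance needing the fewest additional agreeing labels. The names `Instance`, `labels_needed`, `allocate`, and `query` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    pos: int = 0  # positive votes collected so far
    neg: int = 0  # negative votes collected so far

    def confidence(self) -> float:
        # Assumed quality measure: posterior-mean confidence in the
        # majority label under a Beta(1, 1) prior.
        return (max(self.pos, self.neg) + 1) / (self.pos + self.neg + 2)


def labels_needed(inst: Instance, requirement: float) -> int:
    """Fewest extra labels, all agreeing with the current majority,
    that would lift the instance to the quality requirement."""
    if not requirement < 1.0:
        raise ValueError("requirement must be below 1.0")
    major, total = max(inst.pos, inst.neg), inst.pos + inst.neg
    extra = 0
    while (major + extra + 1) / (total + extra + 2) < requirement:
        extra += 1
    return extra


def allocate(instances, budget, requirement, query):
    """Spend the budget one label at a time and return the indices of
    the instances that meet the requirement when the budget runs out.

    `query(i)` is a caller-supplied function that asks one worker for a
    label on instance i and returns True (positive) or False (negative).
    """
    for _ in range(budget):
        open_ids = [i for i, inst in enumerate(instances)
                    if inst.confidence() < requirement]
        if not open_ids:
            break  # every instance already meets the requirement
        # Greedy stand-in for the reward: finish the cheapest instance first.
        target = min(open_ids,
                     key=lambda i: labels_needed(instances[i], requirement))
        if query(target):
            instances[target].pos += 1
        else:
            instances[target].neg += 1
    return [i for i, inst in enumerate(instances)
            if inst.confidence() >= requirement]


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated worker: every true label is positive, accuracy 0.8 (assumed).
    noisy_worker = lambda i: rng.random() < 0.8
    pool = [Instance() for _ in range(10)]
    print(allocate(pool, budget=20, requirement=0.75, query=noisy_worker))
```

Under this stand-in, an instance whose votes conflict needs more agreeing labels than a fresh one, so the loop moves on to fresh instances rather than sinking budget into contested ones. That mirrors the quantity-versus-quality trade-off in the abstract, but the actual policy, reward, and worker-reliability extension are defined in the paper itself.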

Original language: English (US)
Title of host publication: WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 237-246
Number of pages: 10
ISBN (Electronic): 9781450337168
DOI: https://doi.org/10.1145/2835776.2835797
State: Published - Feb 8 2016
Event: 9th ACM International Conference on Web Search and Data Mining, WSDM 2016 - San Francisco, United States
Duration: Feb 22 2016 to Feb 25 2016

Publication series

Name: WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining

Other

Other: 9th ACM International Conference on Web Search and Data Mining, WSDM 2016
Country: United States
City: San Francisco
Period: 2/22/16 to 2/25/16

Fingerprint

Labeling
Labels
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
  • Computer Networks and Communications

Cite this

Li, Q., Ma, F., Gao, J., Su, L., & Quinn, C. J. (2016). Crowdsourcing high quality labels with a tight budget. In WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining (pp. 237-246). (WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining). Association for Computing Machinery, Inc. https://doi.org/10.1145/2835776.2835797
Li, Qi ; Ma, Fenglong ; Gao, Jing ; Su, Lu ; Quinn, Christopher J. / Crowdsourcing high quality labels with a tight budget. WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, Inc, 2016. pp. 237-246 (WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining).
@inproceedings{1d02e2dd7f54461bab1f8ad701cf950a,
title = "Crowdsourcing high quality labels with a tight budget",
abstract = "In the past decade, commercial crowdsourcing platforms have revolutionized the ways of classifying and annotating data, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets, it is important to allocate budgets wisely. With limited budgets, requesters must trade-off between the quantity of labeled instances and the quality of the final results. Existing budget allocation methods can achieve good quantity but cannot guarantee high quality of individual instances under a tight budget. However, in some scenarios, requesters may be willing to label fewer instances but of higher quality. Moreover, they may have different requirements on quality for different tasks. To address these challenges, we propose a flexible budget allocation framework called Requallo. Requallo allows requesters to set their specific requirements on the labeling quality and maximizes the number of labeled instances that achieve the quality requirement under a tight budget. The budget allocation problem is modeled as a Markov decision process and a sequential labeling policy is produced. The proposed policy greedily searches for the instance to query next as the one that can provide the maximum reward for the goal. The Requallo framework is further extended to consider worker reliability so that the budget can be better allocated. Experiments on two real-world crowdsourcing tasks as well as a simulated task demonstrate that when the budget is tight, the proposed Requallo framework outperforms existing state-of-the-art budget allocation methods from both quantity and quality aspects.",
author = "Qi Li and Fenglong Ma and Jing Gao and Lu Su and Quinn, {Christopher J.}",
year = "2016",
month = "2",
day = "8",
doi = "10.1145/2835776.2835797",
language = "English (US)",
series = "WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining",
publisher = "Association for Computing Machinery, Inc",
pages = "237--246",
booktitle = "WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining",

}

Li, Q, Ma, F, Gao, J, Su, L & Quinn, CJ 2016, Crowdsourcing high quality labels with a tight budget. in WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining. WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining, Association for Computing Machinery, Inc, pp. 237-246, 9th ACM International Conference on Web Search and Data Mining, WSDM 2016, San Francisco, United States, 2/22/16. https://doi.org/10.1145/2835776.2835797

Crowdsourcing high quality labels with a tight budget. / Li, Qi; Ma, Fenglong; Gao, Jing; Su, Lu; Quinn, Christopher J.

WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, Inc, 2016. p. 237-246 (WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining).

TY - GEN

T1 - Crowdsourcing high quality labels with a tight budget

AU - Li, Qi

AU - Ma, Fenglong

AU - Gao, Jing

AU - Su, Lu

AU - Quinn, Christopher J.

PY - 2016/2/8

Y1 - 2016/2/8

AB - In the past decade, commercial crowdsourcing platforms have revolutionized the ways of classifying and annotating data, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets, it is important to allocate budgets wisely. With limited budgets, requesters must trade-off between the quantity of labeled instances and the quality of the final results. Existing budget allocation methods can achieve good quantity but cannot guarantee high quality of individual instances under a tight budget. However, in some scenarios, requesters may be willing to label fewer instances but of higher quality. Moreover, they may have different requirements on quality for different tasks. To address these challenges, we propose a flexible budget allocation framework called Requallo. Requallo allows requesters to set their specific requirements on the labeling quality and maximizes the number of labeled instances that achieve the quality requirement under a tight budget. The budget allocation problem is modeled as a Markov decision process and a sequential labeling policy is produced. The proposed policy greedily searches for the instance to query next as the one that can provide the maximum reward for the goal. The Requallo framework is further extended to consider worker reliability so that the budget can be better allocated. Experiments on two real-world crowdsourcing tasks as well as a simulated task demonstrate that when the budget is tight, the proposed Requallo framework outperforms existing state-of-the-art budget allocation methods from both quantity and quality aspects.

UR - http://www.scopus.com/inward/record.url?scp=84964378377&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964378377&partnerID=8YFLogxK

U2 - 10.1145/2835776.2835797

DO - 10.1145/2835776.2835797

M3 - Conference contribution

AN - SCOPUS:84964378377

T3 - WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining

SP - 237

EP - 246

BT - WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining

PB - Association for Computing Machinery, Inc

ER -

Li Q, Ma F, Gao J, Su L, Quinn CJ. Crowdsourcing high quality labels with a tight budget. In WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, Inc. 2016. p. 237-246. (WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining). https://doi.org/10.1145/2835776.2835797