Improving multi-job MapReduce scheduling in an opportunistic environment

Yuting Ji, Lang Tong, Ting He, Jian Tan, Kang Won Lee, Li Zhang

Research output: Contribution to journal › Conference article

7 Citations (Scopus)

Abstract

As a state-of-the-art programming model for big data analytics, MapReduce is well suited for parallel processing of large data sets in opportunistic environments. Existing research on MapReduce in opportunistic environments has focused on improving single-job performance; the issue of fairness, which is critical in the more dominant scenario of multiple concurrent jobs, remains unexplored. We address this problem by proposing an opportunistic fair scheduling algorithm, which extends the broadly adopted Fair Scheduler to an environment where nodes are intermittently available with possibly different availability patterns. The proposed scheduler maintains statistics specific to the opportunistic environment, e.g., node availability rates and pairwise availability correlations, and utilizes this information in scheduling decisions to improve fairness. Using a Hadoop-based implementation, we compare our scheduler with the current Hadoop Fair Scheduler on representative benchmarks. Our experiments verify that our scheduler can significantly reduce the variability in job completion times.
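
The abstract names two statistics the scheduler tracks: per-node availability rates and pairwise availability correlations. The sketch below shows one plausible way such quantities could be computed from observed up/down traces. It is an illustration only, not the authors' implementation: the 0/1 trace format, the node names, the function names, and the use of Pearson correlation are all assumptions that do not come from the paper.

```python
from itertools import combinations
from statistics import mean, pstdev


def availability_rate(trace):
    """Fraction of time slots in which a node was up (1 = up, 0 = down)."""
    return sum(trace) / len(trace)


def availability_correlation(a, b):
    """Pearson correlation of two equal-length 0/1 traces.

    Returns 0.0 when either trace is constant, since the correlation
    is undefined in that case (an assumption made for this sketch).
    """
    sd_a, sd_b = pstdev(a), pstdev(b)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    mean_a, mean_b = mean(a), mean(b)
    covariance = mean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])
    return covariance / (sd_a * sd_b)


def node_statistics(traces):
    """Return (rates, correlations) for a dict of node name -> 0/1 trace."""
    rates = {node: availability_rate(t) for node, t in traces.items()}
    correlations = {
        (m, n): availability_correlation(traces[m], traces[n])
        for m, n in combinations(sorted(traces), 2)
    }
    return rates, correlations


# Hypothetical traces: node-a and node-b go down together,
# while node-c is up exactly when node-a is down.
traces = {
    "node-a": [1, 1, 0, 1, 1, 0, 1, 1],
    "node-b": [1, 1, 0, 1, 1, 0, 1, 1],
    "node-c": [0, 0, 1, 0, 0, 1, 0, 0],
}
rates, correlations = node_statistics(traces)
```

In this toy input, node-a and node-b are positively correlated and node-a and node-c are negatively correlated. A scheduler with access to such numbers could, for example, avoid placing all of one job's tasks on nodes that tend to become unavailable at the same time; how the paper's scheduler actually uses these statistics is not stated in the abstract.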

Original language: English (US)
Article number: 6676672
Pages (from-to): 9-16
Number of pages: 8
Journal: IEEE International Conference on Cloud Computing, CLOUD
DOI: 10.1109/CLOUD.2013.84
State: Published - Dec 1 2013
Event: 2013 IEEE 6th International Conference on Cloud Computing, CLOUD 2013 - Santa Clara, CA, United States
Duration: Jun 27 2013 to Jul 2 2013

Fingerprint

Scheduling
Availability
Scheduling algorithms
Statistics
Processing
Experiments
Big data

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Information Systems
  • Software

Cite this

@article{434deb2e78c449bfa3dc47a8ab7ea8ec,
title = "Improving multi-job MapReduce scheduling in an opportunistic environment",
author = "Yuting Ji and Lang Tong and Ting He and Jian Tan and Lee, {Kang Won} and Li Zhang",
year = "2013",
month = "12",
day = "1",
doi = "10.1109/CLOUD.2013.84",
language = "English (US)",
pages = "9--16",
journal = "IEEE International Conference on Cloud Computing, CLOUD",
issn = "2159-6182",

}

Improving multi-job MapReduce scheduling in an opportunistic environment. / Ji, Yuting; Tong, Lang; He, Ting; Tan, Jian; Lee, Kang Won; Zhang, Li.

In: IEEE International Conference on Cloud Computing, CLOUD, 01.12.2013, p. 9-16.

TY - JOUR

T1 - Improving multi-job MapReduce scheduling in an opportunistic environment

AU - Ji, Yuting

AU - Tong, Lang

AU - He, Ting

AU - Tan, Jian

AU - Lee, Kang Won

AU - Zhang, Li

PY - 2013/12/1

Y1 - 2013/12/1

UR - http://www.scopus.com/inward/record.url?scp=84897688640&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897688640&partnerID=8YFLogxK

U2 - 10.1109/CLOUD.2013.84

DO - 10.1109/CLOUD.2013.84

M3 - Conference article

AN - SCOPUS:84897688640

SP - 9

EP - 16

JO - IEEE International Conference on Cloud Computing, CLOUD

JF - IEEE International Conference on Cloud Computing, CLOUD

SN - 2159-6182

M1 - 6676672

ER -